
Mailing List Archive
[tlug] multibyte tr (or i18n coreutils)
- Date: Sun, 18 Apr 2010 19:51:23 +0200
- From: Michal Hajek <hajek1@example.com>
- Subject: [tlug] multibyte tr (or i18n coreutils)
- User-agent: Mutt/1.5.18 (2008-05-17)
Hello,
I need something like:
tr \
'[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]'\
'[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]'
but tr[1] does not seem to understand multibyte characters.
For example:
LC_ALL=cs_CZ.UTF-8 echo "ššš" |tr \
'[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]'\
'[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]'
gives:
]8]8]8
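This output is consistent with tr translating single bytes rather than characters (note also that the LC_ALL assignment in the command above applies only to echo, not to tr). The Python sketch below models that behaviour under stated assumptions; it is not GNU tr's code. It assumes both sets are taken as raw UTF-8 bytes, with brackets and commas as ordinary members, that the shorter second set is padded with its last byte, and that a later duplicate in the first set overrides an earlier one. The helper name `bytewise_tr` is made up for this illustration.

```python
# Assumed model of a byte-oriented tr; SET1/SET2 are copied from the message.
SET1 = "[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]"
SET2 = "[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]"


def bytewise_tr(text, set1=SET1, set2=SET2):
    """Translate text byte by byte, the way a byte-oriented tr would."""
    src = set1.encode("utf-8")  # 92 bytes: every accented letter takes two
    dst = set2.encode("utf-8")  # 79 bytes: pure ASCII
    # Pad the second set with its last byte (']') up to the first set's length.
    dst = dst + dst[-1:] * (len(src) - len(dst))
    table = bytearray(range(256))  # identity mapping for all other bytes
    for s, d in zip(src, dst):
        table[s] = d  # a later duplicate byte overrides an earlier one
    out = text.encode("utf-8").translate(bytes(table))
    return out.decode("ascii", errors="replace")


print(bytewise_tr("ššš"))  # ]8]8]8
```

In this model "š" is the two bytes 0xC5 0xA1. The byte 0xC5 is shared with ň, ř, ť, ů and ž, so its last mapping falls into the padded "]" region, while 0xA1 happens to line up with an "8".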
Is there another simple way of doing the above substitution?
Or is there a way to persuade "tr" to work with UTF-8?
Thanks in advance
Michal
[1]
$ tr --version
tr (GNU coreutils) 8.4
Packaged by Gentoo (8.4 (p1))
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Jim Meyering.
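One possible way to do the substitution asked about above is sketched here in Python 3, where `str.translate` operates on Unicode characters rather than bytes. The letter groups per digit are read off the two tr sets in the message. The names `KEYPAD`, `TABLE` and `to_keypad` are invented for this sketch, and lowercase input is assumed, as in the original sets.

```python
# Letter groups per digit, taken from the two tr sets in the message.
KEYPAD = {
    "2": "abcč",
    "3": "dďeěf",
    "4": "ghií",
    "5": "jkl",
    "6": "mnňoó",
    "7": "pqrřsš",
    "8": "tťuúův",
    "9": "wxyýzž",
}

# Character-to-digit table; each key is one Unicode character.
TABLE = str.maketrans(
    {letter: digit for digit, letters in KEYPAD.items() for letter in letters}
)


def to_keypad(text):
    """Replace each listed letter with its digit; leave everything else as is."""
    return text.translate(TABLE)


print(to_keypad("ššš"))  # 777
```

Characters that are not in the table (spaces, digits, uppercase letters) pass through unchanged.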