[tlug] multibyte tr (or i18n coreutils)



Hello, 

I need something like:
tr \
'[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]'\
'[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]'

but tr[1] does not seem to understand multibyte characters. 
For example:
LC_ALL=cs_CZ.UTF-8 echo "ššš" |tr \
'[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]'\
'[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]' 

gives:
]8]8]8

Is there another simple way of doing the above substitution?

Or is there a way to persuade "tr" to work with UTF-8?

Thanks in advance

Michal

[1]
$ tr --version
tr (GNU coreutils) 8.4
Packaged by Gentoo (8.4 (p1))
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jim Meyering.


