It's an interesting problem, Michal. The Linux boxes I have at hand (CentOS 5.3 and Debian 4.0) only have version 5.97:

$ tr --version
tr (GNU coreutils) 5.97

Clearly it's not handling the multibyte chars:
$ LC_ALL=cs_CZ.UTF-8 echo 'ššš' | tr 'š' '123'
$ echo '僕はサチだ' | tr 'サチ' '純苔'
$ echo '僕はサチだ' | sed s/サチ/純苔/
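
For what it's worth, a quick way to see what a byte-oriented tr is up against is to count bytes: in UTF-8, 'š' is two bytes, so a tr that works byte by byte never sees it as a single character. A minimal sketch (byte_count is just a helper name I made up, and it assumes the text is UTF-8 encoded):

```shell
# byte_count is a hypothetical helper: print the number of bytes in its argument.
# Assumes the text is UTF-8 encoded.
byte_count() {
    # wc -c counts bytes, not characters; tr -d ' ' strips the padding some wc versions add.
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count 'sss'   # 3: plain ASCII, one byte per character
byte_count 'ššš'   # 6: each š takes two bytes in UTF-8
```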

I'm a bit of a Linux novice, but sed should work as well, yes? It's much more laborious than using tr, but depending on what you're doing, the POSIX equivalence classes might help:
$ echo "sššsqwerty" | sed 's/[[=s=]]/1/g'

Or with a bit more typing than your tr command:
$ export REPLACEMENTS="s/[abcč]/2/g s/[dďeěf]/3/g s/[ghií]/4/g s/[jkl]/5/g s/[mnňoó]/6/g s/[pqrřsš]/7/g s/[tťuúůvwxyýzž]/8/g"
$ export STRING='ššššabe'
$ echo $STRING
$ for arg in $REPLACEMENTS; do STRING=$(echo "$STRING" | sed "$arg"); done
$ echo $STRING
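
The same replacements can probably also go into a single sed call, one -e per group, which avoids the loop and the repeated export. This is only a sketch: to_digits is a name I made up, and I've spelled each group as an alternation such as (a|b|c|č) rather than a bracket expression, on the assumption that matching the literal byte sequences depends less on sed honouring the locale. It needs a sed that accepts -E.

```shell
# to_digits is a hypothetical helper: read text on stdin and replace each
# letter group from the REPLACEMENTS list above with its digit.
to_digits() {
    sed -E \
        -e 's/(a|b|c|č)/2/g' \
        -e 's/(d|ď|e|ě|f)/3/g' \
        -e 's/(g|h|i|í)/4/g' \
        -e 's/(j|k|l)/5/g' \
        -e 's/(m|n|ň|o|ó)/6/g' \
        -e 's/(p|q|r|ř|s|š)/7/g' \
        -e 's/(t|ť|u|ú|ů|v|w|x|y|ý|z|ž)/8/g'
}

echo 'ššššabe' | to_digits   # 7777223
```

Since every replacement is a digit and no digit appears in a later group, the order of the -e expressions should not matter here.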

Interestingly, on OSX it works fine:
$ echo "ššš" | tr 'š' '12'
$ LC_ALL=cs_CZ.UTF-8 echo "ššš" | tr '[a,b,c,č,d,ď,e,ě,f,g,h,i,í,j,k,l,m,n,ň,o,ó,p,q,r,ř,s,š,t,ť,u,ú,ů,v,w,x,y,ý,z,ž]' '[2,2,2,2,3,3,3,3,3,4,4,4,4,5,5,5,6,6,6,6,6,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9]'
$ echo '僕はサチだ' | tr 'サチ' '純苔'

(There's no --version option in the OSX one)
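
One caveat about the LC_ALL=cs_CZ.UTF-8 prefix in the commands above, as far as I understand shell semantics: an assignment written in front of echo applies only to echo, so the tr on the other side of the pipe never sees it. A small sketch with a made-up variable, DEMO_VAR, standing in for LC_ALL:

```shell
# DEMO_VAR is a hypothetical variable used only for this demonstration.
unset DEMO_VAR

# The prefix assignment reaches only the first command of the pipeline.
prefix_scope() {
    DEMO_VAR=1 sh -c 'echo "left: ${DEMO_VAR:-unset}"' |
        sh -c 'cat; echo "right: ${DEMO_VAR:-unset}"'
}

prefix_scope
# left: 1
# right: unset
```

So to set the locale for tr itself, the assignment would go right before tr, as in: echo 'ššš' | LC_ALL=cs_CZ.UTF-8 tr 'š' '1'. Whether a given tr then does anything useful with it is a separate question.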

Incidentally, the man page on my Linux box makes no mention of supporting LC_ALL, but the OSX one does.  Also, I'm not sure which version this refers to, but I did find this:
"Currently tr fully supports only single-byte characters. Eventually it will support multibyte characters; when it does, the -C option will cause it to complement the set of characters, whereas -c will cause it to complement the set of values. This distinction will matter only when some values are not characters, and this is possible only in locales using multibyte encodings when the input contains encoding errors."

So I guess maybe the answer is that the GNU version doesn't, but the BSD (and I'm guessing Solaris) version does?


2010/4/18 Michal Hajek

I need something like:
tr \

but tr[1] does not seem to understand multibyte characters.
For example:
LC_ALL=cs_CZ.UTF-8 echo "ššš" |tr \


Is there another simple way of doing the above substitution?

Or is there a way to persuade "tr" to work with utf8 ?

Thanks in advance


[1] $ tr --version
tr (GNU coreutils) 8.4
Packaged by Gentoo (8.4 (p1))
