
Mailing List Archive
Re: [tlug] I hate encodings!
On Tue, 29 Aug 2006 12:08:01 +0900
"Jeff Madsen" <jeff@example.com> wrote:
> Hope that question made sense - you can probably detect my confusion
> already!
As far as I know, there is no such documentation. But I can give
you some hints on what you should do:
1) Use utf-8 wherever possible.
Unicode, which utf-8 encodes, is a superset of most (all?) other
character sets. Thus you can represent text from any other
encoding in utf-8.
You should not consider using anything but utf-8
to store data, unless you have a special reason to do so.
(Using anything else makes conversions, internationalization [i18n]
and multilingualization [m17n] very difficult.)
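
To illustrate the superset point, here is a small Python sketch. The
sample strings and the choice of legacy encodings are only my own
picks for illustration. It re-encodes text from three legacy
encodings as utf-8 and checks that nothing is lost, then shows that
the reverse direction does not work in general.

```python
# Sample texts, each representable in one legacy encoding
# (the strings and encodings are illustrative choices).
samples = {
    "latin_1": "Grüße",
    "shift_jis": "日本語",
    "koi8_r": "Привет",
}

utf8_copies = {}
for encoding, text in samples.items():
    legacy_bytes = text.encode(encoding)
    # Decode from the legacy encoding, then re-encode as utf-8.
    utf8_bytes = legacy_bytes.decode(encoding).encode("utf-8")
    # The utf-8 copy carries exactly the same text.
    assert utf8_bytes.decode("utf-8") == text
    utf8_copies[encoding] = utf8_bytes

# The reverse is not true: Japanese text does not fit into Latin-1.
try:
    "日本語".encode("latin_1")
    reverse_ok = True
except UnicodeEncodeError:
    reverse_ok = False
```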
2) Always use utf-8 internally in your programs, no matter
what character set your data uses.
Even if you have to use a non-utf-8 encoding for your
data outside your program, it still makes sense to use
utf-8 within your program. This will allow an easy switch
to another encoding, or make it possible to add another encoding
later.
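
A minimal sketch of this "convert at the edges" layout, in Python.
Note the assumption: in Python the natural internal form is the
Unicode str type rather than raw utf-8 bytes, but it plays the same
role. The function names load, save and first_characters are
hypothetical, made up for this example.

```python
def load(raw: bytes, source_encoding: str) -> str:
    """Input boundary: turn external bytes into the internal form."""
    return raw.decode(source_encoding)


def save(text: str, target_encoding: str) -> bytes:
    """Output boundary: turn the internal form into external bytes."""
    return text.encode(target_encoding)


def first_characters(text: str, count: int) -> str:
    """Core logic: knows nothing about external encodings."""
    return text[:count]


# Data arrives as EUC-JP...
incoming = "こんにちは".encode("euc_jp")
result = first_characters(load(incoming, "euc_jp"), 2)

# ...and supporting another output encoding later only touches
# the boundary call, not the core logic.
as_euc_jp = save(result, "euc_jp")
as_shift_jis = save(result, "shift_jis")
as_utf8 = save(result, "utf-8")
```

Slicing the raw EUC-JP bytes directly would risk cutting a character
in half. Working on the internal form avoids that.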
3) Use iconv and similar libraries to convert between character sets.
Using a publicly available library to handle character set
conversion minimizes your work and gives you an already tested,
known-to-work subsystem.
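
As an example of what such a library handles for you: input often
arrives in chunks that split a multi-byte character in the middle.
The sketch below uses Python's standard codecs module as a stand-in
for iconv (it is not an iconv binding). The function name
convert_stream is hypothetical, and it assumes a stateless target
encoding such as utf-8.

```python
import codecs


def convert_stream(chunks, source_encoding, target_encoding="utf-8"):
    """Yield re-encoded chunks; input chunks may split characters."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    for chunk in chunks:
        # The decoder buffers incomplete byte sequences internally.
        text = decoder.decode(chunk)
        if text:
            yield text.encode(target_encoding)
    # final=True raises an error if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode(target_encoding)


raw = "東京".encode("euc_jp")  # two characters, two bytes each
# Deliberately cut the chunks in the middle of both characters.
chunks = [raw[:1], raw[1:3], raw[3:]]
converted = b"".join(convert_stream(chunks, "euc_jp"))
```

Decoding each chunk on its own with chunk.decode("euc_jp") would
fail on the very first chunk here.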
4) Be aware that upper case <-> lower case conversions depend
on the language used.
Some languages use different characters for the upper case
version of a letter than most other languages do.
One example is Turkish: the upper case of "i" is not "I", as one
would expect, but "İ" (and the lower case of "I" is "ı").
I know of at least one program where this caused a segfault.
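
The Turkish case can be tried out in Python, whose built-in
str.upper() and str.lower() ignore the locale. The helpers
turkish_upper and turkish_lower below are hypothetical and cover
only the two dotted/dotless i pairs. A real program would more
likely rely on a locale-aware library such as ICU.

```python
def turkish_upper(text: str) -> str:
    """Upper-case text using the Turkish rules for i and ı."""
    # Map the two special letters first, then use the default rules.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lower-case text using the Turkish rules for İ and I."""
    return text.replace("İ", "i").replace("I", "ı").lower()


# The default, locale-independent conversion:
default_upper = "istanbul".upper()

# The conversion a Turkish reader expects:
expected_upper = turkish_upper("istanbul")
expected_lower = turkish_lower("ISPARTA")

# The default lower case of "İ" is even two code points long
# ("i" followed by a combining dot above), which can break code
# that assumes case conversion never changes the length.
default_lower_length = len("İ".lower())
```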
You should of course have a look at the documentation of the libraries
and programs involved. Reading the locale(5) manpage will also give you
some hints on how languages and everything around them are handled.
Attila Kinali
--
心をこめて聞け心をこめて話せ