Mailing List Archive
Re: [tlug] I hate encodings!
- Date: Tue, 29 Aug 2006 06:39:35 +0200
- From: Attila Kinali <attila@example.com>
- Subject: Re: [tlug] I hate encodings!
- References: <mailman.86.1156798909.9509.tlug@example.com> <009c01c6cb18$56336420$0f01a8c0@example.com>
On Tue, 29 Aug 2006 12:08:01 +0900
"Jeff Madsen" <jeff@example.com> wrote:

> Hope that question made sense - you can probably detect my confusion
> already!

As far as I know there is no such documentation. But I can give you some hints on what you should do:

1) Use UTF-8 wherever possible. Unicode is a superset of nearly all other character sets in use, and UTF-8 can encode every Unicode character, so text in almost any other encoding can be represented in UTF-8. You should not consider using anything but UTF-8 to store data unless you have a special reason to do so. (Anything else makes conversions, internationalization [i18n] and multilingualization [m17n] very difficult.)

2) Always use UTF-8 internally in your programs, no matter what character set your data uses. Even if you have to use a non-UTF-8 encoding for your data outside your program, it still makes sense to use UTF-8 within it. This makes it easy to switch to another external encoding, or to add support for one later.

3) Use iconv and similar libraries to convert between character sets. Using a publicly available library for character set conversion minimizes your work and gives you a subsystem that is already tested and known to work.

4) Be aware that upper case <-> lower case conversions depend on the language used. Some languages map characters to different upper case or lower case forms than most other languages do. One example is Turkish: the upper case of "i" is not "I", as one would expect, but "İ" (and the lower case of "I" is "ı"). I know of at least one program where this caused a segfault.

You should of course have a look at the documentation of the libraries and programs involved. Reading the locale(5) manpage will also give you some hints on how languages and everything around them are handled.

Attila Kinali

-- 
心をこめて聞け心をこめて話せ
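
Points 2 to 4 above can be sketched roughly as follows. This is only an illustration under some assumptions, not part of the original advice: it uses Python, whose `str` type is Unicode text (not UTF-8 bytes) and plays the role of the single internal representation, and Python's built-in codecs stand in for iconv. The helper names (`to_internal`, `to_external`, `convert`, `upper_tr`, `lower_tr`) are made up for this example, and the Turkish tables cover only the dotted/dotless "i" pair, not full locale-aware casing.

```python
# Hypothetical helpers; Python's built-in codecs stand in for iconv here.

def to_internal(raw: bytes, source_encoding: str) -> str:
    """Decode external bytes into the one representation used inside the program."""
    return raw.decode(source_encoding)


def to_external(text: str, target_encoding: str = "utf-8") -> bytes:
    """Encode internal text for storage or output; UTF-8 unless told otherwise."""
    return text.encode(target_encoding)


def convert(raw: bytes, source_encoding: str, target_encoding: str = "utf-8") -> bytes:
    """Rough equivalent of `iconv -f SOURCE -t TARGET`: decode once, encode once."""
    return to_external(to_internal(raw, source_encoding), target_encoding)


# Python's str.upper()/str.lower() ignore the locale, so the Turkish
# dotted/dotless "i" pair has to be mapped explicitly before the generic call.
TURKISH_UPPER = str.maketrans({"i": "İ", "ı": "I"})
TURKISH_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def upper_tr(text: str) -> str:
    """Upper-case text using the Turkish rule for "i" (illustrative only)."""
    return text.translate(TURKISH_UPPER).upper()


def lower_tr(text: str) -> str:
    """Lower-case text using the Turkish rule for "I" (illustrative only)."""
    return text.translate(TURKISH_LOWER).lower()


if __name__ == "__main__":
    # Data arrives in EUC-JP, is handled as Unicode text, and leaves as UTF-8.
    euc_bytes = "日本語".encode("euc_jp")
    print(convert(euc_bytes, "euc_jp") == "日本語".encode("utf-8"))  # True

    # The generic mapping and the Turkish mapping disagree.
    print("istanbul".upper())    # ISTANBUL
    print(upper_tr("istanbul"))  # İSTANBUL
    print(lower_tr("ISTANBUL"))  # ıstanbul
```

Keeping all decoding and encoding in two small functions at the edge of the program means that adding another external encoding later only touches the call sites that pass the encoding name. In C, iconv_open(3) and iconv(3) would fill the same role.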