Mailing List Archive
Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names
- Date: Thu, 26 Jan 2006 11:42:36 +0900
- From: Alain Hoang <hoanga@example.com>
- Subject: Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names
- References: <43D0761B.40000@example.com><27EB8054-534B-4F05-88A1-53A3D3B0550E@example.com><878xt9hevz.fsf@example.com>
On Jan 21, 2006, at 10:47 PM, Stephen J. Turnbull wrote:

> >>>>> "Alain" == Alain Hoang <hoanga@example.com> writes:
>
>     Alain> These UTF-8 normalization forms and their interactions when
>     Alain> actually trying to deal with them are currently something
>     Alain> that looks like some black magic
>
> It's basically trivial. In German, you can write ss or you can write
> ß, although the latter, composed, form is canonical. The
> normalization forms simply dictate maximally composed and minimally
> composed forms, with rules for handling cases where there are multiple
> extrema. Conformant software is supposed to handle both forms.

Thanks for the explanation. I guess it's not really that much black magic, just ignorance on my part. Hopefully I'm less ignorant on this topic than I was before. :)

>     Alain> The subtle differences of NFD and NFC manifested itself
>     Alain> when I was trying to write some text files using Vietnamese
>     Alain> in OS X then moved them over to a FreeBSD machine and
>     Alain> noticed the accent marks weren't attached. *sigh*
>
> By "not attached" do you mean "not displayed as composed"? The
> necessary information to fix that is in the large Unidata table, which
> tells you which characters are composed from others. If you mean
> "lost", then you have seriously non-conforming software somewhere in
> the pipeline.

Yes, I meant not composed. The characters were definitely not lost; they were just displayed in a non-composed form. The software was not non-conforming. It was the user who was unaware of the issues.

After trying to recall exactly what I did a while back, I finally retraced my steps. I was trying to write something in Vietnamese and display it in HTML. For some reason, I decided I wanted to use the HTML escape codes for this, so in SubEthaEdit[1] I typed in something like this:

tiếng việt

Then I used the Copy as XHTML function to get the HTML escaped sequences, which gave me something like this:

tiếngviệt

When it displayed in Firefox, the rising accent mark (sorry, I don't know the name given to it in Unicode) was definitely not displayed over the ê. I have a small example at http://samsara.bebear.net/tv2.html

What was confusing was that Safari showed the composed form while Firefox did not.

Looking back on all this, I can attribute it to major user error on my part: I didn't understand one whit about normalization forms for UTF-8 back then. But I figure I should explain as much as I can remember in case anyone else ever runs into a similar issue (unlikely), so they can avoid spending a couple of hours confused like I did.

Alain

[1] An OS X text editor. I've found it handy for doing quick and dirty editing in UTF-8.
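
For anyone who wants to see the composed/decomposed difference concretely, here is a minimal sketch using Python's standard unicodedata module. It is not what SubEthaEdit or either browser does internally; it only shows how the same Vietnamese text looks in NFC and NFD, and how escaping the NFD form yields separate entities for the base letter and each accent (the "rising" one is U+0301 COMBINING ACUTE ACCENT). The helper names codepoints and to_html_escapes are made up for this illustration.

```python
import unicodedata

# "tiếng việt", written with escapes so the source file's own
# normalization form does not matter.
text = "ti\u1ebfng vi\u1ec7t"

nfc = unicodedata.normalize("NFC", text)  # precomposed characters
nfd = unicodedata.normalize("NFD", text)  # base letters + combining marks


def codepoints(s):
    """Return the code points of s as a readable 'U+XXXX ...' string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def to_html_escapes(s):
    """Replace every non-ASCII character with a hex numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in s)


print(len(nfc), codepoints(nfc))
print(len(nfd), codepoints(nfd))
print(to_html_escapes(nfc))
print(to_html_escapes(nfd))

# Converting back to NFC recomposes the characters, which is one way
# to clean up text before generating escapes from it.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

Both strings are canonically equivalent, so conformant software should render them the same way; if a renderer shows the marks detached, normalizing to NFC before escaping is a reasonable workaround.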