TLUG Mailing List

Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names

Date: Thu, 26 Jan 2006 11:42:36 +0900

From: Alain Hoang <hoanga@example.com>

Subject: Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names

References: <43D0761B.40000@example.com><27EB8054-534B-4F05-88A1-53A3D3B0550E@example.com><878xt9hevz.fsf@example.com>

On Jan 21, 2006, at 10:47 PM, Stephen J. Turnbull wrote:

>>>>>> "Alain" == Alain Hoang <hoanga@example.com> writes:
>
>     Alain> These UTF-8 normalization forms and their interactions when
>     Alain> actually trying to deal with them are currently something
>     Alain> that looks like some black magic
>
> It's basically trivial.  In German, you can write ss or you can write
> ß, although the latter, composed, form is canonical.  The
> normalization forms simply dictate maximally composed and minimally
> composed forms, with rules for handling cases where there are multiple
> extrema.  Conformant software is supposed to handle both forms.

	Thanks for the explanation.  I guess it's not really black magic,
just ignorance on my part.  Hopefully I'm less ignorant about
this topic than I was before.  :)
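
Turnbull's point can be checked with Python's standard `unicodedata` module (Python is my choice for illustration, not something from this thread). The sketch below spells ế both ways, shows that the two spellings differ code point by code point but become identical once normalized to the same form, and compares the NFC and NFD lengths of "tiếng việt". It also shows that the ß/ss comparison is an analogy rather than a literal normalization case: ß has no canonical decomposition, and its relationship to "ss" appears in case folding instead.

```python
import unicodedata

# "ế" as one precomposed code point: U+1EBF
precomposed = "\u1ebf"
# The same letter as base "e" + U+0302 COMBINING CIRCUMFLEX + U+0301 COMBINING ACUTE
decomposed = "e\u0302\u0301"

# Compared code point by code point, the two spellings differ...
print(precomposed == decomposed)                                # False
# ...but they are canonically equivalent, so normalizing either side matches.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# NFC prefers precomposed letters; NFD spells everything out with combining marks.
word = "ti\u1ebfng vi\u1ec7t"  # "tiếng việt"
print(len(unicodedata.normalize("NFC", word)))  # 10
print(len(unicodedata.normalize("NFD", word)))  # 14

# ß has no canonical decomposition, so NFC/NFD leave it untouched;
# its link to "ss" shows up in case folding instead.
print(unicodedata.normalize("NFD", "\u00df") == "\u00df")  # True
print("\u00df".casefold())                                 # ss
```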

>
>     Alain> The subtle differences between NFD and NFC manifested themselves
>     Alain> when I was trying to write some text files using Vietnamese
>     Alain> in OS X then moved them over to a FreeBSD machine and
>     Alain> noticed the accent marks weren't attached.  *sigh*
>
> By "not attached" do you mean "not displayed as composed"?  The
> necessary information to fix that is in the large Unidata table, which
> tells you which characters are composed from others.  If you mean
> "lost", then you have seriously non-conforming software somewhere in
> the pipeline.

		Yes, I meant not composed.  The characters were definitely
not lost, just displayed in decomposed form.  The software was
conforming; it was just the user who was unaware of the issues.
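
The "large Unidata table" is the Unicode Character Database file UnicodeData.txt, whose decomposition mappings Python exposes through `unicodedata.decomposition()`. As a rough sketch, the hypothetical helper `full_canonical_decomposition` below expands those mappings recursively to show how ế and ệ break down into a base letter plus combining marks. It is not a complete NFD implementation: it skips compatibility mappings, does no canonical reordering of combining marks, and ignores Hangul syllables. For this particular word its result happens to match `unicodedata.normalize("NFD", ...)`.

```python
import unicodedata

def full_canonical_decomposition(s: str) -> str:
    """Hypothetical helper: recursively expand canonical decomposition mappings.

    Skips compatibility mappings (those tagged like '<compat>'), does not apply
    canonical reordering of combining marks, and ignores Hangul syllables
    (decomposed algorithmically, not via table entries), so it is NOT full NFD.
    """
    out = []
    for ch in s:
        mapping = unicodedata.decomposition(ch)  # e.g. '00EA 0301' for U+1EBF
        if mapping and not mapping.startswith("<"):
            parts = "".join(chr(int(cp, 16)) for cp in mapping.split())
            out.append(full_canonical_decomposition(parts))  # decompose the parts too
        else:
            out.append(ch)  # no canonical mapping: keep the character as-is
    return "".join(out)

word = "ti\u1ebfng vi\u1ec7t"  # "tiếng việt" in precomposed form
for ch in "\u1ebf\u1ec7":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", unicodedata.decomposition(ch))

decomposed = full_canonical_decomposition(word)
# List the combining marks the table lookups produced.
print([f"U+{ord(c):04X}" for c in decomposed if unicodedata.combining(c)])
# For this word the hand-rolled expansion matches the library's NFD.
print(decomposed == unicodedata.normalize("NFD", word))
```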

After trying to recall exactly what I did a while back, I finally
retraced my steps.  I was trying to write something in Vietnamese
and display it in HTML.  For some reason, I decided I wanted to use
HTML character references (escape codes) for this, so in
SubEthaEdit[1] I typed something like this:

tiếng việt

Then I used the Copy as XHTML function to get the HTML-escaped
sequences, which gave me something like this:

tie&#770;&#769;ng vie&#803;&#770;t
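
The numeric references above are the decimal code points of the decomposed (NFD) spelling: 770 is U+0302 COMBINING CIRCUMFLEX ACCENT, 769 is U+0301 COMBINING ACUTE ACCENT, and 803 is U+0323 COMBINING DOT BELOW. A minimal sketch of that escaping, assuming a hypothetical `to_ncr` helper (not SubEthaEdit's actual code), reproduces the output from NFD input and shows that NFC input yields a single reference per accented letter instead.

```python
import html
import unicodedata

def to_ncr(text: str) -> str:
    """Hypothetical stand-in for an editor's "Copy as XHTML" feature.

    Escapes &, < and > as entities, then writes every non-ASCII code point
    as a decimal character reference (&#N;). Code points are escaped exactly
    as they appear, so the input's normalization form is preserved.
    """
    return html.escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")

word = "ti\u1ebfng vi\u1ec7t"  # "tiếng việt"
print(to_ncr(unicodedata.normalize("NFD", word)))  # tie&#770;&#769;ng vie&#803;&#770;t
print(to_ncr(unicodedata.normalize("NFC", word)))  # ti&#7871;ng vi&#7879;t
```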

	When it displayed in Firefox, the rising-tone mark (the acute
accent, U+0301 COMBINING ACUTE ACCENT in Unicode) was definitely
not displayed over the ê.  I have a small example
at http://samsara.bebear.net/tv2.html

	What was confusing was that Safari showed the composed
form while Firefox did not.
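
One way to stop depending on how each browser stacks combining marks is to normalize to NFC before escaping, so that letters Unicode defines as precomposed, such as ế (U+1EBF) and ệ (U+1EC7), reach the browser as single code points. The sketch below assumes a hypothetical `recompose_markup` helper that repairs already-escaped decomposed text using only the standard `html` and `unicodedata` modules. It is meant for text content, not whole HTML documents, since it re-escapes `<` and `&`.

```python
import html
import unicodedata

def recompose_markup(markup: str) -> str:
    """Hypothetical repair step for escaped text that was saved in NFD.

    1. html.unescape turns &#770; and similar references back into characters.
    2. NFC folds base letters plus combining marks into precomposed letters
       wherever Unicode defines one.
    3. The result is re-escaped: &, < and > as entities, non-ASCII as &#N;.
    """
    text = unicodedata.normalize("NFC", html.unescape(markup))
    return html.escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(recompose_markup("tie&#770;&#769;ng vie&#803;&#770;t"))  # ti&#7871;ng vi&#7879;t
```

With the NFC output, each accented Vietnamese letter arrives as one character, so the browser should no longer need to position two stacked combining marks on its own.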

	Looking back on all this, I can attribute it entirely to user
error on my part: back then I didn't understand one whit about
Unicode normalization forms.  But I figured I should explain as much
as I can remember in case anyone else ever runs into a similar issue
(unlikely), so they can avoid spending a couple of hours confused
like I did.


Alain

[1] An OS X text editor.  I've found it handy for quick
and dirty editing in UTF-8.

Follow-Ups:

Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names
From: Josh Glover

Re: [tlug] Kanji file names-- how to change encoding,Mac OS X/Darwin file names
From: Stephen J. Turnbull

References:

Re: [tlug] Kanji file names-- how to change encoding, Mac OS X/Darwin file names
From: David Riggs

Re: [tlug] Kanji file names-- how to change encoding, Mac OS X/Darwin file names
From: Alain Hoang

Re: [tlug] Kanji file names-- how to change encoding, Mac OSX/Darwin file names
From: Stephen J. Turnbull
