
Mailing List Archive
Re: [tlug] Search MySQL for Japanese Names
Dave M G writes:
> I'm still just a little confused over this "decomposed" part of the story.
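Decomposed is a Unicode sort of thing, or half-wit katakana: the
dakuten or handakuten is stored as its own character after the base
kana instead of as part of a single precomposed character. Why do
it? Because you'll see it anyway; people will put dakuten or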
handakuten on characters that don't have a composed form, and of
course halfwidth katakana require the decomposed form. For example, I
pronounce my name "Steven", in katakana スティーヴェン but in hiragana
(which for reasons I don't understand is occasionally demanded in
furigana) すてぃーう゛ぇん. Notice how the "ve" is decomposed in the
hiragana form.
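
To make "composed" versus "decomposed" concrete, here is a minimal sketch.
Python is an assumption on my part (the thread is about PHP and MySQL), and
`codepoints` is a hypothetical helper name; `unicodedata.normalize` is the
standard-library routine. It shows the single character ヴ splitting into
ウ plus a combining dakuten under NFD, and halfwidth katakana being folded
back to the composed form under NFKC.

```python
import unicodedata

def codepoints(text):
    """List the code points of a string, e.g. ['U+30F4']."""
    return [f"U+{ord(ch):04X}" for ch in text]

composed = "\u30F4"                                  # katakana VU as one character
decomposed = unicodedata.normalize("NFD", composed)  # U + combining dakuten

print(codepoints(composed))    # ['U+30F4']
print(codepoints(decomposed))  # ['U+30A6', 'U+3099']

# Halfwidth katakana always carries the dakuten as a separate character.
halfwidth = "\uFF73\uFF9E"                           # halfwidth U + halfwidth dakuten
print(unicodedata.normalize("NFKC", halfwidth) == composed)  # True
```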
> However, I'm not sure on how to get it.
I wouldn't worry about it. It's more a matter of consistency: if
everything is stored in one form, comparisons behave predictably. If
you have good Unicode support the system provides a routine to do it.
Also, once the text is decomposed, you can remove the (han)dakuten
easily by filtering out anything that isn't a base kana character.
That allows a kind of fuzzy matching that is often useful (Japanese
are sometimes unclear about whether nigori is present or not in a
given name).
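
The fuzzy matching described above might look something like the sketch
below. It is only an illustration in Python, not the PHP/MySQL answer:
`strip_voicing` and `fuzzy_equal` are hypothetical names, and the set of
marks to drop (combining, spacing, and halfwidth dakuten/handakuten) is my
assumption about what "the right kind of kana" excludes.

```python
import unicodedata

# Spacing and halfwidth (han)dakuten, which people type as separate characters.
STANDALONE_MARKS = {"\u309B", "\u309C", "\uFF9E", "\uFF9F"}
# Combining (han)dakuten, which appear after decomposition.
COMBINING_MARKS = {"\u3099", "\u309A"}

def strip_voicing(text):
    """Return text with every dakuten and handakuten removed."""
    # Drop marks that were typed as standalone characters.
    text = "".join(ch for ch in text if ch not in STANDALONE_MARKS)
    # Decompose (NFKD also folds halfwidth kana to fullwidth).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks exposed by the decomposition.
    bare = "".join(ch for ch in decomposed if ch not in COMBINING_MARKS)
    return unicodedata.normalize("NFC", bare)

def fuzzy_equal(a, b):
    """Compare two kana strings, ignoring nigori."""
    return strip_voicing(a) == strip_voicing(b)

print(fuzzy_equal("やまざき", "やまさき"))  # True
print(fuzzy_equal("やまざき", "やまもと"))  # False
```

For a database search, one option would be to store the stripped form in a
separate column and compare against that; whether that suits a given MySQL
setup is outside what this sketch covers.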
> "decomposed" form. Is "decomposed" the term most often used for this
> kind of thing? Google isn't giving me much love when I search on it in
> relation to katakana.
Try "Unicode normal form NFD" if you're curious. (You probably want
Wikipedia, not the Unicode Technical Report. :-)
Can't help you with PHP (never touch the stuff) or MySQL,
unfortunately.