Mailing List Archive


Re: [tlug] Search MySQL for Japanese Names

> There are several sources of possible readings of names-in-kanji. Don't
> rely on things like MeCab or ChaSen, as their lexicons are rather limited
> for names. ENAMDICT has a huge name collection and you can get the
> possibilities by looking them up on
> HOWEVER you really must get confirmation on how people read their names.
> A significant number of names are read in unusual ways.
> Jim

Absolutely right. MeCab/ChaSen dictionaries (IPADIC, UniDic, whichever
one you plug into them) don't include anywhere near as many name
readings as ENAMDICT. By design these parsers don't want multiple
readings for a name; they just want the most likely one.

I've made account-registration webpage forms which, being AJAX-y, do a
lot of things dynamically as the user types. E.g. when they type the
yomi (a.k.a. furigana a.k.a. readings) for their names I create the
romaji version simultaneously. Or when they type the postcode, the
address field is filled out to the town/suburb level. But I don't fill
in the furigana when they type the kanji version of the name, for fear
of pissing off those whose name readings are not the most common.
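
To illustrate the yomi-to-romaji step, here is a minimal sketch in Python
rather than the browser-side code the form actually uses. It is not the
original implementation: the function names and the simplified
Hepburn-style rules are assumptions, and long vowels, the prolonged sound
mark and the n/vowel apostrophe are deliberately left unhandled.

```python
# Simplified Hepburn-style kana-to-romaji conversion (illustrative sketch only).
VOWELS = "aiueo"
CONSONANTS = "kstnhmrgzdbpcjfyw"

# Regular rows of the kana table: consonant prefix -> the five kana of that row.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ",
    "t": "たちつてと", "n": "なにぬねの", "h": "はひふへほ",
    "m": "まみむめも", "r": "らりるれろ", "g": "がぎぐげご",
    "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
KANA = {k: c + v for c, row in ROWS.items() for k, v in zip(row, VOWELS)}
KANA.update({"や": "ya", "ゆ": "yu", "よ": "yo", "わ": "wa", "を": "o", "ん": "n"})
# Hepburn irregular spellings override the regular table.
KANA.update({"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu",
             "じ": "ji", "ぢ": "ji", "づ": "zu"})
SMALL = {"ゃ": "a", "ゅ": "u", "ょ": "o"}


def to_hiragana(text):
    """Fold katakana (U+30A1..U+30F6) onto hiragana by subtracting 0x60."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def kana_to_romaji(text):
    """Convert a kana reading to romaji; unknown characters pass through."""
    kana = to_hiragana(text)
    out = []
    i = 0
    double_next = False
    while i < len(kana):
        ch = kana[i]
        if ch == "っ":  # small tsu doubles the following consonant
            double_next = True
            i += 1
            continue
        nxt = kana[i + 1] if i + 1 < len(kana) else ""
        base = KANA.get(ch, "")
        if nxt in SMALL and len(base) > 1 and base.endswith("i"):
            # Digraphs such as きょ -> kyo, しゅ -> shu, じゃ -> ja.
            stem = base[:-1]
            glide = "" if stem in ("sh", "ch", "j") else "y"
            roma = stem + glide + SMALL[nxt]
            i += 2
        else:
            roma = KANA.get(ch, ch)
            i += 1
        if double_next and roma[0] in CONSONANTS:
            # Hepburn writes っち as "tchi"; otherwise repeat the first consonant.
            roma = ("t" if roma.startswith("ch") else roma[0]) + roma
        double_next = False
        out.append(roma)
    return "".join(out)


print(kana_to_romaji("たなか"))
print(kana_to_romaji("スズキ"))
```

Going from kana to romaji is close to deterministic, which is why it is
safe to fill in automatically; going from kanji to kana is not.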

Jim, curious question: how many names in ENAMDICT resolve to just one
reading? Even an I-would-have-thought-surefire candidate for uniqueness
such as 田中(tanaka) resolves to ten different readings in ENAMDICT
(tanata, tanka, danaka, nunoka, ....). 鈴木(suzuki) has seven.
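
One rough way to answer that question would be to group the ENAMDICT
entries by their kanji form and count distinct readings. The sketch below
assumes the EDICT-style line layout "KANJI [reading] /gloss/"; the sample
lines are made up for illustration, using only readings mentioned above
with placeholder glosses, and are not copied from the real file. The
function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed EDICT-style layout: KANJI [READING] /gloss/.../
ENTRY = re.compile(r"^(\S+) \[([^\]]+)\] /")

# Illustrative sample only; glosses are placeholders, not real ENAMDICT data.
SAMPLE = """\
田中 [たなか] /(placeholder gloss)/
田中 [だなか] /(placeholder gloss)/
田中 [たんか] /(placeholder gloss)/
鈴木 [すずき] /(placeholder gloss)/
"""


def readings_by_name(lines):
    """Map each kanji form to the set of distinct readings seen for it."""
    table = defaultdict(set)
    for line in lines:
        m = ENTRY.match(line)
        if m:  # lines without a bracketed reading are skipped
            table[m.group(1)].add(m.group(2))
    return table


def share_with_single_reading(table):
    """Fraction of kanji forms that have exactly one reading."""
    if not table:
        return 0.0
    single = sum(1 for readings in table.values() if len(readings) == 1)
    return single / len(table)


table = readings_by_name(SAMPLE.splitlines())
for name, readings in table.items():
    print(name, len(readings), sorted(readings))
print(share_with_single_reading(table))
```

To run this against the real file you would pass the opened file object
instead of SAMPLE.splitlines(), with whatever encoding your copy of
ENAMDICT uses.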


Akira Kurogane
