Mailing List Archive



Re: [tlug] Search MySQL for Japanese Names



Edward Middleton <emiddleton@example.com> wrote:
> I've been tasked with building a MySQL database that will store the
> names of people in Japanese kanji. Those names need to be searched
> alphabetically from a web interface.
>
> By "alphabetically", I mean in order of the hiragana... あいうえお、かき
> くけこ..., and so on.
>
> Since kanji can have multiple readings, will I need to store a separate,
> katakana version of the name in order to search for it "alphabetically"?

Or hiragana. Kana, in any case.
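A minimal sketch of the "separate kana column" idea follows. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table and column names (people, name_kanji, name_kana) are hypothetical. In MySQL the collation chosen for the reading column decides how kana compare, so that needs checking separately. Here the reading is stored in hiragana and used only for ORDER BY, while the kanji is kept for display.

```python
import sqlite3


def names_in_reading_order(rows):
    """rows: iterable of (name_kanji, name_kana) pairs, readings in hiragana.

    sqlite3 stands in for MySQL here; the schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE people (name_kanji TEXT NOT NULL, name_kana TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO people (name_kanji, name_kana) VALUES (?, ?)", rows
        )
        # SQLite's default BINARY collation compares UTF-8 bytes, i.e. code
        # point order. For hiragana that is close to あいうえお, かきくけこ...
        # order, but small and voiced kana are interleaved (ぁあぃい, かがきぎ),
        # so it is not identical to dictionary order.
        cur = conn.execute("SELECT name_kanji FROM people ORDER BY name_kana")
        return [name for (name,) in cur]
    finally:
        conn.close()


if __name__ == "__main__":
    people = [("田中", "たなか"), ("佐藤", "さとう"), ("伊藤", "いとう")]
    print(names_in_reading_order(people))
```

The kanji column is never used for ordering. It only rides along with the reading, which is what actually determines the sequence.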

> Will that katakana need to be half width, or does it matter?

Why half-width? I avoid it like the plague. It exists only for
legacy JIS X 0201 compatibility, and the sooner it's completely in the bin,
the better.
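If users type readings in half-width katakana anyway, they can be folded on input. This is a minimal sketch, assuming readings are normalised before storage. Unicode NFKC turns half-width katakana into full-width and joins a separate half-width dakuten onto its kana ("ｽﾞ" becomes "ズ"). Folding katakana to hiragana afterwards is an extra assumption, added so that a reading entered in either script sorts the same way. The function name normalize_reading is hypothetical.

```python
import unicodedata

# Full-width katakana ァ (U+30A1) .. ヶ (U+30F6) sit exactly 0x60 above the
# matching hiragana ぁ (U+3041) .. ゖ (U+3096).
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
KATA_TO_HIRA_OFFSET = 0x60


def normalize_reading(text: str) -> str:
    """Hypothetical helper: half-width/full-width katakana or hiragana -> hiragana."""
    # NFKC folds half-width forms to full-width and composes kana + dakuten.
    full_width = unicodedata.normalize("NFKC", text)
    out = []
    for ch in full_width:
        cp = ord(ch)
        if KATAKANA_START <= cp <= KATAKANA_END:
            out.append(chr(cp - KATA_TO_HIRA_OFFSET))
        else:
            # Hiragana, the long-vowel mark ー and anything else pass through.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in ["ｻﾄｳ", "ｽｽﾞｷ", "スズキ", "すずき"]:
        print(raw, "->", normalize_reading(raw))
```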

> Or can I get an ascending order of names with just the kanji?

Not really.
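To illustrate why the kanji alone don't give あいうえお order, here is a small sketch. Sorting kanji strings compares Unicode code points, which are laid out roughly by radical and stroke count, not by reading. The readings below are the common ones for these surnames and are given for illustration only.

```python
names = [
    ("田中", "たなか"),
    ("鈴木", "すずき"),
    ("佐藤", "さとう"),
]

# Order by the kanji themselves (code point order).
by_kanji = [kanji for kanji, _ in sorted(names, key=lambda n: n[0])]
# Order by the stored kana reading.
by_reading = [kanji for kanji, _ in sorted(names, key=lambda n: n[1])]

print(by_kanji)    # ['佐藤', '田中', '鈴木']
print(by_reading)  # ['佐藤', '鈴木', '田中']
```

The two orders already disagree on three common names, and nothing in the kanji tells you that 鈴木 (すずき) belongs before 田中 (たなか).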

> Or should I say screw it and get the users to put in a romaji version of
> their name?

There are several sources of possible readings for names in kanji. Don't
rely on things like MeCab or ChaSen; their lexicons are rather limited
for names. ENAMDICT has a huge collection of names, and you can get the
possible readings by looking them up at
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?2C

HOWEVER, you really must get confirmation of how people read their names.
A significant number of names are read in unusual ways.
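One way to combine the two points above, sketched under assumptions: dictionary readings are offered only as suggestions, and the reading actually stored is the one the person confirms, even when the dictionary doesn't list it. The parser assumes EDICT-style lines of the form "KANJI [READING] /(type) romaji/". ENAMDICT is distributed in that general style, but the sample lines here are illustrative rather than copied from the file, and a real file's text encoding should be checked before reading it. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: "KANJI [READING] /...".
ENTRY = re.compile(r"^(\S+) \[([^\]]+)\] /")


def candidate_readings(lines):
    """Map each kanji headword to its distinct readings, in file order."""
    readings = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line)
        if m:
            kanji, kana = m.groups()
            if kana not in readings[kanji]:
                readings[kanji].append(kana)
    return dict(readings)


def is_known_reading(kanji, confirmed, candidates):
    """True if the confirmed reading appears among the dictionary candidates.

    A False result is not an error: unusual readings are common, so the
    confirmed reading is stored regardless and merely flagged.
    """
    return confirmed in candidates.get(kanji, [])


if __name__ == "__main__":
    sample = [
        "東海林 [しょうじ] /(s) Shouji/",
        "東海林 [とうかいりん] /(s) Toukairin/",
    ]
    cands = candidate_readings(sample)
    print(cands)
    print(is_known_reading("東海林", "しょうじ", cands))
```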

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, VCA Secondary School, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne


