
Mailing List Archive
Re: [tlug] Search MySQL for Japanese Names]
- Date: Tue, 20 Oct 2009 10:49:30 +1000
- From: Jim Breen <jimbreen@example.com>
- Subject: Re: [tlug] Search MySQL for Japanese Names]
Edward Middleton <emiddleton@example.com> wrote:
> I've been tasked with building a MySQL database that will store the
> names of people in Japanese kanji. Those names need to be searched
> alphabetically from a web interface.
>
> By "alphabetically", I mean in order of the hiragana... あいうえお、かき
> くけこ..., and so on.
>
> Since kanji can have multiple readings, will I need to store a separate,
> katakana version of the name in order to search for it "alphabetically"?
Or hiragana. Kana in any case.
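A rough sketch of the separate-reading idea: keep the kana reading in its own column and sort and search on that rather than on the kanji. The table and column names (`people`, `name_kanji`, `name_kana`) and the helper functions are made up for illustration, and Python's built-in sqlite3 stands in for MySQL only so the example runs on its own. In MySQL the resulting order would also depend on the collation chosen for the kana column.

```python
import sqlite3

# Hypothetical schema: the kanji name plus a hiragana reading.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name_kanji TEXT NOT NULL, name_kana TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO people (name_kanji, name_kana) VALUES (?, ?)",
    [("田中", "たなか"), ("青木", "あおき"), ("木村", "きむら")],
)


def names_in_kana_order(conn):
    """Return (kanji, kana) rows sorted by the kana reading.

    SQLite's default BINARY collation compares UTF-8 bytes, which for
    plain hiragana follows code point order (あ, か, さ, た, ...).
    """
    rows = conn.execute(
        "SELECT name_kanji, name_kana FROM people ORDER BY name_kana"
    )
    return rows.fetchall()


def names_starting_with(conn, kana_prefix):
    """Return kanji names whose reading starts with the given kana."""
    rows = conn.execute(
        "SELECT name_kanji FROM people WHERE name_kana LIKE ? ORDER BY name_kana",
        (kana_prefix + "%",),
    )
    return [row[0] for row in rows]
```

Calling `names_in_kana_order(conn)` lists 青木 before 木村 before 田中, which sorting on the kanji column alone would not produce.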
> Will that katakana need to be half width, or does it matter?
Why half-width? I avoid it like the plague. It only exists for
legacy JIS X 0201 reasons, and the sooner it's completely in the bin, the
better.
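If half-width katakana does turn up in user input, one option is to fold it away before storing the reading. The sketch below is only an illustration: the function name `to_fullwidth_hiragana` is invented here, and it assumes readings are stored as hiragana. It uses Unicode NFKC normalization to turn half-width katakana into full-width katakana, then shifts katakana to hiragana by code point.

```python
import unicodedata


def to_fullwidth_hiragana(reading):
    """Normalize a kana reading to full-width hiragana (hypothetical helper)."""
    # NFKC maps half-width katakana (compatibility characters) to their
    # full-width forms and composes a trailing voiced mark (e.g. ｶ + ﾞ -> ガ).
    normalized = unicodedata.normalize("NFKC", reading)
    # Katakana ァ (U+30A1) through ヶ (U+30F6) sit exactly 0x60 above the
    # corresponding hiragana; other characters (e.g. ー) are left as they are.
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in normalized
    )
```

Running the same function over both stored readings and search input keeps the two comparable regardless of which kana form the user typed.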
> Or can I get an ascending order of names with just the kanji?
Not really.
> Or should I say screw it and get the users to put in a romaji version of
> their name?
There are several sources of possible readings of names-in-kanji. Don't
rely on things like MeCab or ChaSen, as their lexicons are rather limited
for names. ENAMDICT has a huge name collection and you can get the
possibilities by looking them up on
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?2C
HOWEVER you really must get confirmation on how people read their names.
A significant number of names are read in unusual ways.
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, VCA Secondary School, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne