Re: Counting hiragana in EUC
- To: tlug@example.com
- Subject: Re: Counting hiragana in EUC
- From: Simon Cozens <simon@example.com>
- Date: Mon, 5 Feb 2001 03:23:50 +0000
- In-Reply-To: <200102050049.JAA22616@example.com>; from jwb@example.com on Mon, Feb 05, 2001 at 09:49:09AM +0900
- References: <200102050049.JAA22616@example.com>
On Mon, Feb 05, 2001 at 09:49:09AM +0900, Jim Breen wrote:
> In the text-glossing function in my dictionary server, I take a
> quick-and-dirty approach of (a) ignoring hiragana entirely, on the
> grounds that (i) the user should know the particles, stock words &
> phrases, etc already, and (ii) it's all too hard,

Fine for you. :)

What I'm doing is trying to advance the state of the art in input
method environments. When you're dealing with IMEs, you effectively
have an input stream of unsegmented hiragana, and your aim is to
produce kana-majiri bun (mixed kanji and kana text).

So, instead of using a simple dictionary lookup, I reckon you could get
a lot better accuracy by segmenting the input into kanji compounds and
non-kanji, and then using a selection algorithm to get the appropriate
kanji.

My segmentation algorithm is working nicely on English input, so I'm
kinda giddy right now, but I haven't exposed it to Japanese text just
yet. That's tomorrow's job.

-- 
<Twofish> Pokemon seems an evil concept. Kid hunts animals, and takes
them from the wild into captivity, where he trains them to fight, and
then fights them to the death against other people's pokemon. Doesn't
this remind you of say, cock fighting?
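
The segment-then-select idea in the message above can be sketched roughly
as follows. This is not the algorithm from the post, which is not shown;
it is a minimal illustration that assumes a tiny made-up lexicon
(LEXICON), made-up costs (WORD_COST, UNKNOWN_COST), and a placeholder
selection step that simply takes the first candidate for each reading.
Segmentation here is a lowest-cost split found by dynamic programming,
which is one common approach among several.

```python
# Hypothetical toy lexicon: hiragana reading -> candidate written forms.
# A real IME dictionary would be far larger and carry frequency data.
LEXICON = {
    "わたし": ["私"],
    "にほんご": ["日本語"],
    "にほん": ["日本"],
    "ご": ["語", "五"],
    "は": ["は"],
    "を": ["を"],
    "べんきょう": ["勉強"],
    "します": ["します"],
}

WORD_COST = 1      # assumed cost of using one lexicon entry
UNKNOWN_COST = 10  # assumed penalty for a single kana not in the lexicon


def segment(kana):
    """Split unsegmented hiragana into the lowest-cost list of pieces."""
    n = len(kana)
    max_len = max(len(reading) for reading in LEXICON)
    # best[i] holds (cost, pieces) for the prefix kana[:i].
    best = [(0, [])] + [None] * n
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            piece = kana[start:end]
            prev_cost, prev_pieces = best[start]
            if piece in LEXICON:
                candidates.append((prev_cost + WORD_COST, prev_pieces + [piece]))
            elif end - start == 1:
                # Fall back to a single unknown kana so a split always exists.
                candidates.append((prev_cost + UNKNOWN_COST, prev_pieces + [piece]))
        best[end] = min(candidates, key=lambda c: c[0])
    return best[n][1]


def convert(kana):
    """Segment, then pick a written form for each piece.

    Selection is a placeholder: it takes the first listed candidate and
    leaves unknown kana unchanged.
    """
    return "".join(LEXICON.get(piece, [piece])[0] for piece in segment(kana))


if __name__ == "__main__":
    text = "わたしはにほんごをべんきょうします"
    print(segment(text))
    print(convert(text))
```

With this lexicon the whole-word reading "にほんご" wins over "にほん" +
"ご" because it costs one entry instead of two. A real selection step
would rank the candidates for each piece (for example by frequency or
context) rather than taking the first.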