TLUG Mailing List

Re: Counting hiragana in EUC

To: tlug@example.com

Subject: Re: Counting hiragana in EUC

From: Simon Cozens <simon@example.com>

Date: Mon, 5 Feb 2001 03:23:50 +0000

Content-Disposition: inline

Content-Type: text/plain; charset=us-ascii

In-Reply-To: <200102050049.JAA22616@example.com>; from jwb@example.com on Mon, Feb 05, 2001 at 09:49:09AM +0900

References: <200102050049.JAA22616@example.com>

Reply-To: tlug@example.com

Resent-From: tlug@example.com

Resent-Message-ID: <nIgPJD.A.PYG.Pzhf6@example.com>

Resent-Sender: tlug-request@example.com

Sender: Simon Cozens <simon@example.com>

User-Agent: Mutt/1.3.12i

On Mon, Feb 05, 2001 at 09:49:09AM +0900, Jim Breen wrote:
> In the text-glossing function in my dictionary server, I take a
> quick-and-dirty approach of (a) ignoring hiragana entirely, on the
> grounds that (i) the user should know the particles, stock words &
> phrases, etc already, and (ii) it's all too hard,

Fine for you. :) What I'm doing is trying to advance the state of the art in
input method environments. When you're dealing with IMEs, you effectively have
an input stream of unsegmented hiragana, and your aim is to produce
kana-majiri bun (mixed kanji and kana text). So, instead of using a simple
dictionary lookup, I reckon you
could get a lot better accuracy by segmenting the input into kanji compounds
and non-kanji, and then using a selection algorithm to get the
appropriate kanji. My segmentation algorithm is working nicely on English
input, so I'm kinda giddy right now, but I haven't exposed it to Japanese text
just yet. That's tomorrow's job.
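
To make the segmentation step concrete, here is a minimal sketch of one
common way to split an unsegmented stream into words: a dictionary-driven
dynamic program that prefers the fewest known words. This is not the
algorithm described above, whose details the message does not give. The
function name `segment`, the `lexicon` set, the `max_len` limit, and the
penalty of 10 for unknown characters are all assumptions made for
illustration.

```python
def segment(text, lexicon, max_len=8):
    """Split text into segments, preferring the fewest lexicon words.

    Hypothetical sketch: characters not covered by any lexicon word
    fall back to single-character segments at a high cost.
    """
    n = len(text)
    # best[i] holds (cost, segments) for the prefix text[:i].
    best = [(0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            known = piece in lexicon
            # Unknown material is only allowed one character at a time.
            if not known and len(piece) > 1:
                continue
            # Assumed costs: 1 per known word, 10 per unknown character.
            cost = best[j][0] + (1 if known else 10)
            if best[i] is None or cost < best[i][0]:
                best[i] = (cost, best[j][1] + [piece])
    return best[n][1]


if __name__ == "__main__":
    print(segment("thecatsat", {"the", "cat", "sat"}))
    # prints ['the', 'cat', 'sat']
```

The same function works unchanged on a Python string of hiragana, since it
operates on characters rather than bytes. Choosing the right kanji for each
segment would be a separate step and is not attempted here.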

-- 
<Twofish> Pokemon seems an evil concept. Kid hunts animals, and takes
them from the wild into captivity, where he trains them to fight, and
then fights them to the death against other people's pokemon. Doesn't
this remind you of say, cock fighting?
Follow-Ups:

Re: Counting hiragana in EUC
From: "Stephen J. Turnbull" <turnbull@example.com>

References:

Re: Counting hiragana in EUC
From: jwb@example.com (Jim Breen)
