Mailing List Archive




Re: Counting hiragana in EUC



Simon Cozens <simon@example.com> writes:
 
> Yep. This is just for the purposes of an experiment, to see whether or
> not I can segment incoming hiragana text into words.

My favourite topic! I am always happy to get a chance to talk
about it, or to learn from other minds.
While your task seems interesting, I wonder what your
definition of the word "word" is.

Segmenting a Japanese phrase into words (read "word" as lemmata) is certainly
very important for, e.g., an automatic dictionary-lookup routine.
For segmenting incoming hiragana text in a meaningful way, part-of-speech/
morphological segmentation or bunsetsu segmentation seems IMHO to be a
more promising approach, but I am always happy to get new creative input.
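For contrast, the naive dictionary-lookup approach can be sketched as greedy longest match ("maximum matching"): scan left to right and take the longest dictionary entry that fits. This is a minimal sketch, not a morphological or bunsetsu analyzer. `TOY_DICT`, `segment` and `is_hiragana` are hypothetical names, and the word list is a tiny hand-made example. It assumes the incoming text arrives as EUC-JP bytes and is decoded with Python's built-in `euc_jp` codec.

```python
# Minimal sketch: greedy longest-match segmentation of hiragana text.
# TOY_DICT is a hypothetical, hand-made word list for illustration only.
TOY_DICT = {"わたし", "わた", "し", "は", "がくせい", "です"}


def is_hiragana(ch: str) -> bool:
    # Letters of the Unicode hiragana block: U+3041 (ぁ) .. U+3096 (ゖ)
    return "\u3041" <= ch <= "\u3096"


def segment(text: str, dictionary=TOY_DICT) -> list:
    """Split text into the longest matching dictionary entries, left to right.
    Characters not covered by any entry are emitted one at a time."""
    max_len = max(len(w) for w in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                break
        else:
            length = 1  # no entry matched: emit a single character
        words.append(text[i:i + length])
        i += length
    return words


# Incoming EUC-JP bytes are decoded to str before counting or segmenting.
raw = "わたしはがくせいです".encode("euc_jp")
text = raw.decode("euc_jp")
print(sum(is_hiragana(c) for c in text))  # 10
print(segment(text))  # ['わたし', 'は', 'がくせい', 'です']
```

Note that the toy dictionary also contains "わた" and "し", so "わたし" has more than one split; longest match simply prefers the longest entry. Purely orthographic lookup has no principled way to resolve such ambiguities, which is why morphological analysis tends to give more meaningful results.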

Maybe 
http://www.ipsj.or.jp/members//Journal/Eng/3806/article005.html
could be of interest to you.

HTH,

Andreas Marcel Riechert


