Mailing List Archive
Re: [tlug] oneliners, Was: Moving on from xterm
- Date: Wed, 24 Aug 2016 08:14:33 +0900
- From: NOKUBI Takatsugu <knok@example.com>
- Subject: Re: [tlug] oneliners, Was: Moving on from xterm
- References: <20160819111442.GA30780@quadratic.cynic.net> <9f9cc5f579c92c3ddf7f29865d5862c2@jp.sometwo.net> <20160822114101.GA3944@fluxcoil.net> <87h9ace7zm.wl-knok@daionet.gr.jp> <CABHGxq4gBx39m0+TPZe3LLYPFetAvoc1wfZj0_0YGz3+w2A=1w@mail.gmail.com>
- User-agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (Gojō) APEL/10.8 EasyPG/1.0.0 Emacs/24.4 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
At Tue, 23 Aug 2016 13:24:15 +1000,
Jim Breen wrote:
> I see that Toshinori Sato, who has compiled the "neologd"
> extensions (there's one for unidic too) has added a lot of
> expanded terms which are not really morphemes. For example
> if I put ラテン文字で表記される into it, the unidic and ipadic
> segmentation is ラテン+文字+で+表記+さ+れる, but if I try
> it with neologd I get ラテン文字+で+表記+さ+れる. In other
> words he's added ラテン文字 as a unitary noun. If that's what
> you want, fine, and his work may well help apps which just
> want to add furigana to text, but it's getting right away from
> being a morphological analyzer.

Indeed. mecab-ipadic-neologd has many "long phrase" entries that
span more than one morpheme.

However, plain ipadic is also quite old. For example, it has an
entry for "通商産業省" but not for the ministry's current name,
"経済産業省" (the same is true of kakasidict). unidic has no such
entries, so it segments the name as 経済+産業+省.

How to treat a "morpheme" is a matter of policy. On the other hand,
Toshinori Sato has said that mecab-ipadic-neologd performs better than
plain ipadic on text classification tasks. It's a really hard
problem...

By the way, kakasi is a simple kanji-to-kana converter, so it can only
handle words that are sequences of kanji; a dictionary entry can't
contain non-kanji characters such as hiragana or katakana. For example,
"赤ら顔" is in ipadic and unidic, but kakasi can't handle it (it gets
split into 赤+ら+顔).
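
The effect of dictionary contents on segmentation can be sketched with a
toy example. This is only an illustration under loud assumptions: it uses
greedy longest-match rather than MeCab's lattice and cost-based search,
the three dictionaries are hypothetical miniatures (not real ipadic,
neologd, or kakasidict contents), and "kanji" is approximated as the
basic CJK Unified Ideographs block.

```python
# Toy illustration only. MeCab builds a lattice and picks the lowest-cost
# path; this greedy longest-match segmenter merely shows how the set of
# dictionary entries changes the resulting segmentation.

def segment(text, dictionary):
    """Greedy longest-match; characters with no match become single tokens."""
    max_len = max(map(len, dictionary))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def is_kanji_only(word):
    """Approximation: every character lies in U+4E00..U+9FFF."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in word)


# Hypothetical miniature dictionaries (assumptions for illustration).
BASE = {"ラテン", "文字", "で", "表記", "さ", "れる",
        "経済", "産業", "省", "赤ら顔"}
# "Long phrase" entries added on top, in the spirit of neologd.
EXTENDED = BASE | {"ラテン文字", "経済産業省"}
# A kakasi-like restriction: entries may contain kanji only.
KANJI_ONLY = {w for w in EXTENDED if is_kanji_only(w)}

if __name__ == "__main__":
    print("+".join(segment("ラテン文字で表記される", BASE)))
    print("+".join(segment("ラテン文字で表記される", EXTENDED)))
    print("+".join(segment("経済産業省", BASE)))
    print("+".join(segment("赤ら顔", KANJI_ONLY)))
```

With the base set the phrase comes out as ラテン+文字+で+表記+さ+れる;
adding the compound entries yields ラテン文字+で+表記+さ+れる, and the
kanji-only set cannot hold 赤ら顔, so it falls apart into 赤+ら+顔.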
- Follow-Ups:
- Re: [tlug] oneliners, Was: Moving on from xterm
- From: Jim Breen
- References:
- [tlug] Moving on from xterm
- From: Curt Sampson
- Re: [tlug] Moving on from xterm
- From: Furkan Mustafa
- [tlug] oneliners, Was: Moving on from xterm
- From: Christian Horn
- Re: [tlug] oneliners, Was: Moving on from xterm
- From: NOKUBI Takatsugu
- Re: [tlug] oneliners, Was: Moving on from xterm
- From: Jim Breen