
Mailing List Archive
Re: [tlug] [OT/long] Yet another JMdict front-end
- Date: Tue, 01 Aug 2006 12:43:33 +1000 (EST)
- From: Jim Breen <Jim.Breen@example.com>
- Subject: Re: [tlug] [OT/long] Yet another JMdict front-end
Matt Gushee <matt@example.com> wrote:
>> Now on to more substantive issues:
>>
>> Indexing approach
>> -----------------
>>
>> There will probably be several indexes in the future, but currently I
>> provide one way to look up Kanji: a traditional radical/stroke-count
>> index. Specifically, you select the radical stroke count, then the
>> radical itself, then the stroke count for the whole character, then the
>> specific character that you want. Although it is a linear process and
>> thus easy to understand in principle, it has the disadvantage that
>> people don't know by heart how many strokes are in a character, and it
>> can be very hard to figure out for the more complex ones. In a printed
>> dictionary it's less of a problem because you can easily shift your eyes
>> to another part of the page; in a browser I think it will be awkward at
>> best.
>>
>> What other alternatives might work well (when you don't know the
>> pronunciation)? I've seen Jim Breen's "multi-radical" method and was
>> initially resistant to it for a couple of reasons: first, it is
>> non-linear, and thus is superficially more complex than the
>> radicals/strokes method.
But MUCH more popular with the great unwashed. Some time ago I
extracted usage measurements of kanji lookups from WWWJDIC, and the
multi-radical method won. See
http://www.csse.monash.edu.au/~jwb/kanjindx.html for a paper
about kanji indexing.
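In code, a multi-radical lookup reduces to set intersection: each component maps to the set of kanji containing it, and selecting several components keeps only the kanji found in every one of those sets. A minimal Python sketch; the tiny COMPONENT_INDEX is hypothetical toy data (not taken from WWWJDIC or its radical files), and lookup() / remaining_components() are illustrative names, not WWWJDIC's actual code:

```python
# Hypothetical toy index: component -> set of kanji that contain it.
# Illustrative only; a real index would list far more kanji per component.
COMPONENT_INDEX = {
    "木": {"林", "森", "休", "本", "相", "村"},
    "目": {"相", "見", "眼"},
    "日": {"明", "時", "間"},
    "亻": {"休", "住", "作"},
}


def lookup(index, components):
    """Return the kanji that contain every selected component."""
    if not components:
        return set()
    # Intersect starting from the smallest candidate set.
    sets = sorted((index.get(c, set()) for c in components), key=len)
    return set.intersection(*sets)


def remaining_components(index, components):
    """Components that still co-occur with the current selection.

    A front-end could grey out every other component so the user
    never builds a combination with zero matches.
    """
    candidates = lookup(index, components)
    return {c for c, kanji in index.items()
            if c not in components and kanji & candidates}
```

For example, lookup(COMPONENT_INDEX, ["木", "目"]) narrows the toy data to 相. The remaining_components() step is one plausible way to keep the non-linear method from feeling complex, since the user only ever sees choices that still lead somewhere.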
>> Second, I have been taught (for both Chinese and Japanese) that the
>> radical is the "meaning" component, and that in general a character has
>> exactly one radical. At any rate, I believe the radical has etymological
>> significance, and that understanding which part of the Kanji is the
>> radical can contribute to an overall mastery of the language. And a
>> single-radical dictionary index reinforces that understanding.
Only partly true. For "semasio-phonetic" kanji it may provide
at least the semantic domain, but the linkage can be vague at
times.
>> But I'm thinking that a multi--can I say "component" instead of
>> "radical"? Then maybe I could set aside the philosophical objection.
>> Anyway, a well-designed multi-thing index might after all be an easier
>> way to look up Kanji.
It sure is. For WWWJDIC I hope one day to do a Java-based
version rather than the current vanilla HTML form approach.
>> Strokes/radicals index navigation
>> ---------------------------------
>>
>> If I decide to go to a multi-component index, this might not matter any
>> more. But for the moment, there is an issue with the index menus: in
>> view of the fact that the user will often not be sure how many strokes
>> there are in a character, I have created dynamic menus such that ...
>> actually it's best if you try it out. Basically, if you move your mouse
>> over an item in one row of the menu, the next row is *temporarily*
>> displayed. Thus, let's say you have chosen a given radical. There is a
>> row of numbers representing stroke counts of characters with that
>> radical; if you run your mouse along that row you can easily see what
>> characters exist for each stroke count.
>>
>> So, do you think this is (a) useful, and (b) intuitive? It would be a
>> lot easier to make the menus so that the next row only changes when you
>> click something. But if people find the transient display a very helpful
>> feature, I will make it work.
Seems quite good so far.
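The hover-to-preview rows only need the data pre-grouped per radical by total stroke count, so that each row of the menu is a single lookup. A small Python sketch of that grouping; the (kanji, radical, strokes) records and the build_menu_index() name are hypothetical, for illustration:

```python
from collections import defaultdict

# Hypothetical records: (kanji, classical radical, total stroke count).
RECORDS = [
    ("本", "木", 5), ("村", "木", 7), ("林", "木", 8), ("森", "木", 12),
    ("休", "人", 6), ("体", "人", 7),
    ("明", "日", 8), ("時", "日", 10),
]


def build_menu_index(records):
    """Group kanji as radical -> stroke count -> [kanji], counts ascending."""
    grouped = defaultdict(lambda: defaultdict(list))
    for kanji, radical, strokes in records:
        grouped[radical][strokes].append(kanji)
    # Plain dicts keep insertion order, so sort the stroke counts once here.
    return {
        radical: dict(sorted(by_strokes.items()))
        for radical, by_strokes in grouped.items()
    }
```

With menu = build_menu_index(RECORDS), list(menu["木"]) is the stroke-count row for 木, and menu["木"][8] is what moving the mouse over "8" would display.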
>> Presentation of results
>> -----------------------
>>
>> Currently when you select a Kanji, a request goes to the server, which
>> returns a document containing all phrases that start with that Kanji.
>> This document is dumped into a table with 3 columns: [Kanji] Phrase,
>> Reading, and Definitions. This is reasonable in some cases, but
>> sometimes the response document is quite large, so I think some kind of
>> chunking and/or filtering would be helpful. It gets worse if we want to
>> look up all phrases *containing* the selected character. My server-side
>> script can indeed do that, but sometimes it's just way too much data, so
>> I've disabled that behavior for the moment.
Comments:
- you leave out the part-of-speech, etc. Not a good idea.
- you use a comma between glosses; better to use ";", as commas
occur within glosses and it can get ambiguous.
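Both points can be handled when rendering each sense. A minimal Python sketch, assuming the POS codes (JMdict entity names such as n, v5r, vt) and glosses have already been pulled out of the entry; format_sense() is a hypothetical helper and the sample glosses are illustrative, not quoted from JMdict:

```python
def format_sense(pos_tags, glosses):
    """Render one sense as '(pos,...) gloss; gloss; ...'.

    Glosses are joined with '; ' because a single gloss may itself
    contain commas, e.g. "to cut (paper, cloth, etc.)".
    """
    body = "; ".join(glosses)
    if pos_tags:
        return "(" + ",".join(pos_tags) + ") " + body
    return body
```

format_sense(["v5r", "vt"], ["to cut (paper, cloth, etc.)", "to sever"]) yields "(v5r,vt) to cut (paper, cloth, etc.); to sever", where the gloss boundaries stay unambiguous.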
>> Another issue with the result sets is that they're not sorted in any
>> useful way--actually I believe they are ordered according to the JMdict
>> entry sequence number.
Yes, which is a mixture of headword order (as of the day it was
first built) and then historical order for entries added later. Not a
good display order.
>> So, how can I improve the processing and presentation of the results?
JMdict has various frequency-of-use tags, which may be useful for
ordering.
I find the spaced-out table a bit clunky.
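One way to use those tags is a composite sort key built from the JMdict priority codes (news1/2, ichi1/2, spec1/2, gai1/2 and nf01..nf48 in the ke_pri/re_pri elements). A minimal Python sketch; which codes count as "common" is an assumption, and the entry dicts with "seq" and "pri" keys are hypothetical stand-ins for whatever the server-side script extracts:

```python
# Assumption of this sketch: treat these four priority codes as "common".
COMMON_TAGS = {"news1", "ichi1", "spec1", "gai1"}


def priority_key(entry):
    """Sort key: common entries first, then by nfXX rank, then by seq number."""
    tags = set(entry.get("pri", ()))
    common = 0 if tags & COMMON_TAGS else 1
    # nf01 is the most frequent band; entries without an nf code sort last.
    nf = min((int(t[2:]) for t in tags if t.startswith("nf")), default=99)
    return (common, nf, entry["seq"])
```

Results would then be displayed as sorted(entries, key=priority_key), with the sequence number only breaking ties.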
>> Miscellaneous technical stuff
>> -----------------------------
>>
>> Preparing the index: my list of radicals is derived from Jim Breen's
>> KANJIDIC, but since his data is prepared for a multi-radical lookup
>> system, I can't automatically extract a radicals-and-strokes index, so I
>> am currently creating the index manually.
Tsk, tsk. WWWJDIC has a page of classical radicals. See:
http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwraddisp.cgi
The file that built that table is also used by xjdic and is in the
xjdic tarball.
>> That's why it's so incomplete,
>> of course. Does anyone know of another database somewhere that lists each
>> kanji by (single) radical and stroke count?
Why do you need another? 8-)}
Seriously, there are a few others around, but they are (almost)
all derived from KANJIDIC.
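KANJIDIC lines themselves also carry a radical number and a stroke count, so a single-radical index can be extracted mechanically. A minimal Python sketch, assuming the usual one-kanji-per-line layout with space-separated fields (B = radical number as in Nelson, C = classical radical number where it differs from B, S = stroke count, the first S being the accepted count) and a line already decoded from EUC-JP. radical_and_strokes() is a hypothetical helper, and the test lines are trimmed illustrations rather than copies of real entries:

```python
import re

# Field codes used here: Bnn (radical), Cnn (classical radical), Snn (strokes).
FIELD = re.compile(r"([BCS])(\d+)")


def radical_and_strokes(line):
    """Return (kanji, classical_radical, stroke_count) for one line, or None."""
    if not line.strip() or line.startswith("#"):
        return None  # blank line or header/comment line
    # Ignore the {English meanings}; only the fields before them matter.
    tokens = line.split("{", 1)[0].split()
    if not tokens:
        return None
    fields = {}
    for tok in tokens[1:]:
        m = FIELD.fullmatch(tok)
        if m:
            # setdefault keeps the first occurrence, i.e. the accepted S count.
            fields.setdefault(m.group(1), int(m.group(2)))
    radical = fields.get("C", fields.get("B"))  # prefer the classical number
    strokes = fields.get("S")
    if radical is None or strokes is None:
        return None
    return tokens[0], radical, strokes
```

Grouping the resulting tuples by radical and then by stroke count gives the radicals-and-strokes index directly.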
>> Glyphs for radicals: if my understanding of the KANJIDIC documentation
>> is correct, there is a glyph of each radical in Japanese Kanji, but some
>> of them only exist in JISX-0212.
Not even in that case. JIS X 0212 added some, but the rest really came
later.
>> If so, you either have to require the
>> user to have a JISX-0212 font, use images to represent some radicals, or
>> use substitute glyphs from JISX-0208. The last option is not really
>> acceptable, I don't think. E.g., 化 for 篋阪??
As you prolly know, Unicode replicated all the classical radicals
in a block of their own.
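That block is Kangxi Radicals (U+2F00 to U+2FDF), where the 214 classical radicals sit in order from U+2F00, so a radical number maps to a code point by simple offset; variant forms such as side-form "person" are in the separate CJK Radicals Supplement block (U+2E80 to U+2EFF). A small Python sketch; kangxi_radical() and as_ideograph() are illustrative names:

```python
import unicodedata

KANGXI_FIRST = 0x2F00   # U+2F00 KANGXI RADICAL ONE
KANGXI_COUNT = 214      # the 214 classical radicals, in classical order


def kangxi_radical(number):
    """Return the Kangxi Radicals block character for classical radical 1..214."""
    if not 1 <= number <= KANGXI_COUNT:
        raise ValueError(f"classical radical number out of range: {number}")
    return chr(KANGXI_FIRST + number - 1)


def as_ideograph(radical_char):
    """Map a radical-block character to the ordinary kanji via NFKC."""
    return unicodedata.normalize("NFKC", radical_char)
```

kangxi_radical(75) gives U+2F4A (KANGXI RADICAL TREE), and as_ideograph() folds it back to 木, which is handy when matching against dictionary text. The user still needs a font that covers the radical block.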
HTH
Jim
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学