
Re: [tlug] [OT] Regular Expressions to find Japanese Text



Botond, Stephen, Jim,

Thank you all for your responses and insights.

Jim said:
> (a) the occurrence of a 【 】 encapsulates a reading, and after that
> you are into the translation region.
> (b) once you reach a space followed by an ASCII character (usually alphabetic
> or a "("), you are into the translation region. If you didn't encounter
> a 【 】 pair along the way, the Japanese can be assumed to be kana-only.
>   
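To check that I've understood the two rules, here is a minimal Python sketch of them. The function name split_entry and both patterns are my own assumptions, not anything taken from WWWJDIC itself, and it only handles a single reading per entry.

```python
import re

# Rule (a): headword, then a reading inside 【 】, then the translation.
WITH_READING = re.compile(r"^(.*?)\s*【(.*?)】\s*(.*)$")
# Rule (b): no 【 】 pair, so split at the first space that is
# followed by a printable ASCII character.
KANA_ONLY = re.compile(r"^(.*?) (?=[\x21-\x7e])(.*)$")

def split_entry(line):
    """Return (japanese, reading, translation).

    reading is None when no 【 】 pair is found, in which case the
    Japanese part is assumed to be kana-only.
    """
    line = line.strip()
    m = WITH_READING.match(line)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = KANA_ONLY.match(line)
    if m:
        return m.group(1), None, m.group(2)
    # Nothing that looks like a translation was found.
    return line, None, ""
```

A line with a leading English note would end up with that note in the Japanese field, so this is only a starting point.
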
There seem to be other issues, such as entries that start out by saying
"possible inflected verb" or "partial match". Is it the case that
sometimes there is some kind of English text before a Japanese word?

Or is the issue with my parser? In order to pull out definitions, I've
selected text that begins with <li> and ends with <br>, as this seems to
account for all words extracted from a WWWJDIC search.
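
For reference, the selection step looks roughly like the sketch below. The sample HTML is made up for illustration and the real WWWJDIC markup may well differ; extract_entries is just a name I picked.

```python
import re

# Illustrative input only; real WWWJDIC output may be laid out differently.
SAMPLE_HTML = (
    "<li>寿康 【としやす】 Toshiyasu (g)<br>\n"
    "<li>たべる (v1) to eat<br>"
)

# Non-greedy match so each <li> ... <br> pair yields one entry.
ENTRY_RE = re.compile(r"<li>(.*?)<br>", re.IGNORECASE | re.DOTALL)

def extract_entries(html):
    """Return the text found between each <li> and the next <br>."""
    return [text.strip() for text in ENTRY_RE.findall(html)]

entries = extract_entries(SAMPLE_HTML)
```

Anything the server prints outside an <li> ... <br> pair is ignored by this, which is why I'm unsure where the extra English text is coming from.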

> The exception to the above is Japanese names, where you get
> stuff like 
>  寿康 【としやす】 Toshiyasu (g) 【じゅこう】 Jukou (g) 【ひさやす】 Hisayasu (u) NA

Is it only Japanese names that have multiple readings? I would have
thought there would also be regular words with multiple readings,
especially with verbs with multiple inflections.

If it is the case that only names will have multiple readings, I may
ditch them for the time being, to give study priority to other words.

But if regular words have multiple readings and definitions, then I will
come up with a plan to account for them.
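
A first rough idea for such a plan, as a Python sketch: treat every 【 】 pair as starting a new reading, and attach the text up to the next 【 as its definition. The function name split_readings is hypothetical, and I'm assuming regular words would follow the same layout as the name entry Jim quoted, which I haven't confirmed.

```python
import re

# One reading inside 【 】 followed by everything up to the next 【.
PAIR_RE = re.compile(r"【(.*?)】\s*([^【]*)")

def split_readings(line):
    """Return (headword, [(reading, definition), ...])."""
    head, sep, rest = line.partition("【")
    if not sep:
        # No 【 】 pair at all: nothing to split into readings.
        return head.strip(), []
    pairs = [(reading, text.strip())
             for reading, text in PAIR_RE.findall(sep + rest)]
    return head.strip(), pairs

# The name entry from Jim's message.
NAME_LINE = "寿康 【としやす】 Toshiyasu (g) 【じゅこう】 Jukou (g) 【ひさやす】 Hisayasu (u) NA"
headword, readings = split_readings(NAME_LINE)
```

With this, len(readings) gives the number of readings for an entry, which is the figure I'd need in order to size the flash card fields.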

Here's a question that has relevance to the flash card program that I am
importing data into:

What word (or name) in the WWWJDIC server has the most readings and
definitions, and how many does it have?

Botond said:
> You should also consider the fact that there are edict dictionary files
> in other languages also, not just Japanese-English.
That is a good consideration for a more generally adopted application.
However, even though I'd share the source with anyone who might find it
useful, what I'm working on now is for my own purposes and so I can
guarantee that I'm only going to be using the Japanese-English dictionaries.

--
Dave M G


