Mailing List Archive



Re: [tlug] a japanese dictionary: regex v. db query



On Tue, 4 Apr 2006, Jim wrote:
>
> "Stephen J. Turnbull" wrote:
> 
> > What is the regular expression for "all characters with 16 strokes"?
> 
> Oh boy. That made me think. 
> That was the right question to highlight the limitations of regexes. 
> 
> I can not think of how to express "all characters with 16 strokes" 
> in the present schemes of regexes as I know them.

But 'man re_syntax' reveals extensions such as:

  [:digit:]   which matches any digit, or
  [:punct:]   which matches any punctuation character

Why not extend that syntax to include things like:

  [:stroke=16:]

to match any character with 16 strokes. Or even:

  [:rad=<code>:]

to match any character containing the radical at codepoint 'code'. You 
could probably convert this internally to an SQL search including just 
about any character property you have stored in the database.
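As a rough sketch of that idea (not an existing regex feature), the Python below rewrites a hypothetical [:stroke=N:] token into an ordinary bracket expression by looking the characters up in a database. The table name `kanji`, its columns, the helper `expand`, and the five sample rows are all assumptions made up for illustration; a real dictionary would have its own schema and far more rows.

```python
import re
import sqlite3

# Hypothetical property table: a tiny in-memory sample, not a real dictionary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kanji (ch TEXT PRIMARY KEY, strokes INTEGER)")
conn.executemany(
    "INSERT INTO kanji VALUES (?, ?)",
    [("機", 16), ("親", 16), ("頭", 16), ("日", 4), ("本", 5)],
)

# Matches the proposed (non-standard) token, e.g. [:stroke=16:]
PROP = re.compile(r"\[:stroke=(\d+):\]")

def expand(pattern):
    """Replace each [:stroke=N:] token with a bracket expression
    listing every character in the table that has N strokes."""
    def repl(match):
        rows = conn.execute(
            "SELECT ch FROM kanji WHERE strokes = ? ORDER BY ch",
            (int(match.group(1)),),
        ).fetchall()
        chars = "".join(row[0] for row in rows)
        # No matching characters: emit a pattern that can never match.
        return "[" + re.escape(chars) + "]" if chars else "(?!)"
    return PROP.sub(repl, pattern)

# Word search: any 16-stroke character followed by a literal character.
word = re.compile(expand("[:stroke=16:]械"))
print(bool(word.search("機械")))
```

The expanded pattern is handed to the normal regex engine, so the rest of the syntax keeps working unchanged. A [:rad=<code>:] token could be handled the same way with a different query.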

I would consider this more useful for a word search than for single kanji 
searches. REs become useful when there are potentially many characters in 
the search target... or for someone stuck with a text-only interface to 
the database ;-).

--
Joe Larabell -- Synopsys VCS Support      US: larabell@example.com
http://wwwin.synopsys.com/~larabell/   Japan: larabell@???

