Mailing List Archive



[tlug] a japanese dictionary: regex v. db query



"Stephen J. Turnbull" wrote:

> What is the regular expression for "all characters with 16 strokes"?

Oh boy. That made me think. 
That was the right question to highlight the limitations of regexes. 

I cannot think of how to express "all characters with 16 strokes" 
in any regex syntax I know of. Of course one could extend regexes 
to also match _attributes_ of characters, such as brush stroke 
count. As if the syntax of regexes weren't "simple" enough 
already, I shudder to think what the extended regex syntax 
would look like. 
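
Standard regex syntax has no way to refer to a character's stroke 
count, but a program can work around this by looking the attribute 
up in a table and compiling the result into an ordinary character 
class. Below is a minimal Python sketch. It assumes a hypothetical 
stroke-count table (STROKES) with a few sample entries; a real 
program might load one from a kanji dictionary file such as KANJIDIC. 
The function names are illustrative, not from any existing library.

```python
import re

# Hypothetical sample data: kanji -> stroke count.
# A real program would load this from a kanji dictionary file;
# these few entries are only for illustration.
STROKES = {
    "人": 2,
    "山": 3,
    "日": 4,
    "学": 8,
    "語": 14,
    "頭": 16,
    "親": 16,
    "機": 16,
}


def stroke_class(count, table=STROKES):
    """Build a regex character class matching every kanji in `table`
    that has exactly `count` strokes. The attribute lives in the table,
    not in the regex syntax."""
    chars = sorted(ch for ch, n in table.items() if n == count)
    if not chars:
        return None  # no matches; avoid emitting the invalid empty class "[]"
    return "[" + "".join(re.escape(ch) for ch in chars) + "]"


def find_by_strokes(text, count, table=STROKES):
    """Return every character in `text` that has `count` strokes."""
    cls = stroke_class(count, table)
    return re.findall(cls, text) if cls else []


print(find_by_strokes("親は日本語を学ぶ頭のいい人", 16))
```

With the sample table this prints ['親', '頭']. Note that the regex 
itself stays ordinary; the stroke-count "query" is really a table 
lookup done before the regex is built.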

The complement, regexes in database queries, already exists. 

So one faces the classic tradeoff. One can extend regexes to 
(perhaps poorly) do what databases do well, or one can use 
databases that can handle regexes in their queries. 
For finding all the 16-stroke characters in a document, 
I would want a regex. For a dictionary, a database would 
probably be fine. 
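
As an illustration of the database route, the sketch below uses 
SQLite through Python's sqlite3 module. SQLite parses the REGEXP 
operator but ships no implementation for it, so the program 
registers one with create_function; SQLite then evaluates 
"X REGEXP Y" as regexp(Y, X). The table layout and the sample rows 
(one common reading per kanji) are hypothetical, chosen only to 
show a stroke-count condition combined with a regex condition.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for "X REGEXP Y": pattern first, then value.
    if value is None:
        return False
    return re.search(pattern, value) is not None


def make_dictionary():
    """Create a tiny in-memory kanji table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute(
        "CREATE TABLE kanji (literal TEXT PRIMARY KEY, strokes INTEGER, reading TEXT)"
    )
    conn.executemany(
        "INSERT INTO kanji VALUES (?, ?, ?)",
        [
            ("親", 16, "おや"),
            ("頭", 16, "あたま"),
            ("機", 16, "はた"),
            ("語", 14, "ご"),
            ("人", 2, "ひと"),
        ],
    )
    return conn


def lookup(conn, strokes, reading_pattern):
    """Kanji with exactly `strokes` strokes whose reading matches the regex."""
    rows = conn.execute(
        "SELECT literal FROM kanji "
        "WHERE strokes = ? AND reading REGEXP ? ORDER BY literal",
        (strokes, reading_pattern),
    )
    return [literal for (literal,) in rows]


conn = make_dictionary()
print(lookup(conn, 16, "^あ"))
```

Here the stroke count is a plain column, so "all characters with 16 
strokes" is just `strokes = 16`, and the regex handles only the part 
regexes are good at. Some databases, such as PostgreSQL, provide a 
regex match operator natively instead of requiring a user function.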

Which brings us back to Stibbe's interest: a Japanese dictionary. 
Extending regexes to handle attributes would likely be a 
significant project in its own right, so Stibbe might want 
to stick to the tools presently available. 

I.e., a database. 


