Mailing List Archive



[tlug] searching for kanji, simple state machine



 > sjs@example.com
 > Would it be worthwhile to just write a simple C executable for this?
 > Regular expressions tend to be expensive in terms of CPU, but I would think
 > a simple state machine (which really is just a highly optimized regular
 > expression engine) shouldn't be too difficult to write and should be
 > really fast if you have already stripped the noise (line numbering,
 > punctuation, whitespace, etc) out of your data set.  Assuming of course
 > you don't need the power of full regular expressions...
 >
 > If this approach was already mentioned, sorry -- I missed it.
 >
 > Steve S.
 >
 >

A simple state machine in C sounds about as simple as a trip to the moon 
to me. :) But anyway, I cannot permanently strip out all the noise from 
the data set, because I need it when I want to actually do something 
with the data: read it, quote it, etc. I just need to find where in the 
haystack my quote (ignoring punctuation etc) appears.
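
A minimal sketch of one way to search past the noise without throwing it 
away: strip a copy of the text, but remember which original offset each 
kept character came from, then map the match back. The function name and 
the choice of str.isalpha as the "signal" test (so digits, punctuation 
and whitespace all count as noise) are assumptions made for illustration, 
not something taken from this thread.

```python
def find_ignoring_noise(haystack, quote, is_signal=str.isalpha):
    """Return (start, end) offsets of quote in the ORIGINAL haystack,
    skipping noise characters, or None if the quote does not appear."""
    # Keep only signal characters, remembering each one's original offset.
    kept = [(i, ch) for i, ch in enumerate(haystack) if is_signal(ch)]
    stripped = "".join(ch for _, ch in kept)

    # Strip the quote the same way so both sides are comparable.
    needle = "".join(ch for ch in quote if is_signal(ch))
    if not needle:
        return None

    pos = stripped.find(needle)
    if pos < 0:
        return None

    # Map the match in the stripped copy back to original offsets.
    start = kept[pos][0]
    end = kept[pos + len(needle) - 1][0] + 1
    return start, end
```

The slice haystack[start:end] then gives the passage as it actually 
appears, with its punctuation and line numbers intact, ready to read or 
quote.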

Besides, I thought that what Perl bragged about was being as fast as 
such a machine once the search is compiled (which is trivial in this 
case).
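
For comparison, a sketch of the compiled-pattern approach, written in 
Python rather than Perl: allow an optional run of noise between each 
character of the quote and compile the result once. The function name 
and the noise class [\W\d_]* (anything that is not a letter) are 
assumptions for illustration, and nothing here measures how the speed 
compares with a hand-written state machine.

```python
import re

def compile_noisy_pattern(quote):
    """Compile a regex that matches quote with any amount of noise
    (non-letter characters) allowed between consecutive letters."""
    chars = [ch for ch in quote if ch.isalpha()]
    if not chars:
        raise ValueError("quote contains no letters to search for")
    # \W covers punctuation and whitespace; \d and _ are added because
    # they count as word characters but are treated as noise here.
    noise = r"[\W\d_]*"
    return re.compile(noise.join(re.escape(ch) for ch in chars))
```

Calling search() on the compiled pattern returns a match whose span() 
gives offsets directly into the original, unstripped text.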

But thanks for the comment. This whole project has been rather 
illuminating: I thought that this kind of searching past noise was an 
obvious problem that I was just dumb about. Maybe it's not quite that simple.

David


