
Mailing List Archive
[tlug] searching for kanji, simple state machine
- Date: Thu, 19 Jan 2006 14:50:52 +0900
- From: David Riggs <dariggs@example.com>
- Subject: [tlug] searching for kanji, simple state machine
> sjs@example.com
> Would it be worthwhile to just write a simple C executable for this?
> Regular expressions tend to be expensive in terms of CPU but I would
> think
> a simple state machine (which really is just a highly optimized regular
> expression engine) shouldn't be too difficult to write and should be
> really fast if you have already stripped the noise (line numbering,
> punctuation, whitespace, etc) out of your data set. Assuming of course
> you don't need the power of full regular expressions...
>
> If this approach was already mentioned, sorry -- I missed it.
>
> Steve S.
>
>
A simple state machine in C sounds about as simple as a trip to the moon
to me. :) In any case, I cannot permanently strip all the noise out of
the data set, because I need it when I want to actually do something
with the data: read it, quote it, etc. I just need to find where in the
haystack my quote (ignoring punctuation etc.) appears.
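
One way to picture "find the quote without permanently stripping the noise" is to search a stripped copy while remembering where each kept character came from. This is only a rough sketch in Python, not what I actually run; the names is_noise and find_ignoring_noise are made up for illustration, and the definition of noise (whitespace, digits, Unicode punctuation) is an assumption that would need tuning for real data.

```python
import unicodedata

def is_noise(ch):
    """Assumed noise: whitespace, digits (e.g. line numbers), punctuation."""
    if ch.isspace() or ch.isdigit():
        return True
    return unicodedata.category(ch).startswith("P")

def find_ignoring_noise(haystack, quote):
    """Return (start, end) offsets into the ORIGINAL haystack, or None."""
    kept = []        # characters that survive stripping
    positions = []   # positions[i] = index in haystack of kept[i]
    for i, ch in enumerate(haystack):
        if not is_noise(ch):
            kept.append(ch)
            positions.append(i)
    stripped = "".join(kept)

    # Strip the quote the same way so both sides are comparable.
    needle = "".join(ch for ch in quote if not is_noise(ch))
    if not needle:
        return None

    hit = stripped.find(needle)
    if hit < 0:
        return None

    # Map the hit in the stripped copy back to the untouched original.
    start = positions[hit]
    end = positions[hit + len(needle) - 1] + 1
    return start, end

haystack = "12 春眠、不覺曉。\n13 處處聞啼鳥。"
span = find_ignoring_noise(haystack, "不覺曉處處聞")
if span is not None:
    print(haystack[span[0]:span[1]])
```

The original text is never modified; the stripped copy exists only for the search, and the returned offsets slice the original, noise included.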
I also thought that what Perl bragged about was being as fast as
such a machine once the search pattern is compiled (which is trivial in
this case).
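
For what it is worth, the "compile once" idea can be sketched like this. The sketch uses Python's re module rather than Perl, purely as an illustration; compile_quote is a hypothetical helper, and treating everything matched by [\W\d_] as noise is an assumption. It says nothing about how fast this is compared with a hand-written state machine.

```python
import re

# Assumed noise between quote characters: non-word characters
# (whitespace, punctuation), digits, and underscores. Kanji count
# as word characters for Python str patterns, so they are kept.
NOISE_CHAR = r"[\W\d_]"

def compile_quote(quote):
    """Build one compiled pattern that allows noise between characters."""
    chars = [ch for ch in quote if not re.fullmatch(NOISE_CHAR, ch)]
    if not chars:
        raise ValueError("quote contains nothing but noise")
    # Escape each character and allow any run of noise between neighbours.
    return re.compile((NOISE_CHAR + "*").join(re.escape(ch) for ch in chars))

haystack = "12 春眠、不覺曉。\n13 處處聞啼鳥。"
pattern = compile_quote("不覺曉,處處聞")   # compiled once, reusable
match = pattern.search(haystack)
if match:
    print(match.span(), match.group(0))
```

Because the pattern runs against the original text, match.span() already gives offsets into the data with its noise intact, so no separate stripped copy is needed.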
But thanks for the comment. This whole project has been rather
illuminating: I thought that this kind of searching past noise was an
obvious problem that I was just dumb about. Maybe it's not quite that simple.
David