TLUG Mailing List

[tlug] searching for kanji, simple state machine

Date: Thu, 19 Jan 2006 14:50:52 +0900

From: David Riggs <dariggs@example.com>

Subject: [tlug] searching for kanji, simple state machine

User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050420 Debian/1.7.7-2
 > sjs@example.com
 > Would it be worth while to just write a simple c executable for this?
 > Regular expressions tend to be expensive in terms of CPU but I would think
 > a simple state machine (which really is just a highly optimized regular
 > expression engine) shouldn't be too difficult to write and should be
 > really fast if you have already stripped the noise (line numbering,
 > punctuation, whitespace, etc) out of your data set.  Assuming of course
 > you don't need the power of full regular expressions...
 >
 > If this approach was already mentioned, sorry -- I missed it.
 >
 > Steve S.
 >
 >

A simple state machine in C sounds about as simple as a trip to the moon 
to me. :) But anyway, I cannot permanently strip out all the noise from 
the data set, because I need it when I want to actually do something 
with the data: read it, quote it, etc. I just need to find where in the 
haystack my quote (ignoring punctuation etc.) appears.
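
To make that concrete, here is a rough sketch (in Python, only for illustration) of one way to search past the noise without throwing it away: strip the noise into a temporary copy, but remember where each surviving character came from. The names is_noise and find_ignoring_noise are made up for this example, and treating whitespace, digits and punctuation as "noise" is my assumption about the data.

```python
import unicodedata

def is_noise(ch):
    # Assumption: noise is whitespace, digits (line numbers) and punctuation.
    return ch.isspace() or ch.isdigit() or unicodedata.category(ch).startswith("P")

def find_ignoring_noise(haystack, quote):
    """Return (start, end) offsets into the ORIGINAL haystack, or None."""
    kept = []     # characters that survive stripping
    offsets = []  # offsets[i] is the index in haystack of kept[i]
    for i, ch in enumerate(haystack):
        if not is_noise(ch):
            kept.append(ch)
            offsets.append(i)
    needle = "".join(ch for ch in quote if not is_noise(ch))
    if not needle:
        return None
    pos = "".join(kept).find(needle)
    if pos < 0:
        return None
    # Map the match in the stripped copy back to the untouched original.
    return offsets[pos], offsets[pos + len(needle) - 1] + 1
```

The original text is never modified, so haystack[start:end] still contains the punctuation and line numbers for reading or quoting.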

But anyway, I thought that what Perl bragged about was being as fast as 
such a machine, once the search was compiled (which is trivial in this 
case).
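
By "compiled" I mean something along these lines, sketched here in Python rather than Perl: build one pattern from the quote that allows any run of noise between consecutive characters, compile it once, and reuse it. The helper name compile_quote and the noise class (non-word characters, digits, underscore) are assumptions for the sake of the example, not a claim about how fast either engine really is.

```python
import re

# Assumption: noise is any run of non-word characters, digits or underscores.
NOISE_CHAR = r"[\W\d_]"

def compile_quote(quote):
    """Compile a pattern matching the quote with optional noise between characters."""
    # Drop noise from the quote itself so only the significant characters remain.
    chars = [ch for ch in quote if not re.fullmatch(NOISE_CHAR, ch)]
    # Allow zero or more noise characters between each pair of significant ones.
    return re.compile((NOISE_CHAR + "*").join(re.escape(ch) for ch in chars))
```

Searching with the compiled pattern gives offsets into the original text directly, so no separate stripped copy is needed.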

But thanks for the comment. This whole project has been rather 
illuminating: I thought that this kind of searching past noise was an 
obvious problem that I was just dumb about. Maybe it's not quite that simple.

David
Follow-Ups:

Re: [tlug] searching for kanji, simple state machine
From: Ian Wells
