Mailing List Archive
- Date: Thu, 19 Jan 2006 14:50:52 +0900
- From: David Riggs <dariggs@example.com>
- Subject: [tlug] searching for kanji, simple state machine
- User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050420 Debian/1.7.7-2
> sjs@example.com
> Would it be worth while to just write a simple c executable for this?
> Regular expressions tend to be expensive in terms of CPU but I would think
> a simple state machine (which really is just a highly optimized regular
> expression engine) shouldn't be too difficult to write and should be
> really fast if you have already stripped the noise (line numbering,
> punctuation, whitespace, etc) out of your data set. Assuming of course
> you don't need the power of full regular expressions...
>
> If this approach was already mentioned, sorry -- I missed it.
>
> Steve S.
>
>

A simple state machine in C sounds about as simple as a trip to the moon to me. :) In any case, I cannot permanently strip all the noise out of the data set, because I need it when I want to actually do something with the data: read it, quote it, etc. I just need to find where in the haystack my quote (ignoring punctuation etc.) appears.

I also thought that what Perl bragged about was being as fast as such a machine once the search was compiled (which is trivial in this case). But thanks for the comment.

This whole project has been rather illuminating: I thought that this kind of searching past noise was an obvious problem that I was just dumb about. Maybe it's not quite that simple.

David
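For readers following the thread, here is a rough sketch of one way to find a quote while ignoring noise without permanently stripping the data. It is not the method used in the original project, and it is written in Python rather than C or Perl. The idea is to build a temporary noise-free copy of the text while remembering where each kept character came from, search that copy, and then map the hit back to the original. The function names (is_noise, strip_with_offsets, find_quote) and the definition of "noise" (whitespace, digits standing in for line numbering, and Unicode punctuation and symbols) are assumptions made for illustration.

```python
import unicodedata


def is_noise(ch: str) -> bool:
    """Assumed definition of noise: whitespace, digits (line numbering),
    and any Unicode punctuation or symbol character."""
    if ch.isspace() or ch.isdigit():
        return True
    return unicodedata.category(ch)[0] in ("P", "S")


def strip_with_offsets(text: str):
    """Return the text without noise, plus the original index of each kept character."""
    kept = []
    offsets = []
    for i, ch in enumerate(text):
        if not is_noise(ch):
            kept.append(ch)
            offsets.append(i)
    return "".join(kept), offsets


def find_quote(haystack: str, quote: str):
    """Find quote in haystack, ignoring noise in both.

    Returns (start, end) indices into the ORIGINAL haystack, so that
    haystack[start:end] is the matching passage with its noise intact,
    or None if the quote does not occur.
    """
    clean_hay, offsets = strip_with_offsets(haystack)
    clean_quote, _ = strip_with_offsets(quote)
    if not clean_quote:
        return None
    pos = clean_hay.find(clean_quote)
    if pos < 0:
        return None
    start = offsets[pos]
    end = offsets[pos + len(clean_quote) - 1] + 1
    return start, end


# Hypothetical sample data: two numbered lines with punctuation.
haystack = "1 山は高く、\n2 川は深い。"
span = find_quote(haystack, "高く川は")
print(span)
if span is not None:
    print(repr(haystack[span[0]:span[1]]))
```

The original text is never modified, so the matched passage can still be read or quoted with its punctuation and line numbers. Treating every digit as noise would be wrong for texts where digits are part of the content, so that rule would need adjusting for real data.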
- Follow-Ups:
- Re: [tlug] searching for kanji, simple state machine
- From: Ian Wells