Mailing List Archive



[tlug] Optimizing Search for kanji strings



David Riggs wrote:

[> Jim wrote:]
>  >David, of the hundreds of megabytes of text, how big is each file?
>  >What is the longest line in any of those files?
>  >What is the largest file?

> To answer the data questions: each line is 20 to 80 characters 

That is nice and very reasonable in any character coding. 

> The 326MB is in 2460 files in 56 folders, 

Learn how to master the find command to deal with the 56 folders, 
although if you can get away with filename globbing (as you seem to), 
then just stick with globbing. 
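
To make the globbing-versus-find distinction concrete, here is a rough 
sketch in Python (not your perl script, which I have not seen). The 
function names, the ".txt" suffix, and the one-folder-deep layout are 
all assumptions for illustration. The first function is the globbing 
approach, the second does a find-style walk of every folder at any depth. 

```python
import glob
import os


def files_by_glob(root, pattern="*/*.txt"):
    """Globbing: enough when files sit at a fixed depth (root/folder/file).

    The "*/*.txt" pattern is an assumed layout, not taken from the thread.
    """
    return sorted(glob.glob(os.path.join(root, pattern)))


def files_by_walk(root, suffix=".txt"):
    """find-style traversal: visits every folder under root, at any depth."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

If both return the same list for your tree, globbing is all you need. 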

> 2.5MB max file size, 

Great! That means that each file can be sucked into memory 
for easier searching. 
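
A minimal sketch of what I mean by sucking a file into memory, again in 
Python rather than perl. The function name is made up, and the UTF-8 
default is only an assumption: pass whatever coding your files really 
use (for example "euc_jp" or "shift_jis"). 

```python
def search_file(path, needle, encoding="utf-8"):
    """Return (line_number, line) for every line of path containing needle.

    The encoding default is an assumption; the thread does not say which
    character coding the files use.
    """
    with open(path, encoding=encoding) as handle:
        text = handle.read()      # whole file in memory (2.5MB at most here)
    if needle not in text:        # one substring test rejects a file with no hit
        return []
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]
```

The point of the early return is that files with no hit cost one 
substring test over the whole text and never reach the per-line loop. 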

> My current perl script does what I had originally hoped for. 

Good. You have the "First make it work _right_" part done. 

> It is really pretty fast, for what it does, 

Good! 
How fast is that? 
How much faster do you want it to run? 


