Mailing List Archive



Re: [tlug] how to tune reiser4 for millions of files?



Michal Hajek writes:

 > "Fast handling of very large directories with hundreds of millions of
 > files (yes, millions of files in a single directory without affecting
 > performance)." 

What that means is that you can find any one file very quickly, even if
there are millions of files.  It does not mean that you can process
millions of files quickly.  The problem you have here is that you're
doing millions of operations on files, and every one of them means a
system call, and system calls are slow.
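To see the per-file cost concretely, here is a minimal sketch (a made-up
benchmark, not the poster's workload): it creates a directory of empty
files and stats each one individually, which is roughly what ls -l has
to do.  Each os.stat() is one syscall, so the per-file time, multiplied
by millions, is where the hours go.

```python
import os
import tempfile
import time

# Hypothetical benchmark: stat every file in a directory one by one,
# the way ls -l effectively does.  Every os.stat() is a separate
# syscall; with millions of files these per-file costs dominate.
with tempfile.TemporaryDirectory() as d:
    n = 10_000
    for i in range(n):
        open(os.path.join(d, f"f{i}"), "w").close()

    start = time.perf_counter()
    total = sum(os.stat(os.path.join(d, name)).st_size
                for name in os.listdir(d))
    elapsed = time.perf_counter() - start

    per_file_us = elapsed / n * 1e6
    print(f"{n} stats in {elapsed:.3f}s ({per_file_us:.1f} us/file)")
```

Whatever the per-file number comes out to on your machine, scaling it to
7,000,000 files gives a feel for the totals discussed below.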

 > Or maybe a different filesystem altogether?  Or maybe using LVM with
 > the idea to "spread" the fs onto more hw, or maybe use an SSD disk?
 > Or...I do not know.  So maybe somebody comes up with an interesting
 > idea :]

No, I don't think any of those will help.  You need to rewrite the app.

One thing: ls is possibly the slowest application imaginable, because
all it does is make syscalls.  I would expect that the problem is not
the syscall overhead itself (although I bet that's non-negligible),
it's that each syscall yields the processor to another process.  I
would imagine that every syscall means you block for a couple of ms.
Suppose it's 5 ms.  Then 5 ms x 7,000,000 = 35,000 seconds, or nearly
10 hours.  And that's just the overhead of the stat calls to get file
information; ls also writes to the screen, which means more syscalls
and more delays.

So it's quite possible that your analysis program will be *faster*
than ls because it does fewer syscalls.

