Mailing List Archive



Re: [tlug] how to tune reiser4 for millions of files?





* Curt Sampson (cjs@example.com) [100131 07:43]:

>    ls -1 -U    # no sort, no inode lookup
>    ls -1       # sorted,  no inode lookup
>    ls -1 -l -U # no sort, inode lookups
>    ls -1 -l    # sorted,  inode lookups

Thank you for all the explanations and help.
Here are some measurements.

The size of the data is not precisely known, since du did not finish before I interrupted it:
 # time du -sk /mnt/polea/out/data-0000001255522556768705-dat  
 ^C

 real    115m44.009s
 user    0m23.448s
 sys     1m54.327s

But the used part of the filesystem is 113GB, which means the data
amount to something like 80GB. File sizes are about 10-12KB. So once
again, Curt made quite a good guess. I may try to run "du -sk" overnight.

# time ls -1 -U /mnt/polea/out/data-0000001255522556768705-dat/ | wc -l
7032035  (produces output immediately)

real    11m12.190s
user    0m6.695s
sys     0m47.079s
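
As a rough cross-check of the 80GB guess, the file count above can be multiplied by the 10-12KB file size mentioned earlier. This is only a back-of-the-envelope sketch: estimate_gib is a hypothetical helper name, the average sizes are assumptions, and filesystem overhead is ignored.

```shell
# Rough estimate only: file count times an assumed average size, in whole GiB.
# estimate_gib is a hypothetical helper, not part of any existing tool.
estimate_gib() {
    files=$1     # number of files in the directory
    avg_kb=$2    # assumed average file size in KiB
    # Integer arithmetic: KiB -> MiB -> GiB, rounding down at each step.
    echo $(( files * avg_kb / 1024 / 1024 ))
}

estimate_gib 7032035 10   # lower bound, prints 67
estimate_gib 7032035 12   # upper bound, prints 80
```

So 7032035 files of 10-12KB come to roughly 67-80GB of payload, which is in line with the guess.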

# time  ls -1  /mnt/polea/out/data-0000001255522556768705-dat/ | wc -l
ls: memory exhausted
0

real    12m7.636s
user    0m1.929s
sys     0m47.453s

kfk-64 ~ # time  ls -1 -l -U /mnt/polea/out/data-0000001255522556768705-dat/ | wc -l
^C  (I killed it, not being patient enough; anyway, it demonstrates
the difference)

real    249m38.253s
user    0m20.212s
sys     5m53.706s


#  time sh -c 'find /mnt/polea/out/data-0000001255522556768705-dat/ -type f -print0 | xargs -0 -r -- ls -l > /dev/null'

real    542m21.087s
user    1m47.939s
sys     10m41.090s

Here I let it run overnight :) I did not find a better way to measure
the time. Also, I do not know why it took so much longer. The only
difference is "-type f", which I added. Hopefully >/dev/null makes no
difference. I added it because the terminal made the output somewhat
slow in the previous cases. Suggestions are welcome.
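
One variant that might remove a few variables from the measurement is to let find print the sizes itself, so nothing is sorted and no ls processes are started. This is only a sketch and has not been timed on the directory above: sum_file_bytes is a hypothetical helper name, -printf is a GNU find extension, and each file still needs its inode read, so the lookups themselves are not avoided.

```shell
# Sum the sizes (in bytes) of the regular files directly inside a directory.
# Assumes GNU find (-maxdepth and -printf are extensions) and any POSIX awk.
# sum_file_bytes is a hypothetical helper name used for illustration.
sum_file_bytes() {
    # %s prints the file size in bytes, one file per line, in directory order.
    find "$1" -maxdepth 1 -type f -printf '%s\n' |
        awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Small self-contained demonstration on a temporary directory.
demo_dir=$(mktemp -d)
printf 'abc' > "$demo_dir/a"      # 3 bytes
printf 'hello' > "$demo_dir/b"    # 5 bytes
sum_file_bytes "$demo_dir"        # 3 + 5 bytes
rm -rf "$demo_dir"
```

On the real directory it would be run as "time sum_file_bytes /mnt/polea/out/data-0000001255522556768705-dat/", which would also give the total data size that du did not get to report.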

Now the whole thing is only a matter of academic discussion or
personal interest. The analysis application is doomed for sure. But I
do not mind playing with the system for a while out of curiosity.

One side note, in dmesg I have:

[ 1778.148042] ls used greatest stack depth: 5084 bytes left
[73594.665219] ls used greatest stack depth: 5064 bytes left
[85121.212329] tee used greatest stack depth: 4912 bytes left
[85121.212415] sdcfedf used greatest stack depth: 4516 bytes left

sdcfedf - this is the analysis program.
All these messages came before the above measurements took place.

* Bruno Raoult (braoult@example.com) [100131 09:00]:
> Second *if*: Maybe you may know the filenames at first? Are filenames
> date- based, or something you could compute?

Unfortunately, that is not the case. But it is surely a nice idea
that had not come up before; I will make a note :)  Thank you.


Best regards
michal

