Mailing List Archive
Re: [tlug] how to tune reiser4 for millions of files?
- Date: Sun, 31 Jan 2010 16:06:18 +0900
- From: Curt Sampson <cjs@example.com>
- Subject: Re: [tlug] how to tune reiser4 for millions of files?
- References: <20100128073847.GH13095@example.com> <20100128095957.GB24344@example.com> <20100128132701.GI13095@example.com>
- User-agent: Mutt/1.5.18 (2008-05-17)
On 2010-01-28 14:27 +0100 (Thu), Michal Hajek wrote:

> the analysis itself is not a problem at the moment. I believe that
> by rewriting the program one can compute the whole thing in an hour or
> so. Especially I believe one could employ cuda and nvidia card to get
> even better result, since the thing is easily paralelizable (? not sure
> about the correct English word).

You're correct; the word is "parallelizable."

That said, if it takes an hour to run, unless you're running it many times a day it would probably be a complete waste of time to do the extra work to use your NVIDIA card for it. But that's another topic.

> My attention is more on the hw or system side of the problem. That is,
> can I do something with the system (OS, hw..etc.) to speed things up?

Yes. First step: forget about what filesystems you're using: you're attacking the difficult rather than the easy side of the problem. Your best option is to fix whatever's writing the data to use a single file, or a small number of files.

By the way, you have so far neglected to give us one of the most critical pieces of information here, which is the size of your data set. Knowing that it's 7-million-odd "small text files," I'll guess that they're, say, 1 KB each and you've got 7 GB of data. Things don't change that much if they're 10 KB each and you have 70 GB, and I'm guessing that if you can process the whole data set "in an hour," it's not 700 GB, which would take more than twice that long just to read from a disk in a straight serial read. (That 7 GB size, by the way, is what we classify in the database world as either "small" or "trivial"; it fits into well under half of the main memory in a modern $3000 low-end server.)

Actually, I lied; that's not the most critical piece. What really matters is your access patterns: how you write and read the data.
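The size guesses above are easy to check with a few lines of arithmetic. The following is only a rough sketch: the 80 MB/s sustained sequential read rate is an assumed figure for a single disk, not something stated in this thread, and decimal units are used throughout.

```python
# Back-of-envelope sizing for roughly 7 million small files.
# ASSUMPTION: a single disk sustains about 80 MB/s on a straight serial
# read. This figure is illustrative only; measure your own hardware.

FILE_COUNT = 7_000_000
ASSUMED_READ_MB_PER_S = 80


def total_gb(file_count, kb_per_file):
    """Total data size in GB, taking 1 GB = 1,000,000 KB."""
    return file_count * kb_per_file / 1_000_000


def serial_read_hours(size_gb, mb_per_s=ASSUMED_READ_MB_PER_S):
    """Hours needed to read size_gb sequentially at mb_per_s."""
    seconds = size_gb * 1000 / mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    for kb in (1, 10, 100):
        size = total_gb(FILE_COUNT, kb)
        hours = serial_read_hours(size)
        print(f"{kb:>3} KB/file: {size:5.0f} GB, about {hours:.2f} h to read serially")
```

Under that assumed rate, the 7 GB and 70 GB cases read in minutes while the 700 GB case takes over two hours, which is why a one-hour total run time suggests the data set is not in the 700 GB range.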
For the smaller size (7 GB) it's probably about how fast you can load it into main memory, and for the larger size (70 GB) you'll be getting into disk access speed.

The single best thing you can do is change the program generating these data to write everything to a single file, or a relatively small number of files.

The second best thing is either to change your analyzer to read the files in directory order (as I said before, readdir()), if it reads the files only once and you then newfs that filesystem afterwards, or to write an intermediate program that reads the files in directory order and rewrites them (to a separate drive) as one or a few large files in a more optimized format.

Third, if you're going to continue to play around with having lots of small files, and you're in the 70 GB range rather than the 700 GB range, don't bother mucking about with filesystems until you've put the whole thing on an SSD.

Fourth, if you're going to persist in playing with filesystems here, keep in mind that it no longer has anything to do with the performance of your application and you're just doing it for personal pleasure. You're in the position of the guy who wrote that wonderful 672-byte chess program that would run on a 1K Timex Sinclair[1]. People trying to improve that these days are not doing so because they just want a better chess program.

[1]: http://users.ox.ac.uk/~uzdm0006/scans/1kchess/

cjs
-- 
Curt Sampson <cjs@example.com> +81 90 7737 2974
http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw
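The "intermediate program" idea from the message above could look roughly like the following. This is a minimal sketch, not the author's code: the function names `consolidate` and `read_records` and the record format (a `name<TAB>length` header line followed by the raw bytes) are assumptions made up for illustration. It relies on Python's `os.scandir()`, which yields entries in the order the operating system returns them and does not sort them.

```python
import os


def consolidate(src_dir, out_path):
    """Append every regular file in src_dir to out_path, in directory order.

    Entries are deliberately NOT sorted: they are read in whatever order
    os.scandir() yields them. Each record is a header line
    b"<name>\t<byte length>\n" followed by the file's raw bytes.
    ASSUMPTIONS: out_path lies outside src_dir (ideally on a separate
    drive), and file names contain no newline characters.
    Returns the number of files written.
    """
    count = 0
    with open(out_path, "wb") as out, os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories, symlinks, and the like
            with open(entry.path, "rb") as f:
                data = f.read()
            header = os.fsencode(entry.name) + b"\t" + str(len(data)).encode("ascii") + b"\n"
            out.write(header)
            out.write(data)
            count += 1
    return count


def read_records(path):
    """Yield (name, data) pairs from a file written by consolidate()."""
    with open(path, "rb") as f:
        while True:
            header = f.readline()
            if not header:
                break  # end of file
            name, length = header.rstrip(b"\n").rsplit(b"\t", 1)
            yield os.fsdecode(name), f.read(int(length))
```

A call such as `consolidate("/data/small-files", "/mnt/other/all.dat")` (both paths hypothetical) would be run once; the analyzer could then iterate over `read_records("/mnt/other/all.dat")` with a single sequential read instead of opening millions of files.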