tlug.jp Mailing List Archive
Re: [tlug] Limits on file numbers in sort -m
- Date: Wed, 28 May 2014 12:55:57 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] Limits on file numbers in sort -m
- References: <CABHGxq7jYkDDLkF8uzzNK8WeU+37t1wgpVhk6VD2HQKyEi7wBw@mail.gmail.com> <CAJMSLH618MfmhL9ufAOfLXxw52i4STpF8dsc_+xe-2GRB3JM8g@mail.gmail.com>
黒鉄章 writes:

> A small precursor to consider is if the filename expansion
> (i.e. from *.interim to all the separate files) will exceed the
> size of ARG_MAX. On my system you'd be OK (it's ~2M, i.e. more
> than ~10k * 20 chars = ~200k)

There are shell limits as well, even if ARG_MAX is huge. Jim probably wants to use xargs.

> > I'm gearing up for a merging of a very large number of
> > sorted text files(*). Does anyone know if there is an upper
> > limit on how many sorted files can be merged using something
> > like: "sort -m *.interim > final".

I don't know about upper limits, but you might consider whether you wouldn't get much better performance from a multipass approach.

> > Also, is it worth fiddling with the "--batch-size=NMERGE" option?

Pretty much what I had in mind. Specifically, assuming 100-byte lines, merging 10 files at a time means 4GB in the first pass, comfortably fitting in your memory and allowing very efficient I/O. I'll bet that this is a big win (on the first pass only). On later passes the performance analysis is non-trivial, but the I/O efficiency of having a big buffer for each file in the batch may outweigh the cost of the additional passes.

Do you really expect the output file to be roughly 40x10^9 lines!? Or is some uniquification going to be applied? If so, I suspect that interleaving merge and uniquification passes will be a lot faster.

On a quad-core machine, see the --parallel option. It is better documented in the Info manual for coreutils than in the man page.

Steve
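The multipass approach and the xargs suggestion above can be sketched in shell roughly as follows. This is a minimal sketch, assuming GNU find, xargs, mktemp and sort; the helper name merge_interim, the default batch of 10 and the pass1.* scratch-file names are all hypothetical. It never expands *.interim on a command line, so neither ARG_MAX nor shell limits come into play. Note that a plain "xargs sort -m > final" would not be enough: if xargs splits the list into several sort invocations, final ends up as several separately sorted runs glued together rather than one sorted file, which is why the batches are merged again in a second pass.

```shell
# merge_interim OUTPUT [BATCH]
# Hypothetical helper: a two-pass merge of the pre-sorted *.interim
# files in the current directory. Assumes GNU find, xargs, mktemp and
# sort, no leftover pass1.* files from an earlier run, and the same
# locale settings that were used to sort the inputs.
merge_interim() {
  out=$1
  batch=${2:-10}

  # Pass 1: merge the inputs $batch at a time. xargs -n keeps every
  # sort command line short, and mktemp gives each batch its own
  # scratch file so batches never overwrite one another.
  find . -maxdepth 1 -name '*.interim' -print0 |
    xargs -0 -r -n "$batch" sh -c '
      tmp=$(mktemp ./pass1.XXXXXX) || exit 1
      sort -m -- "$@" > "$tmp"
    ' sh || return 1

  # Pass 2: merge the scratch files. --files0-from reads the names
  # from standard input, so no long argument list is built here either.
  find . -maxdepth 1 -name 'pass1.*' -print0 |
    sort -m --files0-from=- > "$out" || return 1

  # Remove the scratch files.
  find . -maxdepth 1 -name 'pass1.*' -delete
}
```

Run it from the directory holding the files, e.g. "merge_interim final 10". GNU sort can also do the batching by itself: "find . -maxdepth 1 -name '*.interim' -print0 | sort -m --batch-size=10 --files0-from=- > final" merges at most 10 inputs at a time, keeping intermediate results in temporary files (see -T and TMPDIR) for as many passes as needed. On a multicore machine, xargs -P 4 would run several first-pass batches at once, though disk bandwidth may well be the real bottleneck. Which batch size is fastest depends on line length, memory and disks, so timing a few values on a sample is probably worthwhile.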
- Follow-Ups:
- Re: [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- References:
- [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: 黒鉄章