Mailing List Archive



Re: [tlug] Limits on file numbers in sort -m



Hi Jim.

    A small precursor to consider is whether the filename expansion (i.e. from *.interim to all the separate files) will exceed ARG_MAX. On my system you'd be OK: ~10k names * ~20 chars each ≈ ~200K, comfortably under the ~2M limit:

akira@akira-t3500-ub:~$ getconf ARG_MAX
2097152
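A quick way to sanity-check that in advance is to measure the expanded argument list yourself. This is a sketch using two stand-in files in a temporary directory (the names and directory are made up for the demo):

```shell
# Create a scratch directory with stand-ins for the *.interim files.
cd "$(mktemp -d)"
touch a.interim b.interim

# Total bytes the expanded glob would occupy as exec() arguments
# (printf emits each name followed by a NUL, like the argv layout):
args_bytes=$(printf '%s\0' *.interim | wc -c)
limit=$(getconf ARG_MAX)
echo "args: $args_bytes bytes, ARG_MAX: $limit"

# Proceed only if the expansion fits:
[ "$args_bytes" -lt "$limit" ] && echo "glob expansion fits"
```

(Real ARG_MAX accounting also charges the environment and some per-argument pointer overhead, so leave a healthy margin rather than cutting it close.)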

    I am not aware of any file-handle limit specific to the sort program beyond the general per-process limits. In this case, though, there would be an issue on my system: the hard limit of 4096 open files is well below ~10k:

akira@akira-t3500-ub:~$ cat /proc/self/limits | grep -P "^Limit|^Max open files"
Limit                     Soft Limit           Hard Limit           Units     
Max open files            1024                 4096                 files     
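That said, as I understand it GNU sort doesn't need one descriptor per input: when the inputs outnumber --batch-size (which it derives from the open-files limit by default), it merges them in multiple passes through temporary files. A minimal sketch, with illustrative numbers and two tiny stand-in files in place of the ~10k real ones:

```shell
cd "$(mktemp -d)"

# Raise the soft open-files limit toward the hard limit if possible
# (it cannot exceed the hard limit, hence the guard):
ulimit -n 4096 2>/dev/null || true
echo "soft open-files limit: $(ulimit -n)"

# Two tiny pre-sorted stand-ins for the *.interim files:
printf 'a\nc\n' > x.sorted
printf 'b\nd\n' > y.sorted

# -m merges already-sorted inputs; --batch-size caps how many files
# sort opens at once, falling back to multi-pass merging beyond that.
sort -m --batch-size=1021 x.sorted y.sorted
```

With ~10k inputs and a batch size of a few thousand, the merge would take two passes; given 10T of disk for the temporaries that should be a non-issue.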

Cheers,

Akira



On Wed, May 28, 2014 at 9:48 AM, Jim Breen <jimbreen@example.com> wrote:
A real *n*x question for once.

I'm gearing up for a merge of a very large number of
sorted text files (*). Does anyone know if there is an upper
limit on how many sorted files can be merged using something
like "sort -m *.interim > final"?

Also, is it worth fiddling with the "--batch-size=NMERGE" option?
The system I'll be doing it on has masses of disk (10 TB), 8 GB of
RAM, and a quad-core processor.

TIA

Jim

(*) about 10k files, each about 4M lines of text.

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University


