tlug.jp Mailing List Archive
Re: [tlug] Limits on file numbers in sort -m
- Date: Thu, 29 May 2014 15:32:03 +1000
- From: Jim Breen <jimbreen@example.com>
- Subject: Re: [tlug] Limits on file numbers in sort -m
- References: <CABHGxq7jYkDDLkF8uzzNK8WeU+37t1wgpVhk6VD2HQKyEi7wBw@mail.gmail.com> <CAJMSLH618MfmhL9ufAOfLXxw52i4STpF8dsc_+xe-2GRB3JM8g@mail.gmail.com> <87bnui8sky.fsf@uwakimon.sk.tsukuba.ac.jp> <CABHGxq4NEBMVR8jndiEvcgsGkc_B0f-qcrs2sFjqaAdWH3n9sw@mail.gmail.com> <CAJMSLH6SdSUmvHsjmZBZP-g1graNuPV51vdwLzpPf7ipmz7+zA@mail.gmail.com>
On 28 May 2014 15:59, 黒鉄章 <akira.kurogane@example.com> wrote:
> For each input file the sort process has open, the OS will buffer a memory
> page or two (at least). 4k is the usual memory page size, I believe.
>
> 4k * 10k = 40M doesn't sound bad at all. But if read-ahead buffering is
> putting a lot more than a couple of pages per file in memory, that will be
> that many times larger. Actually I would expect this to be happening, but
> would have faith in the OS to limit itself to avoid using swap.

That's pretty much my understanding of it. It would be ultimate silliness to
have read-only input pages end up replicated in swap.

> Regarding the count of occurrences, you could pipe the "sort -m ...." into
> "uniq -c". I've always been annoyed by the format of uniq (a space-padded,
> fixed-width count as the first column), but if you can live with that you'll
> get to what you want quicker. The pipe to uniq will consume its input buffer
> very quickly, so it's not going to be the case that all of the output of
> sort must stay in memory as long as the process is running. Also, if
> duplicates are common, your final output file saved to disk will be
> usefully smaller.

In any case the output from "uniq -c" is not what I want, and since I'd need
to reformat it anyway, it's easier to use my own utility. It also gives me the
option of turning

this 3
this 4

into

this 7

which I can't do with "uniq -c".

Cheers

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
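Jim's own utility isn't shown in the thread. As a rough illustration only, the idea of merging already-sorted count files and summing the counts of equal keys (turning "this 3" and "this 4" into "this 7") could be sketched in Python as below. Several assumptions here are not from the original. Each line is taken to be a key followed by whitespace and an integer count. Each input is assumed to be already sorted by key in plain code-point order, which is roughly what `LC_ALL=C sort` gives for simple keys. The names `parse_line`, `merge_and_sum` and `main` are hypothetical. `heapq.merge` stands in for `sort -m` and pulls one record at a time from each input, and `itertools.groupby` then sums adjacent records that share a key.

```python
import contextlib
import heapq
import itertools
import sys
from typing import Iterable, Iterator, List, Tuple


def parse_line(line: str) -> Tuple[str, int]:
    """Split 'key count' into (key, count); the key may itself contain spaces.

    Assumed format (hypothetical): the count is the last whitespace-separated field.
    """
    key, count = line.rsplit(None, 1)
    return key, int(count)


def merge_and_sum(*sorted_inputs: Iterable[str]) -> Iterator[str]:
    """Merge inputs already sorted by key and sum the counts of equal keys.

    heapq.merge pulls one record at a time from each input, in the same
    spirit as `sort -m`, so memory use does not grow with input length.
    """
    streams = [map(parse_line, src) for src in sorted_inputs]
    merged = heapq.merge(*streams, key=lambda rec: rec[0])
    # Equal keys are adjacent after the merge, so groupby collects them.
    for key, group in itertools.groupby(merged, key=lambda rec: rec[0]):
        yield f"{key} {sum(count for _, count in group)}"


def main(paths: List[str]) -> None:
    # ExitStack closes every input file, even if one of them fails to parse.
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        for line in merge_and_sum(*files):
            print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `python3 merge_counts.py part-*.txt > totals.txt`, where the script name is hypothetical. Like `sort -m`, this keeps every input open at once, so with around ten thousand input files it runs into the same per-process open-file limit (`ulimit -n`).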
- Follow-Ups:
- Re: [tlug] Limits on file numbers in sort -m
- From: Bruno Raoult
- Re: [tlug] Limits on file numbers in sort -m
- From: Stephen J. Turnbull
- References:
- [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: 黒鉄章
- Re: [tlug] Limits on file numbers in sort -m
- From: Stephen J. Turnbull
- Re: [tlug] Limits on file numbers in sort -m
- From: Jim Breen
- Re: [tlug] Limits on file numbers in sort -m
- From: 黒鉄章