
Re: [tlug] Limits on file numbers in sort -m
On 28 May 2014 15:59, 黒鉄章 <akira.kurogane@example.com> wrote:
> For each input file the sort process has open, the OS will buffer at least
> a memory page or two. 4k is the usual memory page size, I believe.
>
> 4k * 10k = 40M doesn't sound bad at all. But if read-ahead buffering is
> putting a lot more than a couple of pages per file in memory, the total will
> be that many times larger. Actually I would expect this to be happening, but
> I would have faith in the OS to limit itself to avoid using swap.
That's pretty much my understanding of it. It would be the ultimate silliness to
have read-only input pages end up replicated in swap.
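(As an aside, a rough way to check the file-count side of this on the command
line, assuming GNU sort and with made-up file names:

  # per-process open-file limit; 10k inputs may exceed the common default
  # of 1024, in which case GNU sort merges in batches via temporary files
  ulimit -n

  # merge the pre-sorted chunks; --batch-size caps how many are open at once
  sort -m --batch-size=1000 part-*.sorted > merged.txt

This is only a sketch of the mechanics, not a measurement of the page-cache
behaviour discussed above.)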
> Regarding the count of occurrences, you could pipe the "sort -m ...." into
> "uniq -c". I've always been annoyed by the format of uniq (a space-padded,
> fixed-width count as the first column), but if you can live with that you'll
> get to what you want quicker. The pipe to uniq will consume its
> input buffer very quickly, so it's not the case that all of the
> output of sort must stay in memory for as long as the process is running. Also,
> if duplicates are common, your final output file saved to disk will be
> usefully smaller.
In any case the output from "uniq -c" is not what I want, and since I'd need
to reformat it anyway it's easier to use my own utility. It also gives me the
option of turning
this 3
this 4
into
this 7
which I can't do with "uniq -c".
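(For the curious, that count-summing step can be sketched in plain awk,
assuming the merged lines have the form "word count" and using made-up file
names; this is just an illustration, not the utility itself:

  sort -m part-*.counted |
  awk '$1 != prev { if (NR > 1) print prev, total; prev = $1; total = 0 }
       { total += $2 }
       END { if (NR > 0) print prev, total }'

It relies on the merge keeping equal keys adjacent, so the running total can
be printed whenever the key changes.)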
Cheers
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University