
Re: [tlug] Limits on file numbers in sort -m
On 28 May 2014 15:59, 黒鉄章 <akira.kurogane@example.com> wrote:
> For each input file the sort process has open, the OS will buffer at least
> a memory page or two. 4k is the usual memory page size, I believe.
>
> 4k * 10k = 40M doesn't sound bad at all. But if read-ahead buffering is
> putting a lot more than a couple of pages per file in memory, the total will
> be that many times larger. Actually I would expect this to be happening, but
> I would have faith in the OS to limit itself to avoid using swap.
That's pretty much my understanding of it. It would be the ultimate silliness to
have read-only input pages end up replicated in swap.
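(As an aside, a rough way to check the file-count side of this on the command
line, assuming GNU sort and with made-up file names:

  # per-process open-file limit; 10k inputs may exceed the common default
  # of 1024, in which case GNU sort merges in batches via temporary files
  ulimit -n

  # merge the pre-sorted chunks; --batch-size caps how many are open at once
  sort -m --batch-size=1000 part-*.sorted > merged.txt

This is only a sketch of the mechanics, not a measurement of the page-cache
behaviour discussed above.)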
> Regarding the count of occurrences, you could pipe the "sort -m ...." into
> "uniq -c". I've always been annoyed by the format of uniq (a space-padded,
> fixed-width count as the first column), but if you can live with that you'll
> get to what you want quicker. The pipe to uniq will consume its
> input buffer very quickly, so it's not the case that all of the
> output of sort must stay in memory for as long as the process is running. Also,
> if duplicates are common, your final output file saved to disk will be
> usefully smaller.
In any case the output from "uniq -c" is not what I want, and since I'd need
to reformat it anyway it's easier to use my own utility. It also gives me the
option of turning
this 3
this 4
into
this 7
which I can't do with "uniq -c".
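(For the curious, that count-summing step can be sketched in plain awk,
assuming the merged lines have the form "word count" and using made-up file
names; this is just an illustration, not the utility itself:

  sort -m part-*.counted |
  awk '$1 != prev { if (NR > 1) print prev, total; prev = $1; total = 0 }
       { total += $2 }
       END { if (NR > 0) print prev, total }'

It relies on the merge keeping equal keys adjacent, so the running total can
be printed whenever the key changes.)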
Cheers
Jim
--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University