tlug Mailing List Archive (tlug.jp)
Re: [tlug] Limits on file numbers in sort -m
- Date: Fri, 6 Jun 2014 10:35:26 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] Limits on file numbers in sort -m
- References: <CABHGxq7jYkDDLkF8uzzNK8WeU+37t1wgpVhk6VD2HQKyEi7wBw@mail.gmail.com> <CAJMSLH618MfmhL9ufAOfLXxw52i4STpF8dsc_+xe-2GRB3JM8g@mail.gmail.com> <87bnui8sky.fsf@uwakimon.sk.tsukuba.ac.jp> <CABHGxq4NEBMVR8jndiEvcgsGkc_B0f-qcrs2sFjqaAdWH3n9sw@mail.gmail.com> <CAJMSLH6SdSUmvHsjmZBZP-g1graNuPV51vdwLzpPf7ipmz7+zA@mail.gmail.com> <CABHGxq7eCk9Pk1JtNrZuqK_8yv4bt7ftoWwyXqf5P+GKYQH=5w@mail.gmail.com> <87sins7mhy.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2b6XyFNsFhDbK+ktgWk0cE5Lzfv9OrhimBH8RyN78yzLQ@mail.gmail.com> <87d2ew76yd.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2Y2vaH06nJyt25uREjCT9RELoTnfwDpeXX5Z97W45oZUQ@mail.gmail.com> <5387D422.2070302@extellisys.com> <CABHGxq65zdPuC1dRof0_KjmEzuNFX2GKS3JpjJM7T-U=2PH2Tw@mail.gmail.com> <CAJA1Y2YvJikn77rv3EznP=UC4nr8rT4L07RauCN7Kv3tYqD5Ew@mail.gmail.com> <538AC4BF.3050903@extellisys.com>
On Sun, Jun 1, 2014 at 8:14 AM, Travis Cardwell <travis.cardwell@example.com> wrote:
> I am not convinced that this is an issue, as --batch-size can be used to
> specify how many files are opened at once.

I am not sure I understand the algorithm in this case (merge). Let's say the maximum number of inputs is 3, and one of them is closed (and a new file opened). There is no way to compare its lines with the ones already read without starting a new pass (besides keeping a lot in memory, where we can hit limits, which is surely not what people want for a simple merge). Or maybe sort just opens and closes files all the time.
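
For the record, the GNU coreutils manual describes --batch-size=NMERGE this way: when sort has to merge more than NMERGE inputs, it merges them in groups of NMERGE, saves each result in a temporary file, and uses those temporary files as inputs to a later merge. No input is closed and reopened in the middle of a merge. The sketch below imitates that multi-pass strategy in Python; it is not sort's actual code, the function names are hypothetical, and it assumes every line (including the last) ends with a newline.

```python
import heapq
import os
import tempfile
from typing import List


def merge_files(paths: List[str], out_path: str) -> None:
    """k-way merge of already-sorted text files into out_path."""
    files = [open(p, "r", encoding="utf-8") for p in paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # heapq.merge keeps only one pending line per input file in memory
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


def batched_merge(paths: List[str], out_path: str, batch_size: int) -> None:
    """Merge sorted files, never reading from more than batch_size inputs at once.

    Each pass merges groups of batch_size inputs into temporary files, and
    those temporary files become the inputs of the next pass.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    with tempfile.TemporaryDirectory() as tmpdir:
        pending = list(paths)
        pass_no = 0
        while len(pending) > batch_size:
            next_pending = []
            for i in range(0, len(pending), batch_size):
                tmp = os.path.join(tmpdir, f"pass{pass_no}_{i}.txt")
                merge_files(pending[i:i + batch_size], tmp)
                next_pending.append(tmp)
            pending = next_pending
            pass_no += 1
        # final pass: at most batch_size inputs remain
        merge_files(pending, out_path)
```

With batch_size=3 and five sorted inputs, the first pass writes two temporary files and the final pass merges those two into the output. Each pass is one sequential read of its inputs, so the cost is extra passes over the data rather than extra memory.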

>> - if you don't care about CPU processing limits, a small script and a small DB can
>> do everything (sqlite, etc...). It could be
>> expensive, but not so much, if the insert script makes a "+1" to a given
>> key on insert, and real-time updates are disabled.
>> You will not even need to keep the original files sorted.

> The time requirement for this is O(N log N). You no longer need to keep
> input files sorted, but you gain nothing if they already are sorted.

You assume a particular DB algorithm, which you should not do "a priori" (a tree is not a hash, etc...).
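
A minimal sketch of the "small script and a small DB" approach quoted above, using Python's standard sqlite3 module. Assumptions not taken from the thread: each input line is one key, the table is called counts, the function name is hypothetical, and "real-time updates disabled" is read as committing once at the end instead of after every insert. Because key is a PRIMARY KEY, SQLite keeps it in a B-tree index, so each "+1" costs O(log N), in line with the O(N log N) estimate, and the order of the input lines does not matter.

```python
import sqlite3
from typing import Iterable, List, Tuple


def count_keys(paths: Iterable[str], db_path: str = ":memory:") -> List[Tuple[str, int]]:
    """Count occurrences of each line across (possibly unsorted) files."""
    conn = sqlite3.connect(db_path)
    try:
        # PRIMARY KEY gives a B-tree index on key: each lookup/update is O(log N)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        with conn:  # one transaction for all updates, committed on exit
            for path in paths:
                with open(path, "r", encoding="utf-8") as f:
                    for line in f:
                        key = line.rstrip("\n")
                        # the "+1 on insert" pattern: create the row if missing, then bump it
                        conn.execute("INSERT OR IGNORE INTO counts (key, n) VALUES (?, 0)", (key,))
                        conn.execute("UPDATE counts SET n = n + 1 WHERE key = ?", (key,))
        return conn.execute("SELECT key, n FROM counts ORDER BY key").fetchall()
    finally:
        conn.close()
```

INSERT OR IGNORE followed by UPDATE is used rather than an upsert clause so the sketch does not depend on a recent SQLite version. Passing a file path as db_path keeps the table on disk when the keys do not fit in memory.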

> I usually use databases for tasks such as this one, btw. :)

Well, that makes two of us at least.

>> You will get your
>> output (key/number) immediately.

> A select (on an indexed column) is O(log N), not immediate (O(1)).

A perfect hash would give O(1), with a finite number of keys and an unlimited DB size. Again, your O(log N) is only valid for a specific DB implementation.
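
To illustrate that the cost depends on the data structure, here is the same count done with a hash table instead of a B-tree index. This is a sketch under stated assumptions: Python's dict (behind collections.Counter) is an ordinary hash table, not a perfect hash, so each update is O(1) only on average, and every distinct key must fit in memory, which is the space trade-off mentioned above. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_keys_hashed(paths: Iterable[str]) -> Counter:
    """Count occurrences of each line across files using a hash table."""
    counts: Counter = Counter()
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            # each update hashes the key: expected O(1), no sorting or index needed
            counts.update(line.rstrip("\n") for line in f)
    return counts
```

Unlike the indexed table, the counter is unordered; a sorted key/count list costs one extra sort over the distinct keys, for example sorted(count_keys_hashed(paths).items()).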

> In general, however, I find that shell scripts require more maintenance
> than scripts/programs that are written in a more capable language.

I don't know: if a shell can do a task, I don't see how Java or C or C++ would be easier to maintain...
br.
--
2 + 2 = 5, for very large values of 2.
- References:
- Re: [tlug] Limits on file numbers in sort -m
- From: Travis Cardwell