
Mailing List Archive
Re: [tlug] Limits on file numbers in sort -m
On 2014-05-30 05:21, Bruno Raoult wrote:
> So "uniq *" was able to read files, but "sort -m *" was not, right?
> And a "uniq | sort | uniq" is not possible???
>
> I am stupid, I don't understand the issue at all :-(, and I would like
> to understand clearly, with output of commands if possible...
I can be very specific using types... A strongly-typed sort command would
take a list of orderable elements and return a list of the same (but in
sorted order):
sort :: Ord a => [a] -> [a]
A strongly-typed uniq command (as used here, i.e. `uniq -c`) would take a
(sorted) list of elements which can be compared for equality and return a
list of elements with associated counts:
uniq :: Eq a => [a] -> [(a, Int)]
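To make that signature concrete, here is a minimal Haskell sketch. The name `uniqC` is hypothetical (it is not a standard library function); it models `uniq -c` by grouping runs of adjacent equal elements with `Data.List.NonEmpty.group`, which is why the input is expected to be sorted.

```haskell
import qualified Data.List.NonEmpty as NE

-- Hypothetical typed model of `uniq -c`: collapse each run of adjacent
-- equal elements into an (element, count) pair. Like the real uniq, it
-- only merges *adjacent* duplicates, so unsorted input gives repeats.
uniqC :: Eq a => [a] -> [(a, Int)]
uniqC = map (\run -> (NE.head run, NE.length run)) . NE.group

main :: IO ()
main = print (uniqC ["apple", "apple", "banana", "cherry", "cherry", "cherry"])
-- prints [("apple",2),("banana",1),("cherry",3)]
```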
In a strongly-typed shell, `uniq | sort` (`sort . uniq` in function
composition syntax) would have type:
(sort . uniq) :: (Eq a, Ord a) => [a] -> [(a, Int)]
`uniq | sort | uniq` would therefore have type:
(uniq . sort . uniq) :: (Eq a, Ord a) => [a] -> [((a, Int), Int)]
As you can see from the return type ([((a, Int), Int)]), the result is a
list of element+count pairs (from the first uniq) with associated counts
(from the second uniq). Our shell is not strongly-typed, but the strings
passed between the commands have essentially the same nested structure. It
does not give the summed per-element counts that are required. [1]
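The nesting can be checked by modelling the four-file experiment from footnote [1] in Haskell. This is a sketch under stated assumptions: `uniqC` and `twoLevelCounts` are hypothetical names, and `sort (xs ++ ys)` stands in for `sort -m` on two already-sorted inputs (both give the same sorted result). It is a typed model of the pipeline, not a byte-for-byte reproduction of the shell output.

```haskell
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Hypothetical typed stand-in for `uniq -c` (counts runs of adjacent
-- equal elements, so the input should already be sorted).
uniqC :: Eq a => [a] -> [(a, Int)]
uniqC = map (\run -> (NE.head run, NE.length run)) . NE.group

-- words.12 and words.34 are counted separately, then merged and counted
-- again. The result type shows that the second uniqC counts
-- (word, count) pairs; it never adds counts together.
twoLevelCounts :: Ord a => [a] -> [a] -> [a] -> [a] -> [((a, Int), Int)]
twoLevelCounts w1 w2 w3 w4 =
  let w12 = uniqC (sort (w1 ++ w2))
      w34 = uniqC (sort (w3 ++ w4))
  in uniqC (sort (w12 ++ w34))

main :: IO ()
main = print (twoLevelCounts ["apple", "banana"] ["apple"] ["apple"] ["banana"])
-- prints [(("apple",1),1),(("apple",2),1),(("banana",1),2)]
```

In that output, `banana` appears as `(("banana",1),2)`, meaning "the pair (banana, 1) was seen twice", rather than as the wanted `("banana",2)`, and `apple` is split across two entries instead of totalling 3.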
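What is needed is a command that sums the counts of equal elements while
merging, in the style of the merge step of a merge sort (the elements must
be comparable, since the merge has to decide which one comes first):
merge :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]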
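A minimal sketch of such a merge, assuming both inputs are sorted by element and contain each element at most once (as `uniq -c` output of sorted input does). `mergeCounts` is a hypothetical name. It uses an `Ord a` constraint, because choosing which head to emit next needs a comparison, not only an equality test.

```haskell
-- Hypothetical merge of two (element, count) lists, each sorted by element
-- with no repeated elements. Equal elements are emitted once, with their
-- counts added; otherwise the smaller head goes first, as in merge sort.
mergeCounts :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
mergeCounts xs [] = xs
mergeCounts [] ys = ys
mergeCounts xs@((x, m) : xs') ys@((y, n) : ys') =
  case compare x y of
    LT -> (x, m) : mergeCounts xs' ys
    GT -> (y, n) : mergeCounts xs ys'
    EQ -> (x, m + n) : mergeCounts xs' ys'

main :: IO ()
main = print (mergeCounts [("apple", 2), ("banana", 1)]
                          [("apple", 1), ("banana", 1), ("cherry", 4)])
-- prints [("apple",3),("banana",2),("cherry",4)]
```

Because the output has the same shape and ordering as each input, merges can be chained: merging the result with a third list works the same way.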
The `sort -m` command does not sum counts, which is why Jim said that he
will need to use external software to do so.
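For illustration only, here is one shape such external software could take: a small Haskell program that reads files in `uniq -c` format and prints summed counts in a similar right-aligned format, so its output can be merged again. Everything here is an assumption rather than what Jim actually used: the names `parseCountLine`, `mergeCounts` and `mergeAll` are hypothetical, each input file is assumed to be sorted by word in byte order (for example with `LC_ALL=C sort`, which for UTF-8 text matches Haskell's `String` ordering), and every line is assumed to be well-formed. It opens all the files named on the command line at once, so it is still subject to the per-process open-file limit; a very large number of files would have to be merged in batches.

```haskell
import System.Environment (getArgs)
import Text.Printf (printf)

-- Parse one line of `uniq -c` output, e.g. "      3 apple", into ("apple", 3).
-- Assumes the right-aligned count is followed by exactly one space and then
-- the original line; `read` fails on malformed lines.
parseCountLine :: String -> (String, Int)
parseCountLine line =
  let (digits, rest) = break (== ' ') (dropWhile (== ' ') line)
  in (drop 1 rest, read digits)

-- Merge two count lists sorted by element, adding counts of equal elements.
mergeCounts :: Ord a => [(a, Int)] -> [(a, Int)] -> [(a, Int)]
mergeCounts xs [] = xs
mergeCounts [] ys = ys
mergeCounts xs@((x, m) : xs') ys@((y, n) : ys') =
  case compare x y of
    LT -> (x, m) : mergeCounts xs' ys
    GT -> (y, n) : mergeCounts xs ys'
    EQ -> (x, m + n) : mergeCounts xs' ys'

-- Merge any number of count lists with a right fold (simple, not the most
-- efficient k-way merge).
mergeAll :: Ord a => [[(a, Int)]] -> [(a, Int)]
mergeAll = foldr mergeCounts []

main :: IO ()
main = do
  paths <- getArgs                 -- hypothetical usage: one path per input file
  contents <- mapM readFile paths  -- lazy IO, but every file is opened here
  let counts = map (map parseCountLine . lines) contents
  mapM_ (\(w, n) -> printf "%7d %s\n" n w) (mergeAll counts)
```

For example, with the footnote's files produced under `LC_ALL=C`, `runghc MergeCounts.hs words.12 words.34` (the file name `MergeCounts.hs` is hypothetical) should print one line per word with its total count across all four original files.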
Cheers,
Travis
[1] Check the output of the following commands:
$ sort -R /usr/share/dict/words | head -n 30000 | sort > words.1
$ sort -R /usr/share/dict/words | head -n 30000 | sort > words.2
$ sort -R /usr/share/dict/words | head -n 30000 | sort > words.3
$ sort -R /usr/share/dict/words | head -n 30000 | sort > words.4
$ sort -m words.1 words.2 | uniq -c > words.12
$ sort -m words.3 words.4 | uniq -c > words.34
$ sort -m words.12 words.34 | uniq -c > words.1234