Re: [tlug] Seeking recommendations for file consolidation



> Once all the data is in one place, I hoped to find a way I can weed out
> duplicates and be left with one set of just the most recent versions of
> unique files.

Hello again, Dave M

This sounded challenging enough, so I played with it a bit. Basically, my
plan is first to get a list of all files and calculate their MD5 checksums.
Then the list is sorted and we go through it looking for duplicates. Once
you get handy with the Unix textutils you can customize this any way you
want.

So first you need a list of files and checksums:

find . -type f | while read -r fn; do echo `md5sum "$fn"`; done |
sort > all_files
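
For reference, each line of all_files then looks something like this (the
checksum below is just the MD5 of an empty file, and the path is a made-up
example):

d41d8cd98f00b204e9800998ecf8427e ./old backup/notes 2003.txt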

The next command just gives an idea of what you can do with that:

cat all_files | sed 's/ /|/' | awk -F \| \
  '{ if (c==$1) print $1 " DUP " $2; else print $1 " " $2; c=$1 }' | head

Here I also use sed to replace the first space on each line of "all_files"
with a pipe character, because I don't want awk to treat that space as a
column separator (some files have spaces in their names).
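
Since you also want to keep only the most recent version of each file, here
is one way you could extend the same idea: record the modification time next
to the checksum, then sort each checksum group so the newest copy comes
first. This is just a sketch, it assumes GNU md5sum, stat and sort, and it
will break on filenames that contain "|" or newlines:

find . -type f | while read -r fn; do
    sum=`md5sum "$fn" | awk '{print $1}'`
    mtime=`stat -c %Y "$fn"`    # modification time, seconds since the epoch
    echo "$sum|$mtime|$fn"
done | sort -t '|' -k1,1 -k2,2nr | awk -F '|' '
    # the newest copy of each checksum comes first; anything after it
    # with the same checksum is an older duplicate
    { if ($1 == prev) print "OLDER DUP: " $3; prev = $1 }'

You could review that list by hand first and only delete things once you are
sure it kept the copies you want.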

I hope it was useful.

romans
http://grr.void.lv/cv


Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links