Re: [tlug] Seeking recommendations for file consolidation



> Once all the data is in one place, I hoped to find a way I can weed out
> duplicates and be left with one set of just the most recent versions of
> unique files.

Hello again, Dave M

This sounded challenging enough, so I played with it a bit. Basically, my
plan is first to get a list of all files and calculate their MD5 checksums.
Then the list is sorted and we go through it looking for duplicates. Once
you get handy with the Unix textutils you can customize this any way you
want.

So first you need a list of files and checksums:

find . -type f | while read -r fn; do echo `md5sum "$fn"`; done |
sort > all_files
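
For reference, each line of all_files then looks something like this (the
checksum below is just the MD5 of an empty file, and the path is a made-up
example):

d41d8cd98f00b204e9800998ecf8427e ./old backup/notes 2003.txt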

The next command just gives an idea of what you can do with that:

cat all_files | sed 's/ /|/' | awk -F \| \
  '{ if (c==$1) print $1 " DUP " $2; else print $1 " " $2; c=$1 }' | head

Here I also use sed to replace the first space on each line of "all_files"
with a pipe character, because I don't want awk to treat that space as a
column separator (some files have spaces in their names).
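
Since you also want to keep only the most recent version of each file, here
is one way you could extend the same idea: record the modification time next
to the checksum, then sort each checksum group so the newest copy comes
first. This is just a sketch, it assumes GNU md5sum, stat and sort, and it
will break on filenames that contain "|" or newlines:

find . -type f | while read -r fn; do
    sum=`md5sum "$fn" | awk '{print $1}'`
    mtime=`stat -c %Y "$fn"`    # modification time, seconds since the epoch
    echo "$sum|$mtime|$fn"
done | sort -t '|' -k1,1 -k2,2nr | awk -F '|' '
    # the newest copy of each checksum comes first; anything after it
    # with the same checksum is an older duplicate
    { if ($1 == prev) print "OLDER DUP: " $3; prev = $1 }'

You could review that list by hand first and only delete things once you are
sure it kept the copies you want.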

I hope it was useful.

romans
http://grr.void.lv/cv


Home | Main Index | Thread Index

Home Page Mailing List Linux and Japan TLUG Members Links