Mailing List Archive
- Date: Fri, 18 Aug 2006 01:47:24 +0300
- From: "romans malinovskis" <romaninsh@example.com>
- Subject: Re: [tlug] Seeking recommendations for file consolidation
- References: <44E44C72.4050509@example.com>
> Once all the data is in one place, I hoped to find a way I can weed out
> duplicates and be left with one set of just the most recent versions of
> unique files.

Hello again, Dave M.

This sounded challenging enough, so I played with it a bit. Basically my plan is first to get a list of all the files and calculate their MD5 checksums. The list is then sorted, and we go through it looking for duplicates. Once you get handy with the Unix textutils you can customize it any way you want.

So first you need a list of files and checksums:

  find . -type f | while read fn; do echo `md5sum "$fn"` $fn; done | sort > all_files

The next command just gives an idea of what you could do with that:

  cat all_files | sed 's/ /|/' | awk -F \| '{ if (c==$1) print $1 " DUP " $2 ; else print $1 " " $2; c=$1 }' | head

Here I also use sed to replace the first space in "all_files" with a pipe character, because I don't want awk to treat that space as a column separator (some files have spaces in their names).

I hope it was useful.

romans
http://grr.void.lv/cv
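A possible variant of the duplicate-spotting step above, sketched on the assumption that GNU coreutils uniq is available (the -w and --all-repeated options below are not part of the original commands): since the MD5 sum is always the first 32 characters of each line in all_files, uniq can group the duplicates directly.

  # Assumes GNU uniq: compare only the first 32 characters (the MD5 field)
  # and print every line of each duplicate group, with a blank line between groups.
  uniq -w 32 --all-repeated=separate all_files > dup_groups

Each blank-line-separated group in dup_groups then lists files with identical content; within a group, the newest copy could be kept and the older ones reviewed for removal.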
- References:
- [tlug] Seeking recommendations for file consolidation
- From: Dave M G