[tlug] Seeking recommendations for file consolidation



Dave M G writes:

 > There are so many of them now, though, that I don't need them all, and I 
 > know most of the files will be duplicates anyway. Not to mention a lot 
 > of junk that simply isn't needed at all anymore.

You come up with the neatest problems!

Here's a cute hack:  put the whole schmeer into a git repository.

That will automatically (1) compress the files and (2) content-address
them: git names each stored file by the SHA-1 hash of its contents, so
two files which are byte-for-byte identical become aliases for the
same object in the git database.  If you have lots of duplicate files,
this also speeds up a tree diff immensely.
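
Something like this, assuming all the CD copies live under a single
directory (~/cdroms is my made-up name):

    cd ~/cdroms
    git init
    git add -A    # every file is stored as a blob named by its SHA-1,
                  # so byte-identical files share a single object
    git commit -m "import all CD-ROM copies"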

Unfortunately, git is designed for *comparing a sequence of related
trees*, not for *identifying duplicates*.  However, for sufficient
quantities of pizza (in advance, hold the mayo) and beer (completion
bonus) I could probably be convinced to hack up a script to do the
identification of duplicates for you.
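
For what it's worth, here's a minimal sketch of what such a script
might look like, using sha1sum from GNU coreutils rather than git
itself (sha1sum hashes the raw bytes, so its ids won't match git's
blob ids, but for spotting duplicates that makes no difference):

    # hash every file, sort so identical contents sit next to each
    # other, and print only the groups with more than one member
    find . -type f -not -path './.git/*' -print0 \
      | xargs -0 sha1sum \
      | sort \
      | uniq -w40 --all-repeated=separate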

Since git's database is designed for tracking tree changes, you can
move stuff around to your heart's content without confusing it, too.
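
For example (hypothetical paths):

    git mv cdrom-3/report.txt sorted/reports/report.txt
    git commit -m "start sorting"
    git log --follow -- sorted/reports/report.txt  # history survives the move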

 > Now that my current computer has many gigabytes of free space, I'm
 > copying the contents of all the CD-ROMS to a directory on my hard
 > drive. Each CD-ROM's contents go into its own sub-directory to
 > prevent accidental over-writing.

A possible alternative for a series of disks that probably cover
basically the same material, with mostly identical files from
generation to generation, would be to copy, git commit, copy, git
commit, etc.
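
Concretely, something along these lines, assuming each CD was copied
into its own cdrom-*/ directory (the --delete makes the working tree
mirror each generation in turn, so each commit records what changed
between disks):

    git init working
    for d in cdrom-*/; do
        rsync -a --delete --exclude=.git "$d" working/
        git -C working add -A
        git -C working commit -m "import $d"
    done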

However, it's unlikely to do what you want unless you were extremely
systematic and consistent about your backup policy, and certainly
won't catch file renames.

 > Once all the data is in one place, I hoped to find a way I can weed
 > out duplicates and be left with one set of just the most recent
 > versions of unique files.

"Most recent."  Hm, that may take another iteration of pizza and
beer ... nah, that's pretty simple, too.  I think.  :-)
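
A rough sketch, building on the hashing pipeline above (GNU find and
sort assumed; filenames containing tabs or newlines will break it).
It only *prints* the older copies of each duplicate, so you can
inspect the list before deleting anything:

    # prefix each file with its content hash and mtime, sort newest-
    # first within each hash, then print everything after the first
    # entry of each group -- i.e. the older copies of duplicates
    find . -type f -printf '%T@\t%p\n' \
      | while IFS=$'\t' read -r mtime path; do
            printf '%s\t%s\t%s\n' \
                "$(sha1sum < "$path" | cut -d' ' -f1)" "$mtime" "$path"
        done \
      | sort -t$'\t' -k1,1 -k2,2nr \
      | awk -F'\t' '$1 == prev { print $3 } { prev = $1 }'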

 > I also downloaded and ran Kompare. It says on their web site that it can 
 > recursively compare subdirectories. But I can't find any such feature in 
 > the interface.

`diff -rq dir-1 dir-2' will compare two directories, recursing into
subdirectories, reporting which same-named files differ and which
files exist in only one of the two trees.  Kompare probably will do
the same thing.
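
Its output looks like this (directory names made up):

    $ diff -rq cdrom-1 cdrom-2
    Files cdrom-1/notes.txt and cdrom-2/notes.txt differ
    Only in cdrom-2: photos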

However, the comment above about being "extremely systematic and
consistent" in your backups applies here.  And diff -rq will be *very*
slow; I bet Kompare is too.

