Mailing List Archive



Re: [tlug] "How to"



On Mon, May 12, 2014 at 1:02 AM, Bruno Raoult <braoult@example.com> wrote:
> Following our discussions about the different solutions to one problem, here are a few questions.
>
> 1- You have 10,000 files, and you want to find duplicates. Sometimes one file
> changes, or you add/remove one, so you want to find the changes quickly (let's
> say daily). How?
>
rsync -HavPS src/ dst/ --dry-run

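If you don't trust timestamps, add -c (compare files by checksum instead
of size and modification time).
If you want a more robust and faster solution, go the FIM (file integrity
monitoring) way: keep an MD5 hash list that you check against.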
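A minimal Python sketch of the hash-list idea for question 1 (the function names are hypothetical, not taken from any FIM tool): it walks a tree, stores size, mtime and MD5 per file, groups identical MD5s as duplicates, and diffs two lists to report added, removed and modified files. To keep daily runs fast it reuses the stored MD5 when size and mtime are unchanged, which trusts timestamps the same way rsync does without -c; pass no previous list to force a full re-hash.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hashlist(root, previous=None):
    """Map relative path -> (size, mtime_ns, md5) for every file under root.

    If a previous hash list is given, the stored MD5 is reused for files whose
    size and mtime are unchanged, so only new or modified files are re-read.
    """
    previous = previous or {}
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            old = previous.get(rel)
            if old and old[0] == st.st_size and old[1] == st.st_mtime_ns:
                digest = old[2]  # trust size + mtime, skip re-hashing
            else:
                digest = md5_of(full)
            result[rel] = (st.st_size, st.st_mtime_ns, digest)
    return result


def duplicates(hashlist):
    """Return groups (sorted lists) of paths that share the same MD5."""
    groups = defaultdict(list)
    for rel, (_size, _mtime, digest) in hashlist.items():
        groups[digest].append(rel)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def changes(old, new):
    """Return (added, removed, modified) relative paths between two hash lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p][2] != new[p][2])
    return added, removed, modified
```

The hash list could be kept between runs with the json module; tuples come back as lists, which the indexing above still handles.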


> 2- These files have "meta-information" inside (let's say date/time) that you
> can extract. How would you do it?
>
It depends on the file format; use one of the commands below, followed by
grep/perl with a regex matching your date format:
* generic - strings
* multimedia - exiftool or exiv2
* most any executable - objdump
* many (incl. office) - file
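For the generic case, the "strings, then grep with a regex" pipeline can be sketched in Python as below. The helper names are hypothetical, and the date layouts matched (EXIF-style "2014:05:12 01:02:03" and ISO-style "2014-05-12 01:02:03" or "2014-05-12T01:02:03") are assumptions to adjust to your data; for multimedia or office files you would still use exiftool, exiv2 or file as listed above.

```python
import re


def strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes, roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Assumed date/time layouts: "YYYY:MM:DD HH:MM:SS" (EXIF style) and
# "YYYY-MM-DD HH:MM:SS" / "YYYY-MM-DDTHH:MM:SS" (ISO style).
DATE_TIME = re.compile(r"\d{4}[:-]\d{2}[:-]\d{2}[ T]\d{2}:\d{2}:\d{2}")


def find_datetimes(path, min_len=4):
    """Return every date/time string found in the printable runs of a file."""
    with open(path, "rb") as f:
        data = f.read()  # whole file in memory: fine for a sketch, not for huge files
    return [m.group() for s in strings(data, min_len) for m in DATE_TIME.finditer(s)]
```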

> 3- How would you *copy* your mp3 (real files) playlists from an m3u file to
> another place, without subdirectories in the destination? Example: src/a/b/c/d.mp3 =>
> dst/a-b-c-d.mp3. m3u is a common file format for music playlists.
> 4- If the final target is /a/b/c/d, how would you create the
> new m3u file from the original one? Example: you answered question 3, but
> the target is for another device/machine, therefore a different path.
>
I'll leave those for somebody else.
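One possible approach to questions 3 and 4, as a Python sketch with hypothetical helper names: flatten each track path relative to a source root by joining its components with "-", copy the file, and write a new playlist whose entries point at the target directory on the other device. It assumes that lines starting with "#" (#EXTM3U, #EXTINF) are directives, that every other non-blank line is a track path relative to the playlist (or absolute), that all tracks live under the source root, that the playlist is UTF-8 (as in .m3u8), and that the device uses "/" separators.

```python
import os
import shutil


def flat_name(track, src_root):
    """'src/a/b/c/d.mp3' with src_root 'src' -> 'a-b-c-d.mp3'."""
    rel = os.path.relpath(track, src_root)
    return "-".join(rel.split(os.sep))


def copy_playlist(playlist, src_root, dst_dir, target_dir):
    """Copy each track flat into dst_dir (question 3) and write a playlist of the
    same name in dst_dir whose entries point to target_dir, the directory where
    dst_dir's contents will live on the other device (question 4).
    """
    base = os.path.dirname(os.path.abspath(playlist))
    os.makedirs(dst_dir, exist_ok=True)
    out = []
    with open(playlist, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                out.append(line)  # keep directives and blank lines unchanged
                continue
            # os.path.join keeps absolute entries as they are
            track = os.path.abspath(os.path.join(base, line))
            name = flat_name(track, os.path.abspath(src_root))
            shutil.copy2(track, os.path.join(dst_dir, name))
            out.append(target_dir.rstrip("/") + "/" + name)
    new_playlist = os.path.join(dst_dir, os.path.basename(playlist))
    with open(new_playlist, "w", encoding="utf-8") as f:
        f.write("\n".join(out) + "\n")
    return new_playlist
```

Two tracks that flatten to the same name would overwrite each other; a real script should check for that before copying.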

And I'll add another one:
5. You have two filesystems (say ext4) with a large number of files
(>10K). Some of them are big (>100 GB), and some are big and sparse
(>100 MB on disk, >1 TB apparent size); think two 4 TB SATA drives.
At one point they are synced (rsync -HavPS /mnt/t1/ /mnt/t2/). Later
you move files around on the first drive (using mv or rename) and
change/add/delete some small files (e.g. notes).
What is an efficient way to sync the first drive back to the second
(assume the second has not changed since the last sync)?

rsync will transfer a lot of extra data (e.g. if `mv /mnt/t1/dir1/A
/mnt/t1/dir2/B` is issued, it will transfer B and then delete A).
Using MD5 to track all changes is futile: it means calculating MD5
over many TB each time.
Keeping a journal of some kind may be possible, but how?
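One way to get such a journal without logging commands by hand, sketched in Python under stated assumptions: rename(2) within one filesystem (which is what mv does there) keeps the file's inode number, so an inode -> path map saved at sync time, compared with a fresh one later, shows which files moved. This needs only stat() calls, no reading of file contents, so it stays cheap on multi-TB trees. The names snapshot, detect_moves and replay_moves are hypothetical; the sketch skips hard-linked files and treats the result only as an optimization, since an inode freed by a deletion and reused by a new file can produce a bogus move. A final rsync pass is still what makes the trees identical.

```python
import os
import stat


def snapshot(root):
    """Map inode number -> path (relative to root) for regular files under root.

    Only lstat() is called; file contents are never read. Files with more
    than one hard link are skipped to keep the sketch simple.
    """
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1:
                snap[st.st_ino] = os.path.relpath(full, root)
    return snap


def detect_moves(old, new):
    """Return sorted (old_path, new_path) pairs for inodes whose path changed."""
    return sorted(
        (old[ino], new[ino]) for ino in old.keys() & new.keys() if old[ino] != new[ino]
    )


def replay_moves(moves, mirror_root, tmp_prefix=".replay-tmp-"):
    """Apply the moves on the mirror in two phases so swaps and chains work.

    Phase 1 renames every source to a temporary name in mirror_root; phase 2
    renames each one to its final path, creating parent directories as needed.
    tmp_prefix is a hypothetical name that must not clash with existing files.
    """
    staged = []
    for i, (src, dst) in enumerate(moves):
        tmp = os.path.join(mirror_root, "%s%d" % (tmp_prefix, i))
        os.rename(os.path.join(mirror_root, src), tmp)
        staged.append((tmp, dst))
    for tmp, dst in staged:
        target = os.path.join(mirror_root, dst)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.rename(tmp, target)
```

Typical use: save snapshot('/mnt/t1') right after a sync (as JSON, say, converting the keys back to int when loading); before the next sync, call replay_moves(detect_moves(saved, snapshot('/mnt/t1')), '/mnt/t2'), then run rsync -HavPS (with --delete if deletions should propagate) to pick up the small changed files.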

(This is real work in forensics; my solution is to keep a work log of
mv and rename commands, manually execute it on the second fs, then run
rsync -HavPS as a check, but that is tedious.)

Kalin.

