
Re: [tlug] "How to"



On Mon, May 12, 2014 at 4:56 AM, Kalin KOZHUHAROV <me.kalin@example.com> wrote:
>
> On Mon, May 12, 2014 at 1:02 AM, Bruno Raoult <braoult@example.com> wrote:
> > Following our discussions, here are different solutions to one problem.
> >
> > 1- You have 10,000 files, and you want to find duplicates. Sometimes, one file
> > changes, or you add/remove one, so you want to find the changes quickly (let's
> > say daily). How?
> >
> rsync -HavPS src/ dst/ --dry-run

rsync will in no way find dups: it compares src/ against dst/, it does not
look for identical files within one tree.
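Finding dups means hashing files within one tree. A sketch with GNU
coreutils (the 32 in -w32 is the width of an MD5 hex digest):

find . -type f -print0 | xargs -0 md5sum | sort > sums.txt
uniq -w32 --all-repeated=separate sums.txt   # groups of identical files

For the daily part, keep sums.txt around and re-hash only the files
newer than it; more on timestamps below.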

> > 2- These files have "meta-information" inside (let's say a date/time) that
> > you can extract. How would you do it?
> >
> depends on the file format; use the appropriate command below, followed by
> grep/perl with a regex matching your date format:
> * generic - strings
> * multimedia - exiftool or exiv2
> * most any executable - objdump
> * many (incl. office) - file
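
Agreed on the tools; with exiftool, for instance, the extraction itself
is a one-liner (photo.jpg is just a placeholder, and the tag name
depends on the file format):

exiftool -s -s -s -DateTimeOriginal photo.jpg   # prints the value only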

My question was not precise: let's say you have already extracted this
meta-information. How would it change your answer to the previous question?

I need to review my English, sorry everybody :)

> And add another one:
> 5. You have two filesystems (say ext4) with a large number of files
> (>10K), some of them are big (>100G), some of them are big and sparse
> (>100MB disk size, >1TB real size) (think two 4TB SATA drives).
> At one point they are synced (rsync -HavPS /mnt/t1/ /mnt/t2/). Later
> you move files around on the first drive (using mv or rename) and
> change/add/delete some small files (e.g. notes).
> What is an efficient way to sync back first drive to the second
> (assume second has not changed since last sync)?

For me, it seems difficult: you change a file's name and move it around,
and the references are lost (except the timestamps). Maybe we are coming
back to duplicate finding?
The sparse notion is another question; I would suggest answering your
question first without it, since it is very specific to filesystems.
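
At least detecting sparseness is easy; compare allocated and apparent
sizes (GNU du; "somefile" is a placeholder):

du -h somefile                   # blocks actually allocated on disk
du -h --apparent-size somefile   # logical size; much bigger if sparse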

> rsync will transfer a lot of extra files (e.g. if `mv /mnt/t1/dir1/A
> /mnt/t1/dir2/B` is issued, it will transfer the B and then delete A).
> Using MD5 to track all changes is futile; it means calculating MD5
> over many TB each time.
> Keeping a journal of some kind may be possible, but how?

Calculating an MD5 is not futile. If you trust the timestamps, only a few
files would have to be read again. So, once you have that limited list of
changed files with their MD5s, you can launch rsync on just those files.
By the way, I am not sure a "mv" between filesystems will keep the "holes"
in your files; that question alone is interesting.
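
What I have in mind is a sketch like this; the .last-sync marker file is
my own invention (create it once with touch), and the awk part assumes
no spaces in file names:

cd /mnt/t1
# hash only the files changed since the marker
find . -type f -newer .last-sync -print0 | xargs -0 md5sum > /tmp/changed.md5
# transfer just that short list; -S keeps the holes
awk '{print $2}' /tmp/changed.md5 | rsync -HavPS --files-from=- /mnt/t1/ /mnt/t2/
touch .last-sync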

> (This is real work in forensics; my solution is to keep work log (of
> mv and rename commands), then manually execute it on the second fs,
> then rsync -HavPS as a check, but that is tedious).

If we want to go further, "cp + rm" is also an alternative to mv, so keeping
a log of "mv" commands is not workable. And if you move your files with
another tool (a graphical file manager, say), the log will not work either.

I have no idea how to track a file once its first instance is lost; I
believe it is impossible. Or you could turn on huge system logging, but I
am sure this is not what you want.
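
The only half-workaround I can imagine: a rename inside one filesystem
keeps the inode number, so a daily snapshot of inode/path pairs would
reveal the moves. A sketch (inode numbers can be reused after a delete,
so this is only approximate):

find /mnt/t1 -printf '%i\t%p\n' | sort > inodes.today
# same inode, different path => the file was moved
join -t$'\t' inodes.yesterday inodes.today | awk -F'\t' '$2 != $3'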

br.



-- 
2 + 2 = 5, for very large values of 2.

