Mailing List Archive
- Date: Tue, 13 May 2014 00:37:05 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] "How to"
- References: <CAJA1Y2bTWLWhb0tcuZyeJQDXtAXsGRdyUw_T_Ft7sZ_W6nXhLQ@mail.gmail.com> <CAKXLc7fOK94iWsRP7QkfjaqotYRXfgRSQRtbMeRCT80M4_-b1w@mail.gmail.com>
On Mon, May 12, 2014 at 4:56 AM, Kalin KOZHUHAROV <me.kalin@example.com> wrote:
> On Mon, May 12, 2014 at 1:02 AM, Bruno Raoult <braoult@example.com> wrote:
> > Following our discussions, here are different solutions to one problem.
> >
> > 1- You have 10,000 files, and you want to find duplicates. Sometimes one file
> > changes, or you add/remove one, so you want to find the changes quickly (let's
> > say daily). How?
> >
> rsync -HavPS src/ dst/ --dry-run

rsync will in no way find dups.

> > 2- These files have "meta-information" inside (let's say date/time) that you
> > can extract. How would you do it?
> >
> depends on the file format; use the command below followed by
> grep/perl with a regex matching your date format:
> * generic - strings
> * multimedia - exiftool or exiv2
> * most any executable - objdump
> * many (incl. office) - file

My question was not phrased correctly: let's say you have already extracted
this meta-information. How would it change your answer to the previous
question? I need to review my English, sorry everybody :)

> And add another one:
> 5. You have two filesystems (say ext4) with a large number of files
> (>10K), some of them big (>100G), some of them big and sparse
> (>100MB disk size, >1TB real size) (think two 4TB SATA drives).
> At one point they are synced (rsync -HavPS /mnt/t1/ /mnt/t2/). Later
> you move files around on the first drive (using mv or rename) and
> change/add/delete some small files (e.g. notes).
> What is an efficient way to sync the first drive back to the second
> (assume the second has not changed since the last sync)?

For me, it seems difficult. You change a filename, and move the file
elsewhere on the file system: references are lost (except the timestamps).
Maybe we are back to finding duplicates? Sparseness is a separate question;
I would suggest answering your question without it first, as it is very
specific to file systems.

> rsync will transfer a lot of extra files (e.g.
> if `mv /mnt/t1/dir1/A /mnt/t1/dir2/B` is issued, it will transfer B and then delete A).
> using MD5 to track all changes is futile, it means calculating MD5
> over many TB each time.
> Keeping a journal of some kind may be possible, but how?

Calculating MD5s is not futile. If you trust the timestamps, only a few
files need to be read again. So, once you have your limited list of changed
files with their MD5s, it is possible to run rsync on just those files.

By the way, I am not sure a "mv" between filesystems will keep the "holes"
in your files. This question alone is interesting.

> (This is real work in forensics; my solution is to keep a work log (of
> mv and rename commands), then manually execute it on the second fs,
> then rsync -HavPS as a check, but that is tedious.)

If we want to go further, "cp + rm" is also an alternative. Keeping a log
of "mv" commands is not workable: if you use another tool to move your
files (a graphical file manager, say), it will not be logged either. I have
no idea how to track a file once its first instance is lost; I believe it
is impossible. Huge amounts of system logging could do it, but I am sure
that is not what you want.

br.

-- 
2 + 2 = 5, for very large values of 2.
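[Editor's note: the "trust the timestamps" idea above can be sketched as a small script. This is an illustration only, not anything posted in the thread; the function name `dupscan` and the `.md5.cache`/`.md5.stamp` file names are invented, it assumes GNU find/xargs/md5sum, and file names containing newlines would break the tab-separated cache.]

```shell
# Incremental duplicate scan that trusts mtimes: keep a path->md5
# cache, re-hash only files newer than a stamp file, then report
# checksums that appear under more than one path.
dupscan() {
    tree=$1
    cache=$tree/.md5.cache   # lines: "path<TAB>md5" from the previous run
    stamp=$tree/.md5.stamp   # zero-byte file whose mtime marks the last run

    touch "$cache"

    # 1. Hash only new/modified regular files (all of them on the first
    #    run).  -print0/-0 copes with spaces in names.
    if [ -e "$stamp" ]; then
        find "$tree" -type f ! -name '.md5.*' -newer "$stamp" -print0
    else
        find "$tree" -type f ! -name '.md5.*' -print0
    fi | xargs -0 -r md5sum |
        awk '{ hash = $1; sub(/^[0-9a-f]+ [ *]/, ""); print $0 "\t" hash }' \
        > "$cache.fresh"

    # 2. Merge, keyed on path: fresh entries come first, so they win
    #    over cached ones.  (Stale entries for deleted files are not
    #    pruned in this sketch.)
    awk -F'\t' '!seen[$1]++' "$cache.fresh" "$cache" > "$cache.new"
    mv "$cache.new" "$cache"
    rm -f "$cache.fresh"
    touch "$stamp"

    # 3. Duplicates: cache lines whose checksum is listed more than once.
    cut -f2 "$cache" | sort | uniq -d |
        awk -F'\t' 'NR==FNR { dup[$1] = 1; next } $2 in dup' - "$cache"
}
```

On the second and later runs, step 1 reads only the files whose mtime is newer than the stamp, so most of the terabytes are never touched; the duplicate report in step 3 still covers the whole tree via the cache.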
- References:
- [tlug] "How to"
- From: Bruno Raoult
- Re: [tlug] "How to"
- From: Kalin KOZHUHAROV