Mailing List Archive
Re: [tlug] "How to"
- Date: Mon, 12 May 2014 17:52:29 +0900
- From: "Stephen J. Turnbull" <stephen@example.com>
- Subject: Re: [tlug] "How to"
- References: <CAJA1Y2bTWLWhb0tcuZyeJQDXtAXsGRdyUw_T_Ft7sZ_W6nXhLQ@mail.gmail.com> <CAKXLc7fOK94iWsRP7QkfjaqotYRXfgRSQRtbMeRCT80M4_-b1w@mail.gmail.com> <87sioffz3p.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2ZzArvOAstFK2tQE5yo_dgK4TE72GudigW_6XksB_v60Q@mail.gmail.com>
Bruno Raoult <braoult@example.com> writes:

> On Mon, May 12, 2014 at 5:35 AM, Stephen J. Turnbull <stephen@example.com> wrote:
>
> >> 1- You have 10,000 files, and you want to find
> >> duplicates. Sometimes, 1 file changes, or you add/remove one, so
> >> you want to find the changes quickly (let's say daily). How?
> >
> > git init; git add .; git commit; while true; do git status; sleep 86400; done
>
> I am not sure I understand (or maybe my question was not
> clear). Let's say you have ./a/b/c/d/file1 and ./a/b/z/file2 in the
> tree. They are byte-for-byte identical files. My question was how to find them.

For two files in the same directory that have the same content but
different names,

    git cat-file -p `git cat-file commit HEAD | grep '^tree' | cut -b 5-` \
        | sort -k 3 | uniq -D -w 52

(untested; probably requires GNU uniq). Handling recursion is
(recursively ;-) left as an exercise for the reader.

> So we extracted the data, piped it, and saved in a file. Then? What
> about the next day, when you want to refresh?

    git ls-files --modified | xargs metadata-extractor-and-updater

If you need to do this in real time, it's a difficult problem. If you
only need to do it occasionally, this is *exactly* the problem that
Linus designed git to solve (except that Linus also needs to store the
content; a modified git that never actually stores blobs would probably
save you a lot of space!)

Of course if (like Kalin) you're dealing with terabytes, this is still
way slow (even if you can compare bytes on the order of once per CPU
cycle, you're still talking about thousands of seconds). You really
need to be able to ensure that files aren't changed behind your back,
and some special handling for files >10GB would be needed. But for
people dealing with files on the order of a CD or less, git should do
the job quickly enough.
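
For what it's worth, here is one possible sketch of the recursive case left as an exercise above. It is not the pipeline from the message: it swaps in `git ls-tree -r` (which descends into subdirectories and prints `<mode> <type> <sha>` followed by a tab and the path) and uses awk instead of GNU `uniq -D`, so files that differ only in mode are still matched. The function name `find_dups` is hypothetical, and the sketch assumes the files are already committed, so HEAD exists.

```shell
#!/bin/sh
# find_dups [DIR]: print the ls-tree lines of all files in HEAD of the
# repository at DIR (default: current directory) whose blob hash occurs
# more than once, i.e. files with byte-identical content.
find_dups() {
    git -C "${1:-.}" ls-tree -r HEAD |
        sort -k 3,3 |
        awk '
            # Field 3 is the blob hash; equal hashes are adjacent after sorting.
            $3 == prev { if (!shown) print prev_line; print; shown = 1 }
            $3 != prev { shown = 0 }
            { prev = $3; prev_line = $0 }
        '
}
```

Run as `find_dups /path/to/repo`; with the tree from the question, the lines for a/b/c/d/file1 and a/b/z/file2 should come out next to each other. Paths containing unusual characters are printed quoted by git unless `-z` is used, which this sketch does not handle.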