Mailing List Archive
Re: [tlug] "How to"
- Date: Mon, 12 May 2014 21:17:07 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] "How to"
- References: <CAJA1Y2bTWLWhb0tcuZyeJQDXtAXsGRdyUw_T_Ft7sZ_W6nXhLQ@mail.gmail.com> <CAKXLc7fOK94iWsRP7QkfjaqotYRXfgRSQRtbMeRCT80M4_-b1w@mail.gmail.com> <87sioffz3p.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2ZzArvOAstFK2tQE5yo_dgK4TE72GudigW_6XksB_v60Q@mail.gmail.com> <87oaz3fkfm.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2YibQ+-R4_6G3892=aGwj2SB+1r=5q6WDuqnEDwmzuaUg@mail.gmail.com> <87iopbf501.fsf@uwakimon.sk.tsukuba.ac.jp>
On Mon, May 12, 2014 at 4:25 PM, Stephen J. Turnbull <stephen@example.com> wrote:

> > You offered a solution (that I did not test) using git. I am sure
> > readers will propose alternatives. And this was the target of the
> > question: which solution would be the best for such a requisite?
>
> "Best"? Depends on lots of things.

My idea (beside last meeting) was to get a few solutions. git is one, and
only one. When we have a few, we can just try all of them and discuss.

> "Good"? Sure -- git is a highly optimized application for tracking
> and comparing the contents of files. I happen to know a bit about
> extracting the information you want from a git object database. git
> would be a lot more reliable than coding the algorithms myself.

So, let's compare the performance of your line against others. I won't
say anything about git itself; I don't know how it works internally. However,
I believe there is no magic there (it is so difficult to compare 2 files,
so finding 2 identical files within 10,000 is not as easy as running "git").

The point was to be fast (my initial question). I have a 4,000+ directory
that I could use for testing different solutions. If you could provide me
a full script, I will be happy to run it and report the results to the list,
along with the other proposals.

I suggest the following syntax for everybody:

$ the-perfect-script-to-find-dups [-c] [-d db] [-x ext] [-s size] dir

with:
-c: create or init the DB (if your solution uses one).
-d db: database name, if your solution uses one. Default should be:
   $HOME/the-perfect-script-to-find-dup.DB.
   It could be a directory, if your solution implies a directory.
-s size: the minimal size for a file to be considered (the reason for this
   is that we don't want to consider small files). Default should be zero
   (all files).
-x ext: consider only files with the "ext" extension. I suggest ext be
   case-insensitive (mp3 = MP3). Default should be anything (no filter).
dir: the directory where we want to find the duplicates.

My test will be:
- to run your script initially, and time it.
- to copy a file in the subtree, and time the command. Check also that it
  was found.
- other tests: rename a file to an already existing one, move the old one
  to a new name or directory, etc...

Please let me know if you have more tests in mind.

The target is to have the fastest and most reliable way to find the
duplicates. The initial round would not be so important: if your
choice is to have a DB, only the "normal" run timing is important.

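
To make the proposed syntax concrete, here is a minimal sketch of a script that accepts it. It is only an illustration under assumptions not stated above: Python 3 as the language, SHA-256 digests as the content comparison, "size" given in bytes, and no database at all (so -c and -d are accepted but ignored). The names file_digest, find_dups and main are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a duplicate finder following the proposed command-line syntax."""
import argparse
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_dups(top, min_size=0, ext=None):
    """Return sorted lists of paths under `top` with identical contents."""
    # Case-insensitive extension filter (mp3 = MP3); None means no filter.
    suffix = "." + ext.lower().lstrip(".") if ext else None
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            if suffix and not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            if size >= min_size:  # assumption: size is in bytes
                by_size[size].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def main(argv=None):
    p = argparse.ArgumentParser(prog="the-perfect-script-to-find-dups")
    # -c and -d are parsed for compatibility; this sketch keeps no DB.
    p.add_argument("-c", action="store_true", help="create or init the DB (ignored)")
    p.add_argument("-d", metavar="db", help="database name (ignored)")
    p.add_argument("-x", metavar="ext", help="only consider this extension")
    p.add_argument("-s", metavar="size", type=int, default=0,
                   help="minimal file size, in bytes")
    p.add_argument("dir", help="directory to scan")
    args = p.parse_args(argv)
    for group in find_dups(args.dir, args.s, args.x):
        print("\n".join(group), end="\n\n")
```

Calling main(["-x", "mp3", "-s", "1024", "/some/dir"]) (the path is a placeholder) prints each group of identical files, one path per line, with a blank line between groups. Grouping by size first means only files that share a size are ever read and hashed. Since nothing is stored between runs, every run here is a full scan, so this would only serve as a baseline against solutions that keep a DB.
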
> > Let say another way:
>
> What makes you think I didn't understand the first time?

Nothing. I just made this remark after the few answers we got; I believed I was not clear.

br.
--
2 + 2 = 5, for very large values of 2.