tlug.jp Mailing List Archive
Re: [tlug] Poll: OpenOffice or LibreOffice?
- Date: Sun, 18 May 2014 15:17:32 +0200
- From: Bruno Raoult <braoult@example.com>
- Subject: Re: [tlug] Poll: OpenOffice or LibreOffice?
- References: <87iop8erkb.fsf@uwakimon.sk.tsukuba.ac.jp> <5374010A.6060509@extellisys.com> <87a9ajenxw.fsf@uwakimon.sk.tsukuba.ac.jp> <5375594E.7030000@extellisys.com> <20140516204331.33cadc3a@syd.sandslott.intra> <5376970A.1080400@extellisys.com> <CAAhy3dt3RCrs4+wLs1Luzp1d2EfCwn17Lx_r5R8m9yofwiWfpA@mail.gmail.com> <87k39kfsa0.fsf@uwakimon.sk.tsukuba.ac.jp> <CAAhy3dsjZ1R1HzZzzRUtO-zKCm08Hv3zFGCr=ztD3eHGZBBJwg@mail.gmail.com> <87d2fbxzar.fsf@uwakimon.sk.tsukuba.ac.jp> <CAJA1Y2aBi-47nwj35G3QLt+CR0_A8L0sOq_aMO0pwbmO64GmDQ@mail.gmail.com> <537892F2.9010700@dcook.org> <CAJA1Y2ai7EMqVUGxdh7LHsKiLFt-0OvK-Qndi3zgA7oT0CBcmQ@mail.gmail.com> <5378A901.1010007@dcook.org>
On Sun, May 18, 2014 at 2:35 PM, Darren Cook <darren@example.com> wrote:
>>> Git hashes each file (that exists now, or has existed in the past), and
>>> creates one file per hash code.
>>
>> To be clear: there is no copy of the actual files, right?
>
> Unless I've misunderstood, there is a copy. (It might be zipped, in
> which case my below numbers are off.) Otherwise when you delete your
> file, going back in the git history wouldn't be able to recover it.
>
>> Practical example: someone has a disk 90% full of music, pics, and
>> video. No space anywhere else. Will git need another disk, just to find dups?
>
> Git is the wrong tool for that (IMHO). Go straight to md5 hashes.
>
>>> (So it can detect duplicates in the directory tree; but you could
>>> achieve the same by just writing a script to run md5sum on every file.)
>>
>> This was my initial question (and my solution). I just wondered if git
>> could do the same with 2 lines instead of my 100 :)
>
> I bet someone with more bash skills than me could make it a one-liner.
> Something like:
> find . | xargs md5sum | sort | uniq

This line won't work, but that is not the point, and it is the idea I used first: keep a checksum of every file, then check the non-unique ones.

As checksumming is *very* expensive, I went on to store the date of each checksum along with the checksum itself in a text file. I also wanted to be able to add an external directory (I mean files outside the initial directory tree), and to find identical file names (which can sometimes help to find dups; this is especially true for pictures, at least for me).

The latest version uses a small DB, where I expected to add more information, such as some metadata, so that I could (maybe) detect basic changes, such as rotations. In that case, the checksum would have covered only the "data" part (excluding the metadata). I never finished it, so it is now only a simple checksum database, but very easy to update and search (by filename, md5, etc.).
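For reference, here is one way the quoted one-liner could be made to work. It is a sketch that assumes GNU coreutils and findutils (`xargs -0 -r`, `uniq -w` and `uniq -D` are GNU extensions), and `find_dupes` is a hypothetical name. The original fails for three reasons: `find .` also lists directories, which `md5sum` refuses to hash; `xargs` splits file names on spaces; and plain `uniq` compares whole lines (digest plus file name), so every line counts as unique.

```shell
# Sketch (hypothetical helper): print every file whose MD5 digest is shared
# with at least one other file. Assumes GNU coreutils/findutils.
find_dupes() {
    # -type f: skip directories, which md5sum cannot hash.
    # -print0 / xargs -0: keep file names containing spaces intact.
    # xargs -r: do not run md5sum at all when no files are found.
    # sort: bring identical digests next to each other.
    # uniq -w32 -D: compare only the 32-character hex digest and print
    #   every line belonging to a group of repeated digests.
    find "${1:-.}" -type f -print0 | xargs -0 -r md5sum | sort | uniq -w32 -D
}
```

Running `find_dupes ~/Pictures` prints each group of identical files as consecutive lines sharing the same digest. Unlike the script described above, it re-hashes every file on each run; keeping digests together with the date they were computed is what avoids that cost on large trees.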
> 112MB in total, and .git is 26MB of that. I see roughly 30MB is being
> excluded by .gitignore. So:
> 56MB real files gives a 26MB git directory.
>
> Much less than by 2.5 ratio. This could well be compression. Or me
> misunderstanding how git works.

Thanks. That was my guess. So git is not appropriate as a mere duplicate finder; I will carry on with my own script. I could send it to the list, but it is really not ready yet, and I am not sure it would be of interest to anybody here... at least in its current form.

br.

--
2 + 2 = 5, for very large values of 2.
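Darren's compression guess fits how Git stores content. A blob's ID is the SHA-1 of a short header (`blob`, a space, the size in bytes, and a NUL byte) followed by the file's bytes, so identical files map to a single object. Each object is stored zlib-compressed under `.git/objects`, and `git gc` can further delta-compress objects into packfiles, so Git does keep a (compressed) copy of every file. The sketch below recomputes a blob ID with `sha1sum` alone, purely as an illustration; `git_blob_id` is a hypothetical name, it assumes GNU coreutils, and for real use `git hash-object FILE` is the actual command.

```shell
# Sketch (hypothetical helper): recompute the ID Git would assign to a file
# stored as a blob. Assumes GNU coreutils; `git hash-object FILE` is the real tool.
git_blob_id() {
    # Byte count of the file; tr strips any padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    # SHA-1 over the header "blob <size>" + NUL, followed by the raw content.
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```

For a file containing `test content` plus a newline, this gives `d670460b4b4aece5915caf5c68d12f560a9fe3e4`, the same ID `git hash-object` reports. Because the name depends only on content, two copies of a photo cost Git one object, but the object itself is still a full (compressed) copy, which is why Git needs free disk space comparable to the data it tracks.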
- References:
- [tlug] Poll: OpenOffice or LibreOffice?
- From: Stephen J. Turnbull
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Travis Cardwell
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Stephen J. Turnbull
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Travis Cardwell
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Benjamin Tayehanpour
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Travis Cardwell
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Raymond Wan
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Stephen J. Turnbull
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Raymond Wan
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Stephen J. Turnbull
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Bruno Raoult
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Darren Cook
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Bruno Raoult
- Re: [tlug] Poll: OpenOffice or LibreOffice?
- From: Darren Cook