Mailing List Archive
Re: [tlug] Unzipping archives with japanese filenames in an unknown encoding in a smarter way?
- Date: Fri, 8 Feb 2019 10:21:12 +0100
- From: lists@example.com
- Subject: Re: [tlug] Unzipping archives with japanese filenames in an unknown encoding in a smarter way?
- References: <CACX149=m4MviLjqcMeo26UB8aCqmTm2=7ND_81atF8T=oABK4Q@mail.gmail.com>
- User-agent: Mutt/1.11.3 (2019-02-01)
On Fri, Feb 08, 2019 at 04:37:12PM +0900, Claus Aranha wrote:
> Hello!
>
> What is the best way to unzip a file that may have filenames in
> Japanese in an arbitrary encoding and avoid getting mojibake?
>
> I can use -O on unzip to tell what encoding I want (UTF-8, EUC,
> Shift_JIS, Windows_31J (???)) but trying the different encodings until
> finding one that works just seems inefficient.
>
> Is there a better way?

As far as I understand, the -O switch only specifies which encoding the
file names inside the ZIP file are read as; the output file name encoding
will always be the one your system uses. So it's a conversion from an
unknown encoding, and ideally the tool would guess the source file name
encoding automatically. (The -O switch is an Ubuntu patch BTW. Original
versions of unzip don't have it.)

Guessing the text encoding of an unknown byte sequence can only rely on
heuristics, like the chardet algorithm in libuchardet (if the source
encoding deviates from a standard). I don't know specifically if any ZIP
program supports sniffing source file name encodings; unzip likely
doesn't. If you find a program that does, that'd be great.

Otherwise, I could imagine a wrapper written in e.g. Python that opens
the ZIP file, goes over all file names in it, applies an encoding
detection routine to each (which may yield different results across
multiple files), and then either calls unzip with the correct encoding
or extracts the files one by one.
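
A minimal sketch of such a wrapper's detection step, under a few
assumptions of mine: it uses only Python's standard zipfile module, and
instead of a real detector like chardet it simply tries a fixed list of
candidate encodings in order and picks the first one that decodes every
name without error. The function names (raw_member_names,
guess_encoding) and the candidate list are made up for illustration.

```python
import zipfile

# Candidate encodings, tried in order. The order is a heuristic assumption:
# UTF-8 is strict, and EUC-JP rejects most Shift_JIS text, whereas cp932
# accepts a lot of EUC-JP bytes (as half-width katakana), so it goes last.
CANDIDATES = ("utf-8", "euc_jp", "cp932")


def raw_member_names(zip_path):
    """Return the file names stored in the archive as raw bytes."""
    names = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:
                # Bit 11 set: the archive declares the name to be UTF-8.
                names.append(info.filename.encode("utf-8"))
            else:
                # zipfile decoded the name as cp437; undo that to get
                # the original bytes back.
                names.append(info.filename.encode("cp437"))
    return names


def guess_encoding(raw_names, candidates=CANDIDATES):
    """Return the first candidate that decodes every name, or None."""
    for enc in candidates:
        try:
            for raw in raw_names:
                raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None
```

The result could then be passed on to unzip -O, keeping in mind that
Python codec names and the names unzip expects are not always spelled
the same (for example "euc_jp" versus "EUC-JP"). This picks one encoding
for the whole archive; short or pure-ASCII names carry little signal, so
the guess can still be wrong, and a per-file guess or a statistical
detector may do better on mixed archives.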