Mailing List Archive



Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]



Speaking of Gentoo, its version of "unzip" comes with an optional "natspec" patch, which just works.

$ unzip --help | grep CHAR
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives

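The difference between "-O" and "-I" is a bit of a mystery (the help text says one is for DOS, Windows and OS/2 archives and the other for UNIX and other archives), but if one doesn't work, try the other.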

E.g., take this file: https://github.com/Stuk/jszip/files/654150/jpnfile.zip

It's one of the problematic ones:

$ unzip -l jpnfile.zip 
Archive:  jpnfile.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  12-15-2016 16:32   jpnfile/index.html
        0  12-15-2016 16:32   jpnfile/ÉVé╡éóâeâLâXâg âhâLâàâüâôâg.txt
---------                     -------
        0                     2 files

So use "-O" to try a different encoding:

$ unzip -O sjis -l jpnfile.zip 
Archive:  jpnfile.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  12-15-2016 16:32   jpnfile/index.html
        0  12-15-2016 16:32   jpnfile/新しいテキスト ドキュメント.txt
---------                     -------
        0                     2 files
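For anyone curious why the names come out garbled in that particular way, here is a minimal Python sketch. It assumes the archive stores the name as raw Shift JIS bytes and that unzip fell back to CP437 when displaying them; that assumption is consistent with the garbled listing above, but I have not checked what unzip actually does internally. The function names garble and repair are made up for this illustration.

```python
# Sketch only: reproduces the mojibake by decoding Shift JIS bytes as CP437,
# then reverses it. "garble" and "repair" are hypothetical helper names.

def garble(name, real="shift_jis", assumed="cp437"):
    """Encode a name in its real encoding, then decode with the wrong one."""
    return name.encode(real).decode(assumed)


def repair(garbled, real="shift_jis", assumed="cp437"):
    """Undo garble(): recover the raw bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(real)


name = "新しいテキスト ドキュメント.txt"
mojibake = garble(name)  # the garbled form shown in the first listing

# CP437 assigns a character to every byte value, so the round trip is lossless.
assert repair(mojibake) == name
```

The same trick works for other legacy encodings by changing the "real" argument, as long as the wrongly assumed code page maps every byte to some character.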

On Tue, Jun 26, 2018 at 7:40 PM Kalin KOZHUHAROV <me.kalin@example.com> wrote:
On Tue, Jun 26, 2018 at 5:05 AM, Stephen J. Turnbull
<turnbull.stephen.fw@example.com> wrote:
> I still occasionally get zip files with Shift JIS-encoded file names.
>
LoL, after many years of occasional need to wrestle those, I published this:

https://github.com/thinrope/gentoo-nifty-scripts/blob/master/usr/local/bin/jzip

(although it is somewhat Gentoo-related (paths), the script is mostly
standalone)

> I forget the exact syntax, but something like
>
> python -m zipfile --extract --filenameencoding=shift_jis bogus.zip
>
I think patching upstream (zip, p7zip) is still better, but I haven't
had the urge to do it.
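
Until upstream is patched, the Python route can be sketched with the standard zipfile module. This is only a sketch under assumptions: zipfile decodes member names as CP437 when the archive does not set the UTF-8 flag (general-purpose bit 11), so the raw bytes can be recovered and decoded again as Shift JIS. The helper names real_name and list_names are made up here. Newer Pythons (3.11 and later, as far as I know) also accept a metadata_encoding argument to ZipFile and a --metadata-encoding option on the command line, which would make the manual step unnecessary.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: member name is stored as UTF-8


def real_name(info, encoding="shift_jis"):
    """Return the member name, re-decoding it when it is not flagged UTF-8."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly by zipfile
    # zipfile decoded the raw bytes as CP437; undo that and decode properly.
    return info.filename.encode("cp437").decode(encoding)


def list_names(archive, encoding="shift_jis"):
    """List member names of a zip (path or file object) with names fixed."""
    with zipfile.ZipFile(archive) as zf:
        return [real_name(info, encoding) for info in zf.infolist()]
```

Usage would be something like list_names("jpnfile.zip"). To extract, read each member with ZipFile.read(info) and write the data to a path built from real_name(info), after checking that the path stays inside the target directory.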


Cheers,
Kalin.

--
To unsubscribe from this mailing list,
please see the instructions at http://lists.tlug.jp/list.html

The TLUG mailing list is hosted by ASAHI Net, provider of mobile and
fixed broadband Internet services to individuals and corporations.
Visit ASAHI Net's English-language Web page: http://asahi-net.jp/en/


--
Georgi