Mailing List Archive



[tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]



That reminds me....

Stephen J. Turnbull writes:

 > although I've got a PR that's been stalled for two years,

It's been stalled because I have a module that WFM, and it's not clear
that it's generally needed.  This is a good place to ask about that!

I still occasionally get zip files with Shift JIS-encoded file names.
Of course this does not AT ALL go over well on Mac, and it would be
something of a PITA on Linux as well.  The patch adds a
`filenameencoding` argument to a couple of functions that open
zipfiles for reading, and a command line option `--filenameencoding`
to the __main__ program.  It only enables *reading* zip files with
Shift JIS names (internally converting to Unicode, of course, and
writing file names out in the system file name encoding).  It doesn't
implement *writing* them, because the ZIP specification only allows
IBM code page 437 and UTF-8 for file names.

I forget the exact syntax, but something like

python -m zipfile --extract --filenameencoding=shift_jis bogus.zip

works (assuming that the encoding is legit Shift JIS: there's no
provision for changing the error handler).
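
For anyone stuck with such a file and an unpatched Python, here is a
minimal sketch of a workaround using only the stock zipfile module.
It is not the code from the PR.  It assumes that zipfile decodes any
name lacking the UTF-8 flag as cp437, so re-encoding to cp437 recovers
the original bytes, which can then be decoded as Shift JIS.  The
function name `decoded_names` and its arguments are made up for
illustration.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is stored as UTF-8


def decoded_names(zip_file, encoding="shift_jis", errors="strict"):
    """Map each member name as zipfile reports it to its re-decoded form.

    zip_file may be a path or a binary file-like object.  The helper is
    hypothetical and only lists names; it does not extract anything.
    """
    names = {}
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # Already decoded correctly as UTF-8; leave it alone.
                names[info.filename] = info.filename
            else:
                # Undo the assumed cp437 decoding to get the raw bytes,
                # then decode them with the encoding the sender used.
                raw = info.filename.encode("cp437")
                names[info.filename] = raw.decode(encoding, errors)
    return names
```

Something like `decoded_names("bogus.zip")` then gives a mapping from
the mangled names to readable ones, which could be used to rename the
files after a normal extraction.  Unlike the command above, the sketch
lets the caller pass an error handler such as "replace".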

Would anybody else find this useful?  If so, I'll push the PR forward
(but it will take a while to propagate to the Python distribution,
since it would go into 3.8, due in 18 months).  I guess I could also put
something up on PyPI, but the problem there is that I'd either have to
completely duplicate the stdlib zipfile module, or monkeypatch it, and
I don't like either idea very much.

Steve


