
Mailing List Archive
[tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]
That reminds me....
Stephen J. Turnbull writes:
> although I've got a PR that's been stalled for two years,
It's been stalled because I have a module that WFM, and it's not clear
that it's generally needed. This is a good place to ask about that!
I still occasionally get zip files with Shift JIS-encoded file names.
Of course this does not AT ALL go over well on Mac, and it would be
something of a PITA on Linux as well. The patch adds a
`filenameencoding` argument to a couple of functions that open
zipfiles for reading, and a command line option `--filenameencoding`
to the __main__ program. It only enables *reading* zip files with
Shift JIS names (internally converting to Unicode, of course, and
writing file names out in the system file name encoding). It doesn't
implement *writing* them, because the ZIP specification only allows
IBM code page 437 (the historical default) and UTF-8.
I forget the exact syntax, but something like
    python -m zipfile --extract --filenameencoding=shift_jis bogus.zip
works (assuming that the encoding is legit Shift JIS: there's no
provision for changing the error handler).
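
For anyone on an unpatched Python, here is a rough sketch of the usual
workaround. It is not the code from the PR. It assumes the stdlib zipfile
module decodes any name without the UTF-8 flag as cp437, so encoding the
name back to cp437 recovers the raw bytes, which can then be decoded as
Shift JIS. The helper name `decoded_names` is made up for illustration.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is stored as UTF-8


def decoded_names(archive, encoding="shift_jis"):
    """Return member names, re-decoding legacy (unflagged) names.

    `archive` is a path or a binary file-like object. Hypothetical
    helper, not part of the stdlib or of the proposed patch.
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # Name was already decoded correctly as UTF-8.
                names.append(info.filename)
            else:
                # Undo the cp437 decoding to get the raw bytes back,
                # then decode them with the encoding we actually expect.
                raw = info.filename.encode("cp437")
                names.append(raw.decode(encoding))
    return names
```

As with the command line above, a name that is not valid in the chosen
encoding raises UnicodeDecodeError; the sketch makes no attempt to
recover from that.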
Would anybody else find this useful? If so, I'll push the PR forward
(but it will take a while to propagate to the Python distribution,
since it would go in 3.8 due in 18 months). I guess I could also put
something up on PyPI, but the problem there is that I'd either have to
completely duplicate the stdlib zipfile module, or monkeypatch it, and
I don't like either idea very much.
Steve