TLUG Mailing List

Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]

Date: Fri, 29 Jun 2018 15:25:38 +0900

From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>

Subject: Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]

References: <23345.41167.951877.900876@turnbull.sk.tsukuba.ac.jp> <23345.44414.330392.350450@turnbull.sk.tsukuba.ac.jp> <CAKXLc7c-LzgY5AtE8XrZzKUrr206nXtmxdtKQC0q8PkcMjiF7A@mail.gmail.com> <CABHGxq6CkEeQVHy7rjjbTP72mOm_QQ5xtthPuPR-QASYQoS_ag@mail.gmail.com> <07A05935-BBD8-4C13-AEF6-667D653EBE45@brightblack.net> <23346.65438.401753.15741@turnbull.sk.tsukuba.ac.jp> <CABHGxq5mnJgiSxGKEXZ4KYBAuVB0YBUwMqi+duoTk1iSeXj9PQ@mail.gmail.com> <23348.26994.89985.547640@turnbull.sk.tsukuba.ac.jp> <CABHGxq6uUOvhMGi50tev5ckXTo=R+bxykDLOOCeCiiAUVHF=wQ@mail.gmail.com>
Jim Breen writes:

 > I don't think [fixed-width 3-octet] would be awkward at all. Much
 > of my recent text-processing work has used UTF-8 throughout and
 > it's not been a problem.

OK.  A lot of the issues with Emacs and odd octet widths come from
generic memory management, where many systems really like power-of-2
alignment, and from certain kinds of string matching, which it turns
out can be sped up greatly if you do it 32 or 64 bits at a time :-).
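
To make the word-at-a-time point concrete, here is a minimal sketch of
finding the first difference between two buffers by comparing 8 octets
per step.  It only illustrates the idea and is not Emacs's code; the
name first_mismatch and the 8-octet word size are my assumptions, and
in Python it won't actually be faster (in C each comparison would be a
single aligned load).

```python
def first_mismatch(a: bytes, b: bytes, word: int = 8) -> int:
    """Return the index of the first differing octet.

    Returns -1 if the buffers are identical, or the length of the
    shorter buffer if one is a proper prefix of the other.
    """
    n = min(len(a), len(b))
    i = 0
    # Compare one whole word per iteration while a full word remains.
    while i + word <= n:
        wa = int.from_bytes(a[i:i + word], "little")
        wb = int.from_bytes(b[i:i + word], "little")
        if wa != wb:
            break  # the mismatch is somewhere inside this word
        i += word
    # Finish octet by octet: this pins down the mismatch inside the
    # word found above, or handles the tail shorter than one word.
    while i < n:
        if a[i] != b[i]:
            return i
        i += 1
    return -1 if len(a) == len(b) else n
```

With 4-octet characters every 8-octet word holds exactly two
characters, so the octet index converts to a character index cleanly.
With 3-octet characters two out of every three word boundaries fall in
the middle of a character, which is part of what makes odd widths
awkward.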

 > > Python 3 moved to a content-dependent fixed-width type.  If your
 > > string is all ISO-8859-1, it's encoded as an array of octets.  If
 > > it contains even one astral character, it's UTF-32.  Everything
 > > else is UCS-2 (aka the subset of UTF-16 excluding surrogates).
 > 
 > That approach sort-of makes sense, but I'd hate to be maintaining
 > it.
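
For anyone who wants to see the three widths for themselves, here is a
small probe.  It is a sketch that assumes CPython 3.3 or later and
relies on sys.getsizeof reporting the string's own storage; the helper
name octets_per_char is mine, not part of any API.

```python
import sys


def octets_per_char(s: str) -> int:
    """Estimate how many octets CPython uses per character of `s`.

    s + s contains the same characters as s, so it gets the same
    storage width; the size difference is therefore len(s) * width.
    """
    if not s:
        raise ValueError("need a non-empty string")
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) // len(s)


for sample in ("abc", "café", "日本語", "a😀"):
    print(repr(sample), octets_per_char(sample))
```

Note how a single astral character in the last sample widens the
storage for the ASCII "a" next to it as well.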

A plausible take, but that kind of code has been very stable in my
experience.  Once you have the (simple) code for accessing and
mutating the array of characters correct, and the (also simple)
widening and narrowing code correct, optimizations tend to be very
local and easy to do correctly.  Of course you have to do things
through the API, which slightly limits how efficiently you can access
and mutate the underlying storage, but it's still wicked fast compared
to Emacs. ;-)
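
To show how little code the access/mutate and widen/narrow pieces
need, here is a toy sketch.  It is emphatically not CPython's
implementation (CPython strings are immutable and pick their width at
creation); FlexString and its helpers are hypothetical names, and I'm
assuming the array module's 'B', 'H' and 'I' typecodes are 1, 2 and 4
octets, as they are on common platforms.

```python
from array import array

# Storage kinds, narrowest first: (array typecode, largest code point).
_KINDS = (("B", 0xFF), ("H", 0xFFFF), ("I", 0x10FFFF))


def _kind_for(max_cp: int) -> int:
    """Index of the narrowest kind that can hold code point max_cp."""
    for k, (_, limit) in enumerate(_KINDS):
        if max_cp <= limit:
            return k
    raise ValueError("not a Unicode code point")


class FlexString:
    """Toy mutable string whose element width depends on its content."""

    def __init__(self, text: str = "") -> None:
        cps = [ord(c) for c in text]
        self._kind = _kind_for(max(cps, default=0))
        self._data = array(_KINDS[self._kind][0], cps)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, i: int) -> str:
        return chr(self._data[i])  # plain array indexing at any width

    def __setitem__(self, i: int, ch: str) -> None:
        needed = _kind_for(ord(ch))
        if needed > self._kind:
            self._convert(needed)  # widen before storing
        self._data[i] = ord(ch)

    def narrow(self) -> None:
        """Shrink to the narrowest kind that still holds every element."""
        needed = _kind_for(max(self._data, default=0))
        if needed < self._kind:
            self._convert(needed)

    def _convert(self, kind: int) -> None:
        # Widening and narrowing are the same operation: copy the code
        # points into an array with a different element type.
        self._data = array(_KINDS[kind][0], self._data.tolist())
        self._kind = kind

    @property
    def width(self) -> int:
        return self._data.itemsize

    def __str__(self) -> str:
        return "".join(map(chr, self._data))
```

Everything that knows about widths lives in _kind_for and _convert;
the accessors never look at the width at all, which is why later
optimizations stay local.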

 > Anyway there'll be no "successor maintainers" for wwwjdic. I'll instruct
 > my executors to put it on the bonfire, along with my used toothbrushes
 > and underpants.

As Kori Schake[1] likes to say, "Jim, I did not need that visual!"

Footnotes: 
[1]  https://twitter.com/deepstateradio