Mailing List Archive


Re: [tlug] Bogus Japanese zipfiles [was: Kudos to Jim Breen]



On 27 June 2018 at 13:08, Stephen J. Turnbull
<turnbull.stephen.fw@example.com> wrote:
> .......  and there are
> just too many moving parts if you decide to change encodings.  "Don't
> fix it if the users have workarounds!" ¯\(ツ)/¯

Amen, brother, amen.

I have this issue with the wwwjdic spaghetti code (mea culpa).  Parts of it
date from before Unicode and UTF-8 existed, and all the internal data
structures are built around the assumption that kana and kanji take two bytes.
I'd love to move it over to using UTF-8 internally, but I'm not inspired to do
it because:
(a) it'll be a helluva lot of work. One big issue is that in EUC the difference
between equivalent katakana and hiragana characters is a single bit, so
relaxed kana matching is trivial. In Unicode they are in the same plane, and
although the code points sit a constant 0x60 apart, the UTF-8 byte sequences
differ irregularly, so it all gets much harder.
(b) very few users would know the difference. The interface defaults to
UTF-8 these days and everything goes in and out via iconv().
(c) as I progress through my 8th decade I find I have more interesting (and
urgent) things on my bucket list.
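
To make point (a) concrete, here is a minimal Python sketch of relaxed kana
matching in both representations. It is not wwwjdic code: the function names
are hypothetical, and it assumes input is well-formed EUC-JP and that only the
range ぁ..ん / ァ..ン (the part both JIS X 0208 rows share) needs folding.

```python
# Illustrative only: hypothetical helpers, not taken from wwwjdic.

def fold_kana_euc(data: bytes) -> bytes:
    """Fold katakana to hiragana in EUC-JP bytes.

    Katakana live in row 0xA5 and hiragana in row 0xA4, so folding
    amounts to clearing the low bit of the lead byte.
    """
    out = bytearray(data)
    i = 0
    while i < len(out):
        lead = out[i]
        if lead == 0x8F:        # SS3 prefix: three-byte JIS X 0212 character
            i += 3
        elif lead == 0x8E:      # SS2 prefix: two-byte half-width katakana
            i += 2
        elif lead >= 0xA1:      # ordinary two-byte JIS X 0208 character
            # Only 0xA1..0xF3 in the katakana row have hiragana twins.
            if lead == 0xA5 and i + 1 < len(out) and out[i + 1] <= 0xF3:
                out[i] = lead & 0xFE   # 0xA5 -> 0xA4
            i += 2
        else:                   # single-byte ASCII
            i += 1
    return bytes(out)


def fold_kana_unicode(text: str) -> str:
    """Fold katakana to hiragana on decoded code points (constant 0x60 offset)."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f3" else ch
        for ch in text
    )


def utf8_byte_deltas(kata: str, hira: str) -> tuple:
    """Per-byte difference between the UTF-8 forms of one kana pair."""
    return tuple(k - h for k, h in zip(kata.encode("utf-8"), hira.encode("utf-8")))
```

On decoded strings the Unicode version is just as short, but code that works
on raw bytes (as the two-byte internal structures do) has no single mask to
apply in UTF-8: utf8_byte_deltas("カ", "か") gives (0, 1, 32), while
utf8_byte_deltas("ミ", "み") gives (0, 2, -32).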

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

