Mailing List Archive
Re: [tlug] Emacs IME, locale, encodings, R, aarrrrgggghhhh!!!!
- Date: Mon, 15 Mar 2021 02:00:39 +0900
- From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>
- Subject: Re: [tlug] Emacs IME, locale, encodings, R, aarrrrgggghhhh!!!!
- References: <24647.6044.643934.541170@turnbull.sk.tsukuba.ac.jp> <00fcee6526a6c8c360532a56c23ae57adc25fc5c.camel@uchicago.edu> <20210314090435.p4icrmm6jnijxopk@iambic.cynic.net>
Curt J. Sampson writes:

> I've never quite understood the appeal of Bash in an Emacs window
> in a tmux window in an X11 window. :-P)

I don't either:

    $ ls -l /usr/bin/xemacs
    lrwxr-xr-x 1 steve staff 6 Mar 15 01:34 /usr/bin/xemacs -> /sbin/init

;-)

> `xxd` or another hexdump program may be handy [to check UTF-8-ness].

Sure, but I already know whether what's in Stuart's email is UTF-8 or
not from looking at the Latin-1. I suppose it might be easier for
*Stuart* to learn the basics[1] using hex rather than Latin-1 (where
"you're lost in a string of twisty accented vowels all alike"), but
it's even easier to just ask someone who's lived that nightmare.

What I don't know is where in the pipeline of buffers between his
Emacs and my XEmacs things are getting munged. I suggested Stuart's
program rather than "echo 'これは日本語です。'" because we already know
that that has various results for different output media, and doesn't
require him to do things that he might interpret differently from what
I think I'm asking him to do.

> > Only the encoding, UTF-8. That's why programmers should love
> > Unicode -- it should make text encoding issues moot (and will,
> > *some*day ;-).

> Well, yes, it does once you've dealt with UTF-8 vs. UTF-16 vs. unencoded
> UCS-2, big- vs. little-endian UTF-16/UCS-2, the presence or not of byte
> order markers....

These are all basically trivial to autodetect on the assumption that
it should be human-readable text, though -- you don't even really need
to know which language. (Big- vs. little-endian UTF-16 requires
statistical analysis, but rarely very much data.) But I doubt even
Google does a good job on distinguishing ISO-8859-1 vs. ISO-8859-15,
ISO-8859-2 vs. ISO-8859-16, or among Japanese corporate versions of
JIS (whether encoded as ISO-2022-JP, Shift JIS, or EUC-JP). Those
issues however are moot with Unicode.
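As a rough illustration of the autodetection described above, here is a minimal Python sketch. It is not taken from the thread: the function name `guess_unicode_encoding` is hypothetical, UTF-32 is ignored, and the byte-order test is just one simple statistical heuristic (in ordinary text the high byte of each 16-bit unit takes far fewer distinct values than the low byte). It also assumes the input really is Unicode text in one of these forms; legacy 8-bit encodings will fool it.

```python
import codecs


def guess_unicode_encoding(data: bytes) -> str:
    """Guess the Unicode encoding form of `data`, assumed to be readable text.

    Hypothetical helper for illustration only. Returns a Python codec
    name, or "unknown" when the heuristics cannot decide.
    """
    # 1. A byte order mark, when present, settles the question directly.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's "utf-16" codec reads the BOM itself

    # 2. Readable text contains no NUL bytes, and data in another encoding
    #    rarely passes a strict UTF-8 decode by accident.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # 3. Otherwise assume 16-bit code units and pick the byte order.
    #    The high bytes cluster (0x00 for ASCII, 0x30 for kana, ...),
    #    so the offset parity with fewer distinct values holds them.
    if len(data) < 2 or len(data) % 2:
        return "unknown"
    distinct_even = len(set(data[0::2]))
    distinct_odd = len(set(data[1::2]))
    if distinct_even < distinct_odd:
        return "utf-16-be"
    if distinct_odd < distinct_even:
        return "utf-16-le"
    return "unknown"
```

Even a nine-character sample such as the test string from this thread is enough for the byte-order step to decide, which fits the remark that not much data is needed.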

In actual practice, I'm pretty sure that octets are not going away any
time soon, so UTF-8 will (eventually) be universally used for all
exchange of text in IPC: there's no good reason to encode a substream
of text in anything else.[2] The widechar versions are going to be
irrelevant unless you're implementing a programming language designed
for very precise and efficient implementations of text processing.
Anything that non-systems-programmers will get on their terminals via
stdout will be UTF-8.

Footnotes:
[1] I'm assuming he doesn't already know how UTF-8 is encoded, and
that might be somewhat rude (in which case I apologize), but I'm
pretty sure if he did he would have commented on the output he posted.

[2] Except maybe on a single Windows host or in Java core dumps, but
that's Microsoft's or Oracle's problem, not mine.
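
For readers in the position footnote [1] describes, the following Python sketch shows the same UTF-8 bytes viewed two ways: as hex (what `xxd` would show) and misread as Latin-1. It is an illustration added here, not something from the thread; the helper names `hexdump` and `as_latin1` are hypothetical. Each character of the test string is in the three-byte range, so it encodes as one lead byte with top bits 1110 followed by two continuation bytes with top bits 10.

```python
text = "これは日本語です。"      # the test string quoted in the message
raw = text.encode("utf-8")      # 9 characters, 3 bytes each


def hexdump(data: bytes) -> str:
    """Space-separated hex bytes, similar in spirit to `xxd -p`."""
    return " ".join(f"{b:02x}" for b in data)


def as_latin1(data: bytes) -> str:
    """Misread the bytes as ISO-8859-1, one character per byte."""
    return data.decode("latin-1")


# First character only: U+3053 becomes the bytes e3 81 93.
print(hexdump(raw[:3]))

# The same three bytes as Latin-1: an accented vowel followed by two
# C1 control characters. ascii() keeps the output safe on any terminal.
print(ascii(as_latin1(raw[:3])))

# Bit patterns that mark a three-byte UTF-8 sequence.
assert raw[0] >> 4 == 0b1110    # lead byte: 1110xxxx
assert raw[1] >> 6 == 0b10      # continuation byte: 10xxxxxx
assert raw[2] >> 6 == 0b10
```

Because Latin-1 maps every byte to exactly one character, the misreading is lossless: `as_latin1(raw).encode("latin-1").decode("utf-8")` recovers the original text, which is why the encoding can still be identified from the garbled view.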