Re: [tlug] Emacs IME, locale, encodings, R, aarrrrgggghhhh!!!!
- Date: Tue, 9 Mar 2021 15:37:16 +0900
- From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>
- Subject: Re: [tlug] Emacs IME, locale, encodings, R, aarrrrgggghhhh!!!!
- References: <d6b21c8964fcf607f447aa99898ca59fc19c8ae3.camel@uchicago.edu> <24645.55562.460201.880422@turnbull.sk.tsukuba.ac.jp> <00fcee6526a6c8c360532a56c23ae57adc25fc5c.camel@uchicago.edu>
Stuart Luppescu writes:

> On Mon, 2021-03-08 at 16:58 +0900, Stephen J. Turnbull wrote:
> > In a fresh Emacs, try M-x setenv RET LC_CTYPE RET ja_JP.UTF-8 RET, and
> > M-: (setq default-process-coding-system 'utf-8) RET,
> >
> > then run R and try the program.
>
> Didn't change anything.

OK.  In the inferior R started by Emacs, what are the values of the
environment variables LC_ALL, LC_CTYPE, and LANG?
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.getenv.html

> I don't know. My regular terminal (wterm) I know does very poorly with
> non-latin characters, so I installed rxvt-unicode (urxvt). It did not
> seem any different from wterm. :shrug:

Is wterm still maintained?  The SourceForge page is dated 2013.

This:

> > Does
> > echo 平屋 どこかのマンション 湯河原マンション 熱海マンション
> > do the right thing?
>
> Nope. When I pasted that in, I get
> echo ?? ????????? ???????? ???????
> ?? env_comp~ env_comp ???????

strongly suggests that the *terms aren't finding the fonts they
expect.  The question mark counts match the Japanese character counts,
so it appears it's being understood as UTF-8, but undisplayable.  This
doesn't explain why things are weird in Emacs's inferior R process,
though.

> > the value of Emacs's default-process-coding-system
> this says utf-8

So Emacs *should* be sending UTF-8 to the inferior process, unless ESS
is setting the process-specific coding system differently (which seems
unlikely).

> I would send you the program but it's a dumb little thing,

I'm not particularly interested in the program, I'm interested in the
embedded strings. :-)  If you cut and paste them, that process might
mess up the strings.  But I'm pretty sure at this point that the
problem isn't the strings -- with the exception of the first cut and
paste, they all seem to be originally correctly encoded as UTF-8.
Even the question marks!  So I think it's the locales that the various
programs are running in.
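The question-mark argument above can be checked mechanically.  What
follows is only a sketch, assuming Python 3 and a terminal that prints
one "?" per undisplayable character; the variable names are made up
for illustration.  If the terminal had treated each byte as a separate
character, the "?" runs would be three times as long.

```python
# Strings from the echo test, and the question-mark output quoted above.
words = ["平屋", "どこかのマンション", "湯河原マンション", "熱海マンション"]
observed = "?? ????????? ???????? ???????".split()

# Characters per word if the input was decoded as UTF-8.
char_counts = [len(w) for w in words]

# "Characters" per word if every UTF-8 byte had been treated as its own
# character (which is what a single-byte locale such as Latin-1 would do).
byte_counts = [len(w.encode("utf-8")) for w in words]

# Length of each run of question marks the terminal actually showed.
mark_counts = [len(m) for m in observed]

for word, chars, nbytes, marks in zip(words, char_counts, byte_counts, mark_counts):
    print(f"{word}: {chars} characters, {nbytes} bytes, {marks} question marks")

print("matches character counts:", mark_counts == char_counts)
print("matches byte counts:     ", mark_counts == byte_counts)
```

The runs line up with the character counts and not with the byte
counts, which is consistent with text that was decoded as UTF-8 and
then could not be drawn.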
Stuart Luppescu writes in another post
<8ac8892a83ff83a14bed7a5ea8fef60a1c5dfb01.camel@example.com>:

> Then I tried running R in a new terminal, and copied and pasted
> from the program *displayed in another terminal*. This time I got
> this:
>
> > print(house.names)
> [1]
> "name" "å¹³å±\u008b" "ã\u0081©ã\u0081\u0093

These are representations of valid UTF-8, interpreted as Latin-1 (most
likely).  They are almost certainly Japanese; checking the first one
gives "平屋" as expected.

> and the graph printed out with the labels in Japanese.

It appears to me that the program that Emacs is saving is properly
encoded in UTF-8, although it's very hard to be sure when data written
by Emacs is being massaged by R, rxvt, and email in transmission.

> For some reason, emacs is messing with the encoding and the
> handling of the Japanese strings.

I don't think so.  Some of the evidence is consistent with that, but
taken as a whole the evidence is pretty strong that Emacs is sending
the right, UTF-8-encoded text to files and to R, but that R and rxvt
are interpreting it incorrectly.  In the case of R in an Emacs
inferior process, the environment is set by Emacs, so that could be a
problem with Emacs.  I just don't think the problem is the text sent
by Emacs.

It's still *possible* that Emacs is sending the wrong thing to the R
in the inferior process, but I don't see why it would be doing that.

> Also, it doesn't seem to matter what system locale is being used. It
> seems to work as well (or as badly) if I set it to en_US.UTF-8 or to
> ja_JP.UTF-8.

en vs. ja shouldn't matter here.  Only the encoding, UTF-8.  That's
why programmers should love Unicode -- it should make text encoding
issues moot (and will, *some*day ;-).  The other issues about
different languages are *much* harder to deal with.  Just consider
Japanese "era" dates! :-)

Regards,
Steve
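As a footnote to the Latin-1 point above: the check on the first
string can be reproduced in a few lines.  This is a minimal sketch,
assuming Python 3 and assuming the garbling really was "UTF-8 bytes
read as Latin-1" (the most likely case, not a certainty).  The helper
name is hypothetical, and the second string is decoded only as far as
it was quoted.

```python
def undo_latin1_mojibake(garbled: str) -> str:
    """Reverse 'UTF-8 bytes read as Latin-1'.

    Latin-1 maps each byte 0x00-0xFF to the code point with the same
    number, so encoding back to Latin-1 recovers the original bytes,
    which can then be decoded as UTF-8.
    """
    return garbled.encode("latin-1").decode("utf-8")


# The two strings from the R output, written with escapes so that no
# editor or mailer can garble them further.
first = "\u00e5\u00b9\u00b3\u00e5\u00b1\u008b"    # shown as å¹³å±\u008b
second = "\u00e3\u0081\u00a9\u00e3\u0081\u0093"   # shown as ã\u0081©ã\u0081\u0093

print(undo_latin1_mojibake(first))
print(undo_latin1_mojibake(second))

# The forward direction: what a Latin-1 reader makes of correct UTF-8.
print("平屋".encode("utf-8").decode("latin-1") == first)
```

If the strings had been garbled some other way, the decode step would
most likely raise UnicodeDecodeError rather than produce plausible
Japanese.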