
Mailing List Archive
Re: [tlug] Emacs IME, locale, encodings, R, aarrrrgggghhhh!!!!
Stuart Luppescu writes:
> On Mon, 2021-03-08 at 16:58 +0900, Stephen J. Turnbull wrote:
> > In a fresh Emacs, try M-x setenv RET LC_CTYPE RET ja_JP.UTF-8 RET, and
> > M-: (setq default-process-coding-system 'utf-8) RET,
> >
> > then run R and try the program.
>
> Didn't change anything.
OK.
In the inferior R started by Emacs, what are the values of the
environment variables LC_ALL, LC_CTYPE, and LANG?
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Sys.getenv.html
> I don't know. My regular terminal (wterm) I know does very poorly with
> non-latin characters, so I installed rxvt-unicode (urxvt). It did not
> seem any different from wterm. :shrug:
Is wterm still maintained? The SourceForge page is dated 2013.
This:
> > Does
> > echo 平屋 どこかのマンション 湯河原マンション 熱海マンション
> > do the right thing?
>
> Nope. When I pasted that in, I get
> echo ?? ????????? ???????? ???????
> ?? env_comp~ env_comp ???????
strongly suggests that the *terms aren't finding the fonts they
expect. The question mark counts match the Japanese character counts,
so it appears it's being understood as UTF-8, but undisplayable.
This doesn't explain why things are weird in Emacs's inferior R
process, though.
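The count comparison above can be checked mechanically. This is only a rough sketch in Python (not something used in the thread): it compares the question-mark runs from the pasted output with the number of characters and the number of UTF-8 bytes in each word. If the terminal had been treating the input as single bytes, one would expect the runs to track the byte counts instead.

```python
# Words from the echo command and the question-mark runs from the reply,
# both copied from the thread.
words = ["平屋", "どこかのマンション", "湯河原マンション", "熱海マンション"]
echoed = "?? ????????? ???????? ???????".split()

for word, marks in zip(words, echoed):
    n_chars = len(word)                   # Unicode code points
    n_bytes = len(word.encode("utf-8"))   # bytes in the UTF-8 encoding
    print(n_chars, n_bytes, len(marks))

mark_counts = [len(m) for m in echoed]
chars_match = [len(w) for w in words] == mark_counts
bytes_match = [len(w.encode("utf-8")) for w in words] == mark_counts
print(chars_match, bytes_match)
```

The runs line up with the character counts and not with the byte counts, which is consistent with the input being decoded as UTF-8 and then failing at display time.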
> > the value of Emacs's default-process-coding-system
> this says utf-8
So Emacs *should* be sending UTF-8 to the inferior process, unless ESS
is setting the process-specific coding system differently (which seems
unlikely).
> I would send you the program but it's a dumb little thing,
I'm not particularly interested in the program, I'm interested in the
embedded strings. :-) If you cut and paste them, that process might
mess up the strings. But I'm pretty sure at this point that the
problem isn't the strings -- with the exception of the first cut and
paste, they all seem to be originally correctly encoded as UTF-8.
Even the question marks! So I think it's the locales that the various
programs are running in.
Stuart Luppescu writes in another post
<8ac8892a83ff83a14bed7a5ea8fef60a1c5dfb01.camel@example.com>:
> Then I tried running R in a new terminal, and copied and pasted
> from the program *displayed in another terminal*. This time I got
> this:
>
> > print(house.names)
> [1]
> "name" "å¹³å±\u008b" "ã\u0081©ã\u0081\u0093
These are representations of valid UTF-8, interpreted as Latin-1 (most
likely).  They are almost certainly Japanese: decoding the first one
as UTF-8 gives "平屋", as expected.
> and the graph printed out with the labels in Japanese.
It appears to me that the program that Emacs is saving is properly
encoded in UTF-8, although it's very hard to be sure when data written
by Emacs is being massaged by R, rxvt, and email in transmission.
> For some reason, emacs is messing with the encoding and the
> handling of the Japanese strings.
I don't think so. Some of the evidence is consistent with that, but
taken as a whole the evidence is pretty strong that Emacs is sending
the right, UTF-8-encoded text to files and to R, but that R and rxvt
are interpreting it incorrectly. In the case of R in an Emacs
inferior process, the environment is set by Emacs, so that could be a
problem with Emacs. I just don't think the problem is the text sent
by Emacs.
It's still *possible* that Emacs is sending the wrong thing to the R
in the inferior process, but I don't see why it would be doing that.
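To illustrate the point about the environment being set by the parent, here is a hedged Python sketch of the general mechanism: a child process sees whatever locale variables the parent hands it. It assumes a POSIX `sh` is available and uses it as a stand-in for R; it does not reproduce what Emacs or ESS actually do, and the function name is hypothetical.

```python
import os
import subprocess

def locale_seen_by_child(overrides):
    """Start a child with a modified environment; return the LC_CTYPE it sees."""
    env = dict(os.environ)   # start from a copy of the parent's environment
    env.update(overrides)    # apply the parent's changes (compare M-x setenv)
    result = subprocess.run(
        ["sh", "-c", 'printf "%s" "$LC_CTYPE"'],   # stand-in for the inferior R
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

seen = locale_seen_by_child({"LC_CTYPE": "ja_JP.UTF-8"})
print(seen)
```

The child only reports the variable it was given. Whether that locale is installed and honored by the program is a separate question, which is why checking the values from inside R itself is still worthwhile.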
> Also, it doesn't seem to matter what system locale is being used. It
> seems to work as well (or as badly) if I set it to en_US.UTF-8 or to
> ja_JP.UTF-8.
en vs. ja shouldn't matter here; only the encoding, UTF-8, does.  That's
why programmers should love Unicode -- it should make text encoding
issues moot (and will, *some*day ;-). The other issues about
different languages are *much* harder to deal with. Just consider
Japanese "era" dates! :-)
Regards,
Steve