TLUG Mailing List

Re: [tlug] Stymied by emacs character display

Date: Wed, 20 Feb 2019 12:09:49 +0900

From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>

Subject: Re: [tlug] Stymied by emacs character display

References: <3457c09e095bd3c45066eddd95ba9fdc77f78039.camel@uchicago.edu> <23659.48778.935713.709773@turnbull.sk.tsukuba.ac.jp> <4e46af96185ebe3b712e50a3e8b50b4501e46163.camel@uchicago.edu>

Stuart Luppescu writes:

 > On Tue, 2019-02-19 at 17:30 +0900, Stephen J. Turnbull wrote:
 > > Stuart Luppescu writes:
 > > 
 > >  > I'm generating a report from an rmarkdown document
 > > 
 > > What language(s) are contained in this document?  What is it encoded
 > > in (UTF-8 I'm pretty sure, but confirmation would be nice)?
 > 
 > This is kind of complicated. The original R program (which contains the
 > strings copied and pasted from MS Word) is ASCII; it writes the
 > control file for the analysis software, which is UTF-8. The analysis
 > program runs in Windows and produces an output file which is
 > windows-1252. Then a Python script extracts some tables (UTF-8) for
 > inclusion in the rmarkdown report, which is ASCII.
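To make the encoding boundaries in that pipeline concrete, here is a minimal Python sketch (an illustration only, not taken from the scripts being described) of how the single character U+2019 looks as bytes in a cp1252 stage and in a UTF-8 stage, and what converting between them involves. It assumes, as stated above, that the Windows output is cp1252 and the extracted tables are meant to be UTF-8.

```python
# Assumption: Windows analysis output is cp1252, extracted tables are UTF-8.
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK, as pasted from MS Word

cp1252_bytes = quote.encode("cp1252")  # what a cp1252 stage writes
utf8_bytes = quote.encode("utf-8")     # what a UTF-8 stage writes

print(cp1252_bytes)  # b'\x92'
print(utf8_bytes)    # b'\xe2\x80\x99'

# Converting between stages: decode with the *source* encoding,
# then re-encode with the *target* encoding.
converted = cp1252_bytes.decode("cp1252").encode("utf-8")
assert converted == utf8_bytes

# Guessing the source encoding wrong fails loudly here: 0x92 is a UTF-8
# continuation byte, so it cannot start a character.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)  # not UTF-8: invalid start byte
```

Declaring the encoding explicitly at each file boundary (e.g. `open(path, encoding="cp1252")`) is the usual way to keep such a chain from silently mixing up bytes.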

I was interested in the *natural* languages (e.g., one possible
interpretation of the string in the error message was U+8080, which is
a more or less recent addition that appears to be either a Chinese (?)
radical not in the Japanese repertoire of radicals or perhaps a "part
kanji"; I'd like to be able to rule that out).  I guess since
everything's ASCII or "extended ASCII", that's English.

 > >  > When I try to generate the report I get error messages like this:
 > >  > ! Package inputenc Error: Unicode char \u8:\200%G€%@ not set up for use

Ouch.  There's that EURO SIGN again.  I'm going to have to look at
what XEmacs is doing here more carefully.
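The EURO SIGN is what cp1252 makes of byte 0x80, the middle byte of U+2019 in UTF-8, and `\200` in the error message reads as that same byte written in octal. A short Python check, assuming the quote reached TeX as UTF-8 bytes; `octal_escapes` is a hypothetical helper that only mimics the `\NNN` notation, not inputenc's own formatting code:

```python
import unicodedata

# Assumption: U+2019 travelled as UTF-8 and was later read as cp1252.
utf8_bytes = "\u2019".encode("utf-8")  # b'\xe2\x80\x99'

# In cp1252, 0xE2 is 'a with circumflex', 0x80 is the EURO SIGN, 0x99 is TM.
as_cp1252 = utf8_bytes.decode("cp1252")
print([unicodedata.name(ch) for ch in as_cp1252])
# ['LATIN SMALL LETTER A WITH CIRCUMFLEX', 'EURO SIGN', 'TRADE MARK SIGN']

def octal_escapes(data: bytes) -> str:
    r"""Render each byte as a backslash-octal escape (\NNN); hypothetical helper."""
    return "".join(f"\\{b:03o}" for b in data)

print(octal_escapes(utf8_bytes))  # \342\200\231
```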

 > included in teachersâ<U+0080><U+0099> evaluations.
 > 
 > The whole string from the a-circumflex to <U+0099> was originally a
 > single apostrophe.

Yeah, that decodes to U+2019 RIGHT SINGLE QUOTATION MARK (which isn't
actually an apostrophe, but who cares, right?)
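That decoding can be reproduced in Python, assuming the UTF-8 bytes were read as Latin-1 somewhere on the R side: the result is "â" plus the C1 control characters U+0080 and U+0099, matching the quoted output, and reversing the mistake recovers U+2019. `show_like_r` is a hypothetical helper that only approximates how R prints such characters.

```python
import unicodedata

original = "teachers\u2019"  # "teachers'" with U+2019 as the apostrophe

# Mis-decoding the UTF-8 bytes as Latin-1 yields U+00E2 plus two
# C1 control characters, U+0080 and U+0099.
mojibake = original.encode("utf-8").decode("latin-1")

def show_like_r(text: str) -> str:
    """Print C1 controls (U+0080..U+009F) as <U+XXXX>; a rough approximation."""
    return "".join(
        f"<U+{ord(ch):04X}>" if 0x80 <= ord(ch) <= 0x9F else ch
        for ch in text
    )

print(ascii(show_like_r(mojibake)))  # 'teachers\xe2<U+0080><U+0099>'

# The damage is reversible: re-encode as Latin-1, decode as UTF-8.
repaired = mojibake.encode("latin-1").decode("utf-8")
assert repaired == original
print(unicodedata.name(repaired[-1]))  # RIGHT SINGLE QUOTATION MARK
```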

 > And the weird thing is that it was just an apostrophe in the output
 > file,

If the rest of the text was ASCII and Python is defaulting to ASCII
output, that's to be expected from introducing directed quotation
marks.  Python is very strict about the output encoding unless the
programmer specifies otherwise.  I'm not sure about the exact
mechanism (Python normally raises an error or, if told to, outputs an
escape sequence, so the conversion to a plain apostrophe appears to be
done by the script).
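Python's built-in codec error handlers illustrate that strictness: none of them turns U+2019 into an ASCII apostrophe, so a plain `'` in the output suggests an explicit mapping somewhere in the script. The `ASCII_QUOTES` table below is a hypothetical example of such a mapping, not the script's actual code.

```python
quote = "\u2019"

# Default "strict" handler: ASCII encoding refuses the character outright.
try:
    quote.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: ordinal not in range(128)

# Other built-in handlers the programmer can opt into.
print(quote.encode("ascii", errors="backslashreplace"))  # b'\\u2019'
print(quote.encode("ascii", errors="replace"))           # b'?'
print(quote.encode("ascii", errors="ignore"))            # b''

# None of those yields a plain apostrophe; that takes an explicit mapping,
# e.g. this hypothetical translation table for directed quotation marks.
ASCII_QUOTES = str.maketrans(
    {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
)
print("teachers\u2019 evaluations".translate(ASCII_QUOTES))  # teachers' evaluations
```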

Steve