tlug.jp Mailing List Archive: tlug
Re: [tlug] Stymied by emacs character display
- Date: Wed, 20 Feb 2019 12:09:49 +0900
- From: "Stephen J. Turnbull" <turnbull.stephen.fw@example.com>
- Subject: Re: [tlug] Stymied by emacs character display
- References: <3457c09e095bd3c45066eddd95ba9fdc77f78039.camel@uchicago.edu> <23659.48778.935713.709773@turnbull.sk.tsukuba.ac.jp> <4e46af96185ebe3b712e50a3e8b50b4501e46163.camel@uchicago.edu>
Stuart Luppescu writes:
> On Tue, 2019-02-19 at 17:30 +0900, Stephen J. Turnbull wrote:
> > Stuart Luppescu writes:
> >
> > > I'm generating a report from an rmarkdown document
> >
> > What language(s) are contained in this document? What is it encoded
> > in (UTF-8 I'm pretty sure, but confirmation would be nice)?
>
> This is kind of complicated. The original R program (which contains the
> strings copied and pasted from MS Word), is ascii, which writes the
> control file for the analysis software and is utf-8. The analysis
> program runs in Windows and produces an output file which is
> windows-1252. Then a python script extracts some tables (utf-8) for
> inclusion in the rmarkdown report, which is ascii.

I was interested in the *natural* languages (eg, one possible interpretation of the string in the error message was U+8080, which is a more or less recent addition that appears to be either a Chinese (?) radical not in the Japanese repertoire of radicals, or perhaps a "part kanji" -- I'd like to be able to rule that out). I guess since everything's ASCII or "extended ASCII", that's English.

> > > When I try to generate the report I get error messages like this:
> > > ! Package inputenc Error: Unicode char \u8:\200%Gâ¬%@ not set up for use

Ouch. There's that EURO SIGN again. I'm going to have to look at what XEmacs is doing here more carefully.

> included in teachersâ<U+0080><U+0099> evaluations.
>
> The whole string from the a-circumflex to <U+0099> was originally a
> single apostrophe.

Yeah, that decodes to U+2019 "close single quote" (which isn't actually an apostrophe, but who cares, right?)

> And the weird thing is that it was just an apostrophe in the output
> file,

If the rest of the text was ASCII and Python is defaulting to ASCII output, that's to be expected from introducing directed quotation marks. Python is very strict about the output encoding, unless the programmer specifies otherwise. I'm not sure about the exact mechanism (normally Python should raise an error or output an escape sequence, so this appears to be part of the script).

Steve
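A small Python sketch (not part of the original thread) of the decoding discussed in this message, assuming the character was U+2019 RIGHT SINGLE QUOTATION MARK stored as UTF-8. Its three UTF-8 bytes, misread as Latin-1, give the a-circumflex plus U+0080 and U+0099 seen in the error output; misread as windows-1252, byte 0x80 becomes the EURO SIGN and 0x99 the TRADE MARK SIGN. The helper names are hypothetical, and the backslashreplace fallback is just one choice a script might make, not necessarily what the script in question does.

```python
# Minimal sketch of the "apostrophe" mojibake discussed in this thread.
# Assumption: the original character was U+2019 stored as UTF-8.
# The helper names below are hypothetical, for illustration only.


def mojibake_codepoints(ch, wrong_codec):
    """Encode ch as UTF-8, decode the bytes with the wrong codec,
    and return the resulting code points as U+XXXX strings."""
    garbled = ch.encode("utf-8").decode(wrong_codec)
    return ["U+%04X" % ord(c) for c in garbled]


def ascii_or_escape(text):
    """Try a strict ASCII encode first; on failure, fall back to
    Python's backslashreplace error handler instead of losing data."""
    try:
        return text.encode("ascii")  # strict: raises on any non-ASCII char
    except UnicodeEncodeError:
        return text.encode("ascii", errors="backslashreplace")


if __name__ == "__main__":
    q = "\u2019"
    print(q.encode("utf-8"))                  # b'\xe2\x80\x99'
    # Latin-1 maps each byte to the code point with the same value:
    # a-circumflex followed by two C1 control characters.
    print(mojibake_codepoints(q, "latin-1"))  # ['U+00E2', 'U+0080', 'U+0099']
    # windows-1252 maps 0x80 to EURO SIGN and 0x99 to TRADE MARK SIGN.
    print(mojibake_codepoints(q, "cp1252"))   # ['U+00E2', 'U+20AC', 'U+2122']
    print(ascii_or_escape("teachers\u2019 evaluations"))
    # b'teachers\\u2019 evaluations'
```

Python's codecs raise UnicodeEncodeError by default, so a silent conversion to a plain apostrophe would have to come from an explicit choice somewhere in the pipeline.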