Mailing List Archive



Re: [tlug] HTML Best Text Bridge?



>>>>> "Lyle" == Lyle Saxon <Lyle> writes:

    Lyle> Generally if someone sends me text, it's just embedded in
    Lyle> the e-mail, but I was sent an MS-Word file in Japanese last
    Lyle> week.

Strictly speaking, there's no such thing as a "Word file in Japanese":
MS Word files use UTF-16 (little-endian) internally.  (MS Word _does_
have Japanese-specific features, but AFAIK that encoding is used for
all languages, even English.)
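
To make that concrete, here is a small Python sketch.  It is my own
illustration and says nothing about the rest of the Word file format;
the helper name utf16le_hex is made up.  It shows that in UTF-16
little-endian these English and Japanese samples both take two bytes
per character, with ASCII letters padded by a zero byte:

```python
def utf16le_hex(text: str) -> str:
    """Return the UTF-16LE bytes of text as space-separated hex pairs."""
    return text.encode("utf-16-le").hex(" ")


print(utf16le_hex("Word"))    # 57 00 6f 00 72 00 64 00
print(utf16le_hex("日本語"))  # e5 65 2c 67 9e 8a
```

An editor that assumes a one-byte-per-character encoding will show
bytes like these as garbage, which is one plausible route to mojibake.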

    Lyle> I was then surprised to see that it doesn't open in
    Lyle> OpenOffice, SciTE, or EditPad Pro (or more precisely, it
    Lyle> opens, but is bakemoji).

Try opening it with XEmacs (or GNU Emacs).  If it's actually not
text/plain but RTF (text/rtf, which is what the Mac's TextEdit
produces, aaarrrrgh!), it probably won't work.  But if it's true
text/plain, you should win immediately.

The next-best thing to Emacsen is Mozilla (see below), and Mozilla
has a much more familiar UI.

(If I understood Kat Momoi's presentation a while back correctly, the
Mozilla project has an encoding detection subsystem that is actually
much better than Emacs's, both by design and in implementation.  But
for reasons I don't know they don't use it yet.)

    Lyle> 1) I opened it with an old (off-line) W-box, 2) Copy-pasted
    Lyle> it over to Netscape Composer, 3) Saved it as an HTML file,

As far as I know these three steps are redundant.

    Lyle> 4) Put the floppy in my Linux computer (SuSE 9.3),
    Lyle> transferred the file and opened it with Firefox, 5) Seeing
    Lyle> that it was Shift_JIS, I opened a blank Mozilla Composer
    Lyle> page and copy pasted it there and then saved it as UTF-8.

    Lyle> Or what should I be using to open a plain text file made
    Lyle> with W-XP?

Mozilla.  If Firefox won't let you open a .txt file from the Open File
dialog, try typing the URL file://localhost/floppy/original_file.txt
into the navigation bar.  If the autodetection doesn't work and you
get mojibake, use the View menu (IIRC) to switch the encoding manually,
trying each of the Japanese encodings in turn until one works.
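
If you'd rather script the "try each encoding until one works" step
and the final save-as-UTF-8, something like the following Python
sketch might do.  It is a rough heuristic, not what Mozilla's detector
does: the candidate list and its order are my assumptions, the
function names are made up, and "decodes without error" does not
guarantee the guess is right, so eyeball the result.

```python
from pathlib import Path

# Assumed candidates, strictest first.  ISO-2022-JP is 7-bit and leads
# so its escape sequences are not mistaken for plain ASCII/UTF-8;
# Shift_JIS goes last because it is the most permissive of the four.
CANDIDATES = ["iso2022_jp", "utf-8", "euc_jp", "shift_jis"]


def decode_japanese(raw: bytes, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


def convert_to_utf8(src, dst):
    """Read src in a guessed encoding, write dst as UTF-8, return the guess."""
    text, used = decode_japanese(Path(src).read_bytes())
    Path(dst).write_text(text, encoding="utf-8")
    return used
```

Usage would be along the lines of
convert_to_utf8("original_file.txt", "converted.txt"), where both file
names are placeholders; the return value tells you which encoding was
guessed.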

To coin a proverb, "text/plain is legal HTML".  All the tags are
minimized, that's all. :-)  It is not legal XHTML, of course.  And I'm
not sure if it's quite technically true that text/plain is legal HTML,
but it's pretty close and with the wide variety of broken HTML out
there, browsers have to support it.


-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

