tlug: What decides Japanese file name encoding?



>>>>> "Jim" == Jim Blackson <blackson@example.com> writes:

    Jim> I installed UDF-0.8.5.2 on TurboLinux 4.0 (1999.08 Nikkei
    Jim> Linux) to read a CD-RW disk.  The UDF driver supplies
    Jim> filenames encoded in UTF-8.  But the Japanese file names are
    Jim> listed by ls "as is" in UTF-8.

    Jim> [URL: Linux UDF -
    Jim> http://www.trylinux.com/projects/udf/index.html Note:
    Jim> UDF-0.8.5.2 needs a patch to get the Japanese UTF-8 encoding
    Jim> right.]

    Jim> It looks like my locale is set as "LANG=ja_JP.ujis"

You could try setting the locale to ja_JP.utf8.  But the kernel doesn't
know anything about Japanese yet, so there won't be an NLS module, and
AFAIK it won't do anything.

IMHO, from the point of view of the OS and most apps, constant text should
not need to be interpreted.  (Obviously, editors need not apply here.)
If you make it the app's job, then in a pipeline like `ls -l | grep
'[-r][-w]s' | sort', which app is supposed to do the translation?  How
is it supposed to know if the translation has already been done?
And what if you want to use a non-JIS collation for sort and grep, but
ls has already done the translation (see my earlier post re Unicode
implementation)?

If you're spitting it into a file or socket, the same considerations
apply as for a pipe.
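To make the pipeline point concrete, here is a minimal Python sketch (an illustration, not from the original post) of a grep-like stage that works purely on bytes.  It never decodes anything, so it neither knows nor cares whether the filenames are UTF-8 or EUC-JP, as long as the pattern is plain ASCII.  The function name `byte_grep` is hypothetical.

```python
import re


def byte_grep(pattern: bytes, lines):
    """Yield the input lines (raw bytes) that match an ASCII regex.

    Nothing is decoded, so filenames in an ASCII-compatible multibyte
    encoding such as UTF-8 or EUC-JP pass through byte-for-byte.
    (Shift_JIS is not safe here: its trail bytes can fall in the ASCII
    range and produce false matches.)
    """
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line
```

Used as a real pipeline stage, it would iterate over `sys.stdin.buffer` and write to `sys.stdout.buffer`; at no point does it have to decide whether a translation is needed or has already been done.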

So only widgets that speak directly to users, i.e., terminal emulators
(whether standalone like kterm or part of a larger app like XEmacs
frames), should normally need to worry about it.  (Character code
mungers like multilingual editors and translators, sorters and
searchers are, again, obviously excepted.)

Soooo....  Try 9term.  Or pipe it into Yudit.  Or write a UTF-to-EUC
filter (I think one of the many Japanese code filters does this
already, YMMV).  Pretty soon there will be versions of XEmacs and
FSFmacs that understand UTF-8 (and use national standard fonts like
jiskan24 to display it).  There is a UTF-8 font for XTerm; I'll dig up
the URL if you want.  (I'm not gonna bother now because, of the 4000 or
so glyphs it handles, none are Japanese, and Unihan support is not
even projected yet.)
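The UTF-to-EUC filter suggested above takes only a few lines in a language with built-in codecs.  A minimal Python sketch (not one of the existing Japanese code filters the post refers to), assuming the input is UTF-8 and the terminal expects EUC-JP, as with the `ja_JP.ujis` locale in the question; the function names are hypothetical:

```python
def utf8_to_eucjp(data: bytes, errors: str = "replace") -> bytes:
    """Convert UTF-8 bytes to EUC-JP bytes.

    With the default errors="replace", invalid UTF-8 and characters that
    have no EUC-JP equivalent both come out as '?' instead of aborting
    the whole listing.
    """
    return data.decode("utf-8", errors=errors).encode("euc_jp", errors=errors)


def filter_stream(src, dst) -> None:
    """Copy binary stream src to dst line by line, converting to EUC-JP."""
    for line in src:
        dst.write(utf8_to_eucjp(line))
```

As the last stage of a pipeline it would be called as `filter_stream(sys.stdin.buffer, sys.stdout.buffer)`.  Converting only there, just before the terminal, keeps the earlier stages byte-transparent, which is the division of labor argued for above.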

Wise ass commentary follows.

WTF is UDF?  Unordered disk fragments?

    Jim> So, who/what is responsible for seeing that any conversions
    Jim> get done?

"I don't know, but I can tell you it isn't me."

    Jim> OR

    Jim> How does this locale stuff really work?

Really work?  Really badly, at least for Japanese.  That's required by
the JIS standard, I believe.

    Jim> If this is FAQ, please point me in the right direction :-)

FAQ 0.  It doesn't work.
ANS 0.  Use the source, Luke.

Unfortunately, AFAIK there isn't a better answer at the moment.  :-/

-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."
-------------------------------------------------------------------
Next Technical Meeting: August 14 (Sat), 13:00  place: Temple Univ.
*** Special guest: Marc Christensen (Salt Lake Linux Users Group)
Next Nomikai: September 20 (Fri), 19:30 Tengu TokyoEkiMae 03-3275-3691
-------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan

