Mailing List Archive
Re: [tlug] Re: font/char set question
- Date: Tue, 31 Jul 2007 08:44:21 +0900
- From: Darren Cook <darren@example.com>
- Subject: Re: [tlug] Re: font/char set question
- References: <5634e9210707282051g6d4ac8b9l1ba725231bdff464@mail.gmail.com> <d8fcc0800707290802x2c9798dj411fc5400e8b8d6f@mail.gmail.com> <46AD1954.1080209@dcook.org> <d8fcc0800707291946s531f3353y8e0124d8e12cb071@mail.gmail.com> <Pine.NEB.4.64.0707301307430.28098@homeric.cynic.net> <46ADC95A.2040200@dcook.org> <b4d277190707300445q508ad0cbkf6c687544eeec0d3@mail.gmail.com>
- User-agent: Thunderbird 1.5.0.10 (X11/20070301)
> Just wondering: Do you, or does anyone else, maintain a publicly available
> list of weird hyphens or other Unicode characters that don't strictly
> speaking map neatly back to anything in Shi(f)t-JIS, but in practice can be
> converted to something that does? (Encapsulated in a neat little class
> representing legacy-compatible-UTF8 strings would be best...)

By far the most common one is FULLWIDTH HYPHEN-MINUS (U+FF0D), which
should be turned into the katakana long vowel mark (U+30FC).

Some others that might come up are SMALL HYPHEN-MINUS (U+FE63) and SMALL
EM DASH (U+FE58) (convert both to ASCII hyphen). Then PRESENTATION FORM
FOR VERTICAL EN DASH (U+FE32) and PRESENTATION FORM FOR VERTICAL EM DASH
(U+FE31) into ASCII vertical bar.

Sample PHP code to do all those conversions:

$s=str_replace("－","ー",$s);
$s=str_replace("﹣","-",$s);
$s=str_replace("﹘","-",$s);
$s=str_replace("︲","|",$s);
$s=str_replace("︱","|",$s);

And the round-trip conversion:

$sjis=mb_convert_encoding($s,"SJIS","UTF-8");
$utf8=mb_convert_encoding($sjis,"UTF-8","SJIS");
if($s!==$utf8)complain_to_user();

Darren

-- 
Darren Cook
http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
http://dcook.org/work/ (About me and my work)
http://dcook.org/work/charts/ (My flash charting demos)
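
The replacements and the round-trip check in the message above could be wrapped up roughly as follows. This is only a sketch: the function names normalize_for_sjis() and sjis_round_trips() are hypothetical and not from the original post, it assumes PHP 7 or later (for the "\u{...}" escapes) with the mbstring extension loaded, and it uses strtr() with a map rather than repeated str_replace() calls so that all substitutions happen in a single pass.

```php
<?php
// Sketch only: function names are hypothetical, not from the original post.
// Assumes PHP 7+ (for "\u{...}" escapes) and the mbstring extension.

/** Replace the look-alike characters listed in the post. */
function normalize_for_sjis(string $s): string
{
    static $map = [
        "\u{FF0D}" => "\u{30FC}", // FULLWIDTH HYPHEN-MINUS -> katakana long vowel mark
        "\u{FE63}" => "-",        // SMALL HYPHEN-MINUS -> ASCII hyphen
        "\u{FE58}" => "-",        // SMALL EM DASH -> ASCII hyphen
        "\u{FE32}" => "|",        // VERTICAL EN DASH presentation form -> ASCII bar
        "\u{FE31}" => "|",        // VERTICAL EM DASH presentation form -> ASCII bar
    ];
    // strtr() with an array substitutes every key in one pass and never
    // re-scans text it has already replaced.
    return strtr($s, $map);
}

/** True if $s survives UTF-8 -> SJIS -> UTF-8 unchanged. */
function sjis_round_trips(string $s): bool
{
    $sjis = mb_convert_encoding($s, "SJIS", "UTF-8");
    $utf8 = mb_convert_encoding($sjis, "UTF-8", "SJIS");
    return $s === $utf8;
}
```

A caller would then run normalize_for_sjis() on the input first and call complain_to_user() only when sjis_round_trips() still returns false for the normalized string.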
- Follow-Ups:
- Re: [tlug] Re: font/char set question
- From: Edmund Edgar
- References:
- [tlug] Re: font/char set question
- From: Jim Breen
- Re: [tlug] Re: font/char set question
- From: Josh Glover
- Re: [tlug] Re: font/char set question
- From: Darren Cook
- Re: [tlug] Re: font/char set question
- From: Josh Glover
- Re: [tlug] Re: font/char set question
- From: Curt Sampson
- Re: [tlug] Re: font/char set question
- From: Darren Cook
- Re: [tlug] Re: font/char set question
- From: Edmund Edgar