
Mailing List Archive
Re: [tlug] Re: font/char set question
> Just wondering: Do you, or does anyone else, maintain a publicly available
> list of weird hyphens or other Unicode characters that don't strictly
> speaking map neatly back to anything in Shi(f)t-JIS, but in practice can be
> converted to something that does? (Encapsulated in a neat little class
> representing legacy-compatible-UTF8 strings would be best...)
By far the most common one is FULLWIDTH HYPHEN-MINUS (U+FF0D), which
should be turned into the katakana long vowel mark (U+30FC).
Some others that might come up are SMALL HYPHEN-MINUS (U+FE63) and SMALL
EM DASH (U+FE58); convert both to an ASCII hyphen. Then convert
PRESENTATION FORM FOR VERTICAL EN DASH (U+FE32) and PRESENTATION FORM FOR
VERTICAL EM DASH (U+FE31) to an ASCII vertical bar.
Sample PHP code to do all those conversions:
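$s=str_replace("－","ー",$s);  // U+FF0D fullwidth hyphen-minus -> U+30FC long vowel mark
$s=str_replace("﹣","-",$s);  // U+FE63 small hyphen-minus -> ASCII hyphen
$s=str_replace("﹘","-",$s);  // U+FE58 small em dash -> ASCII hyphen
$s=str_replace("︲","|",$s);  // U+FE32 vertical en dash -> ASCII vertical bar
$s=str_replace("︱","|",$s);  // U+FE31 vertical em dash -> ASCII vertical bar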
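The same five conversions could also be collected into one lookup table and applied in a single pass, which might be a starting point for the kind of wrapper asked about above. This is only a sketch: the function name normalize_dashes is made up for illustration, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original post: applies the five
// replacements listed above in one pass over the string.
function normalize_dashes(string $s): string
{
    // Keys are the "odd" characters, values are their legacy-friendly stand-ins.
    static $map = [
        "\u{FF0D}" => "\u{30FC}", // FULLWIDTH HYPHEN-MINUS -> katakana long vowel mark
        "\u{FE63}" => "-",        // SMALL HYPHEN-MINUS -> ASCII hyphen
        "\u{FE58}" => "-",        // SMALL EM DASH -> ASCII hyphen
        "\u{FE32}" => "|",        // PRESENTATION FORM FOR VERTICAL EN DASH -> ASCII vertical bar
        "\u{FE31}" => "|",        // PRESENTATION FORM FOR VERTICAL EM DASH -> ASCII vertical bar
    ];
    // strtr() with an array does not rescan text it has already replaced,
    // so ASCII hyphens produced here are never touched by another rule.
    return strtr($s, $map);
}
```

Writing the code points as escapes rather than literal characters keeps the table readable even if an editor or mail client silently changes the fullwidth characters into their ASCII look-alikes.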
And the round-trip check, which catches anything that still does not
survive conversion to Shift-JIS and back:
$sjis=mb_convert_encoding($s,"SJIS","UTF-8");
$utf8=mb_convert_encoding($sjis,"UTF-8","SJIS");
if($s!==$utf8)complain_to_user();
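If the check is needed in more than one place, it could be wrapped in a small function that returns a boolean and leaves the complaining to the caller. A minimal sketch, assuming the mbstring extension is loaded; the name survives_sjis_round_trip is hypothetical.

```php
<?php
// Hypothetical helper, not part of the original post. Requires mbstring.
// Returns true when $s comes back unchanged after UTF-8 -> SJIS -> UTF-8.
function survives_sjis_round_trip(string $s): bool
{
    $sjis = mb_convert_encoding($s, "SJIS", "UTF-8");
    $utf8 = mb_convert_encoding($sjis, "UTF-8", "SJIS");
    // Anything without an exact Shift-JIS equivalent comes back altered,
    // so a strict comparison is enough to detect it.
    return $s === $utf8;
}
```

A caller would typically run the replacements first and then do something like if(!survives_sjis_round_trip($s))complain_to_user();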
Darren
--
Darren Cook
http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
http://dcook.org/work/ (About me and my work)
http://dcook.org/work/charts/ (My flash charting demos)