Mailing List Archive



Re: [tlug] Freely distributable Japanese capable utf8 font?



On 2008-03-12 10:58 +0900 (Wed), emiddleton@example.com wrote:

> [explanation of encodings and fonts]

Let's not forget character sets. To summarize:

A character set assigns a number to a specific character: e.g., Unicode
says that the number assigned to "A" is 0x41, and the number assigned to
国 (kuni/country) is 0x56FD.

An encoding is a particular way of expressing this number as a string
of bits for the purposes of communicating it to someone else, such as
via a text file, an e-mail message, an HTML page, or whatever. Encodings
differ in the amount of wasted space for different kinds of text, ease
of decoding, etc. So the UTF-16BE encoding of Unicode A and kuni would
be (as strings of octets expressed in hexadecimal) 00-41 and 56-FD,
respectively. The UTF-8 encodings of Unicode A and kuni would be 41
and E5-9B-BD. Note that in both cases above, the bit strings 56-FD and
E5-9B-BD express the same number, 0x56FD, just in different ways.
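
As a rough check of the octets above, here is a small Python 3 sketch.
It assumes Python's built-in "utf-16-be" and "utf-8" codecs; the helper
name hex_octets is made up for this illustration.

```python
def hex_octets(data: bytes) -> str:
    """Format bytes as dash-separated hex octets, e.g. b'\\x00A' -> '00-41'."""
    return "-".join(f"{b:02X}" for b in data)

A = "A"
KUNI = "\u56fd"  # the character kuni, written as an escape to avoid source-encoding issues

# The character set: Unicode's number for each character.
print(hex(ord(A)), hex(ord(KUNI)))

# Two encodings of those same numbers.
for codec in ("utf-16-be", "utf-8"):
    print(codec, hex_octets(A.encode(codec)), hex_octets(KUNI.encode(codec)))

# Decoding either octet string gives back the same code point.
same = bytes.fromhex("56FD").decode("utf-16-be") == bytes.fromhex("E59BBD").decode("utf-8")
print(same)
```

The last line is the point: both octet strings decode to the one
character whose number is 0x56FD.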

JIS, of course, assigns a different number to kuni: 0x3971. This number
is expressed as 8D-91 in Shift_JIS encoding and B9-F1 in EUC-JP encoding.
In ISO-2022-JP encoding, it's 39-71, but only in "Japanese mode" (I
forget the real name for this), so we have to enter that mode with
1B-24-42 and return to "ASCII mode" with 1B-28-42, giving us
1B-24-42-39-71-1B-28-42 for just that one character alone on a line.
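
The JIS-based octets can be checked the same way. This is only a
sketch, assuming Python 3's bundled "shift_jis", "euc_jp" and
"iso2022_jp" codecs; hex_octets and jis_number are names invented here.

```python
def hex_octets(data: bytes) -> str:
    """Format bytes as dash-separated hex octets."""
    return "-".join(f"{b:02X}" for b in data)

KUNI = "\u56fd"  # kuni; Python strings hold Unicode, the codecs map it to JIS

sjis = KUNI.encode("shift_jis")
euc = KUNI.encode("euc_jp")
iso = KUNI.encode("iso2022_jp")  # the codec adds the mode-switching escapes itself

for name, data in (("Shift_JIS", sjis), ("EUC-JP", euc), ("ISO-2022-JP", iso)):
    print(name, hex_octets(data))

# For this character, EUC-JP is the JIS number with the high bit set on
# each octet, so clearing that bit recovers the JIS number.
jis_number = int.from_bytes(bytes(b & 0x7F for b in euc), "big")
print(hex(jis_number))
```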

And then, yes, the fonts are the particular patterns of black and white
dots and lines you draw to let a human see the actual glyph.

cjs
-- 
Curt Sampson       <cjs@example.com>        +81 90 7737 2974   
Mobile sites and software consulting: http://www.starling-software.com

