
[tlug] font encoding question
- Date: Sat, 16 Jun 2007 08:30:58 -0700
- From: steven smith <sjs@example.com>
- Subject: [tlug] font encoding question
- User-agent: Thunderbird 2.0.0.0 (Windows/20070326)
Hi all
I'm about to do my first CGI program as a result of an
earlier discussion. In that thread I asked about onyomi/kunyomi
and whether I should attempt to memorize them. The answer
was yes and after paying more attention to the kanji and how
they were pronounced in words, I understand why. But how
to memorize at least a couple of hundred
kanji/onyomi/kunyomi... that's a problem.
I'm using a nice little open-source memorization program
called Mnemosyne that allows import of its "flash cards" in
various formats including XML. What I want to do is
generate the XML file using a CGI with various radio buttons
to determine what the output is to contain, and a text window
where users enter their kanji, using info from KANJD212 to
decipher the kanji.
I was assuming I'd just split the input from the text window
and do a simple table lookup. All of this is to be done in
Perl and I've done all of it except the lookup before as
various little stand-alone utilities. I haven't done a lot
of CGI recently and have also done little with UTF-8, but it
doesn't sound too difficult. I expected to do a simple
compare on the input character values and throw out
anything that didn't look like kanji.
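To make the "looks like kanji" check concrete, here is a minimal sketch. It assumes the input has already been decoded into a Perl character string, and it swaps the compare on character values for Perl's \p{Han} Unicode property. The name han_chars is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the characters of an already-decoded
# string that belong to the Unicode Han script (kanji). Kana, ASCII and
# whitespace are dropped.
sub han_chars {
    my ($text) = @_;
    return grep { /\p{Han}/ } split //, $text;
}
```

This only says a character is a Han ideograph, not that it has an entry in KANJD212, so the table lookup would still be needed afterwards.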
Then a note went by between Josh Glover and Jim Breen about
problems Josh was having. It turned out that part of the
problem is font encoding, and I hadn't even considered font
encoding. I just assumed that the user's input would be
UTF-8 like my script.
So here are the questions:
1) How do I handle the user input? I plan on storing
KANJD212 in a hash with the kanji as keys. Can I just split
the table input and throw out anything not in the KANJD212 hash?
2) How do I handle errors?
What I'm leaning toward is just saying "input must be UTF-8"
and praying that it is. I'd do a split on the input to pull
out the individual characters and throw out whitespace.
I'd then look through the result, compare the characters against
the KANJD212 data (stored as a hash), and warn the user about
any characters that didn't convert.
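For concreteness, a rough sketch of that plan, not tried in a real CGI setup. The hash %kanji_info is a two-entry placeholder standing in for the parsed KANJD212 table (the entries are not real KANJD212 records), and decode_input and classify_chars are names made up here. It uses the Encode module's strict UTF-8 decoding, so invalid bytes are detected rather than prayed over.

```perl
use strict;
use warnings;
use utf8;                 # this source file itself is UTF-8
use Encode qw(decode);

# Placeholder table: kanji => info string. A real script would fill this
# by parsing the dictionary file; these two entries are only examples.
my %kanji_info = (
    "日" => "on: nichi, jitsu / kun: hi",
    "本" => "on: hon / kun: moto",
);

# Decode raw bytes as strict UTF-8. Returns the character string, or
# undef if the bytes are not valid UTF-8.
sub decode_input {
    my ($bytes) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $text  = eval { decode('UTF-8', $bytes, $check) };
    return $text;
}

# Split decoded text into single characters, skip whitespace, and sort
# the rest into those found in the table and those rejected.
sub classify_chars {
    my ($text, $table) = @_;
    my (@found, @rejected);
    for my $ch (split //, $text) {
        next if $ch =~ /\s/;
        if (exists $table->{$ch}) { push @found, $ch }
        else                      { push @rejected, $ch }
    }
    return (\@found, \@rejected);
}
```

If decode_input returns undef, the script would report "input must be UTF-8". Otherwise the rejected list gives the characters to warn the user about.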
Does this sound like a good approach, and is it sufficient?
I did a few Google searches on "font encoding" and determine,
but nothing interesting turned up.
My background is that I have several years of experience writing
Perl and feel confident of my abilities, but haven't done much
CGI and almost nothing using non-ASCII text. This is new stuff
for me. And to be honest, right now, my main push is to
learn reading/writing/speaking enough Japanese to build a
foundation to learn on. I'd like to come over there
(California -> Japan) to work for a couple of years before
retiring -- whatever that means. This is a utility I
thought the community might find useful.
Thanks
Steve S.