tlug.jp Mailing List Archive: tlug
- Date: Sat, 16 Jun 2007 08:30:58 -0700
- From: steven smith <sjs@example.com>
- Subject: [tlug] font encoding question
- User-agent: Thunderbird 2.0.0.0 (Windows/20070326)
Hi all,

As a result of an earlier discussion, I'm about to write my first CGI program. In that discussion I asked about onyomi/kunyomi and whether I should attempt to memorize them. The answer was yes, and after paying more attention to the kanji and how they are pronounced in words, I understand why. But how to memorize at least a couple of hundred kanji with their onyomi and kunyomi... that's a problem.

I'm using a nice little open-source memorization program called Mnemosyne that can import its "flash cards" in various formats, including XML. What I want to do is generate the XML file with a CGI script that has radio buttons to select what the output should contain, plus a text area where the user enters their kanji; the script then uses the information from KANJD212 to decipher each kanji. I was assuming I'd just split the input from the text area and do a simple table lookup. All of this is to be done in Perl, and I've written every part except the lookup before, as various small stand-alone utilities. I haven't done much CGI recently and have done little with UTF-8, but it doesn't sound too difficult. I expected to do a simple comparison on the input character values and throw out anything that didn't look like kanji.

Then a note went by between Josh Glover and Jim Breen about problems Josh was having. It turned out that part of the problem was font encoding, and I hadn't even considered font encoding. I had simply assumed that the user's input would be UTF-8, like my script.

So here are the questions:

1) How do I handle the user input? I plan on storing KANJD212 in a hash with the kanji as keys. Can I just split the text input and throw out anything not in the KANJD212 hash?

2) How do I handle errors? What I'm leaning toward is just saying "input must be UTF-8" and praying that it is, then doing a split on the input to pull out the individual characters and throwing out white space.
I'd then go through the result, look each character up in the KANJD212 hash, and warn the user about any characters that didn't convert. Does this sound like a good approach, and is it sufficient?

I did a few Google searches on "font encoding" and how to determine it, but nothing useful turned up.

My background: I have several years of writing Perl and feel confident in my abilities, but I haven't done much CGI and almost nothing with non-ASCII text. This is new stuff for me. And to be honest, right now my main push is to learn enough reading, writing, and speaking Japanese to build a foundation to learn on. I'd like to come over there (California -> Japan) to work for a couple of years before retiring, whatever that means. This is a utility I thought the community might find useful.

Thanks,
Steve S.
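The plan in question 2 (decode the input as UTF-8, reject it if it isn't, split it into characters, skip white space, look each character up, and report what didn't match) can be sketched roughly as below. This is only an illustration written in Python, not the script described above; in Perl the same strict decode could be done with Encode::decode('UTF-8', $bytes, Encode::FB_CROAK) inside an eval, after which split //, $text yields one character per element. The names KANJI_TABLE and parse_input and the two sample entries are hypothetical, and the sketch assumes the form field arrives as raw bytes.

```python
# Hypothetical sample entries standing in for the real hash, which would be
# built from the KANJD212 file (or whichever kanji dictionary is used),
# with each kanji as the key.
KANJI_TABLE = {
    "日": "on: ニチ, ジツ / kun: ひ, -び, -か",
    "本": "on: ホン / kun: もと",
}


def parse_input(raw, table):
    """Decode raw form bytes strictly as UTF-8 and look each character up.

    Returns (found, missing): found maps each recognized kanji to its table
    entry in input order; missing lists each unrecognized character once.
    Raises ValueError if the bytes are not valid UTF-8.
    """
    try:
        # The default "strict" error handler raises instead of silently
        # substituting replacement characters for undecodable bytes.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise ValueError(
            f"input must be UTF-8 (bad byte at offset {err.start})"
        ) from None

    found = {}
    missing = []
    for ch in text:  # iterating a str yields one code point at a time
        if ch.isspace():  # also true for the full-width space U+3000
            continue
        if ch in table:
            found.setdefault(ch, table[ch])  # a repeated kanji keeps one entry
        elif ch not in missing:
            missing.append(ch)
    return found, missing


if __name__ == "__main__":
    found, missing = parse_input("日本\u3000語\n日".encode("utf-8"), KANJI_TABLE)
    print(list(found), missing)

    # Shift_JIS bytes are not valid UTF-8 here, so the input is rejected.
    try:
        parse_input("日本".encode("shift_jis"), KANJI_TABLE)
    except ValueError as err:
        print(err)
```

Rejecting invalid input outright, rather than guessing its encoding, keeps the warning to the user accurate. Browsers generally submit form data in the character encoding of the page that contains the form, so serving the form page itself as UTF-8 should make a strict decode like this one fail only rarely.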
- Follow-Ups:
- Re: [tlug] font encoding question
- From: Edward Wright