Mailing List Archive



Re: [tlug] [OT] C/C++: Need alternative



David Oftedal <david@example.com> wrote:
> 
> I'm trying to parse this Korean dictionary called engdic into the EDICT 
> format (discarding the metadata for now) in order to be able to use it 
> with gjiten2. 

Have you got the latest version? Francis Bond at NTT and Kyonghee Paek
have edited the old engdic, removing many errors, etc.

> However, I fell into the trap of trying to use strtok or 
> strsep in order to parse the lines into individual fields, instead of 
> writing my own functions. I've got them to stop segfaulting, but they 
> still don't work. So I'm giving up on those.

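I've never had trouble with strtok etc. with Japanese/Korean text. You
just need to be aware that they are strictly byte-oriented: they know
nothing about multi-byte characters, so the delimiters you pass must be
single-byte (ASCII) characters. Make sure you treat the bytes as
unsigned char wherever you compare or test them.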
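
As a rough illustration of the above, here is a minimal sketch of
splitting a line with strtok. It assumes the text is in an EUC-style
encoding (where every byte of a multi-byte character has the high bit
set, so an ASCII delimiter can never match half a character) and that
the fields are separated by "/". The names split_fields and
is_high_byte are made up for this example and are not from engdic or
gjiten2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split `line` in place on any byte found in `delims` and store up to
 * `max` field pointers in `fields`. Returns the number of fields stored.
 * strtok overwrites delimiters with '\0', so `line` must be writable,
 * and it keeps hidden state, so it is not reentrant.
 */
size_t split_fields(char *line, const char *delims, char **fields, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && delims != NULL && fields != NULL);

    for (tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        fields[n++] = tok;
    }
    return n;
}

/*
 * Nonzero if `c` has the high bit set, i.e. it is part of a multi-byte
 * character under the EUC assumption. The cast matters: plain char may
 * be signed, in which case bytes >= 0x80 would compare as negative.
 */
int is_high_byte(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Calling split_fields on a writable buffer holding "word /gloss1/gloss2/"
with delims "/" should give three fields. One thing to watch: strtok
treats a run of adjacent delimiters as a single one, so empty fields are
silently skipped. If the input can contain empty fields, that alone may
make the output look wrong even though nothing crashes.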

Jim

