Mailing List Archive
Re: [tlug] [OT] Strip Kanji from a document for study purposes
- Date: Wed, 19 Jul 2006 20:26:59 +1000 (EST)
- From: Jim Breen <Jim.Breen@example.com>
- Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes
[Dave M G (Re: [tlug] [OT] Strip Kanji from a document for study purposes) writes:]
>> >>> (This message includes utf8 encoded Japanese text)
>> > JB > Which arrived in the digest version as ??????????
>> >
>> Sorry, did I not send it correctly or something?

Dunno why that happened, but it was upstream of me. I tried reading it
using several clients, and eventually just opened the mail box with an
editor. All that line contained was ASCII question-marks.

>> However, is it an imposition to make a suggestion to you as the
>> developer since I don't have the skills to really do anything myself
>> about it:
>>
>> Might there be a way to show the output without the original text
>> included? And also not have each sentence parsed individually?

There is. I could put in (yet-another) option to suppress that. In the
8 years that that function has been operating, no-one else has asked
to have the text removed, so I suspect the demand is not high 8-)}

What most people do when creating vocab lists from text is to do a
Save-as as a text file, then edit out the bits they don't want.

>> The way it comes out now, each sentence is displayed before the list of
>> words contained in that sentence. One of the issues is that if a word
>> appears over and over again in different sentences, then it will show up
>> again and again in each list of words.

There is an option to stop a word/phrase being displayed more than
once. It is the checkbox labelled "no repeated translations".

>> For what I'm going for, it would be ideal to take the whole text and
>> create a list in such a way so that there would be no duplicates.
>>
>> I'd also favour not showing the original sentences along with the word
>> lists, as what I'm trying to do is separate out grammar and vocabulary
>> study.

See how you go saving to a text file and hitting it with an editor.
That's what I have done. I find I have to trim out the words I know to
get it down to a usable study list.
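The "save as text, then edit" step could also be scripted. What follows is only a minimal Python sketch, not part of the original message and not a description of how the tool itself works. It assumes a hypothetical saved-file layout in which each vocabulary line is "headword<TAB>reading<TAB>gloss" and any line without a TAB is an original sentence; the real saved output may well look different and would need its own parsing rule. The names build_study_list and known_words are made up for illustration.

```python
def build_study_list(lines, known_words=frozenset()):
    """Return unique vocabulary lines in first-seen order.

    ASSUMPTION (hypothetical format): a vocabulary line looks like
    "headword<TAB>reading<TAB>gloss"; a line with no TAB is treated as
    an original sentence (or blank line) and is dropped.
    """
    seen = set()
    study_list = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "\t" not in line:
            continue  # sentence or blank line: not wanted in the list
        headword = line.split("\t", 1)[0].strip()
        if not headword:
            continue  # malformed entry with an empty headword
        if headword in known_words or headword in seen:
            continue  # already known to the reader, or a duplicate
        seen.add(headword)
        study_list.append(line)
    return study_list


if __name__ == "__main__":
    sample = [
        "猫が魚を食べる。\n",
        "猫\tねこ\tcat\n",
        "魚\tさかな\tfish\n",
        "犬が魚を食べる。\n",
        "犬\tいぬ\tdog\n",
        "魚\tさかな\tfish\n",
    ]
    # Words the reader already knows are trimmed out of the list.
    for entry in build_study_list(sample, known_words={"猫"}):
        print(entry)
```

Keeping first-seen order means the list follows the order in which words appear in the text, and the known_words set stands in for the manual trimming of already familiar words.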
Jim Tittsler <jwt-tlug@example.com> added:
>> If your mail client will deal with MIME encoded digests, you might
>> try visiting
>> http://www.tlug.jp/mailman/options/tlug
>> and selecting MIME format digests about 2/3 of the way down the
>> rather option filled page.

Thanks. I've done that now and I'll see how it goes. I've not had
trouble in the past with Japanese text in tlug digests, but it's
usually been ISO-2022-JP.

The (text) Honyaku list, out of Yahoo Groups, usually gets the coding
right, by converting each message to a common encapsulation.

Cheers

Jim

--
Jim Breen                               http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,      Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia         Fax: +61 3 9905 5146
(Monash Provider No. 00008C)               ジム・ブリーン@モナシュ大学