Re: [tlug] [OT] Strip Kanji from a document for study purposes

Date: Wed, 19 Jul 2006 20:26:59 +1000 (EST)
From: Jim Breen <Jim.Breen@example.com>
Subject: Re: [tlug] [OT] Strip Kanji from a document for study purposes

[Dave M G (Re: [tlug] [OT] Strip Kanji from a document for study purposes) writes:]
>> >>> (This message includes utf8 encoded Japanese text)
>> >
JB > Which arrived in the digest version as ??????????
>> >   
>> Sorry, did I not send it correctly or something?

Dunno why that happened, but it was upstream of me. I tried reading it
using several clients, and eventually just opened the mail box with
an editor. All that line contained was ASCII question-marks.
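
A minimal sketch of one way a line of Japanese can turn into nothing but
question marks. It assumes some step upstream forced the text into ASCII
with replacement of unmappable characters; that is only a guess at the
cause, and the sample string is made up, not the original message.

```python
# Illustration only: the sample text is invented, and forcing it into ASCII
# is an assumed cause, not a confirmed description of the digest software.
text = "日本語のテキスト"

# errors="replace" substitutes "?" for every character ASCII cannot represent.
mangled = text.encode("ascii", errors="replace").decode("ascii")

print(mangled)
```

Once that substitution has happened the original characters are gone, so
no choice of mail client or editor can bring them back.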

>> However, is it an imposition to make a suggestion to you as the
>> developer since I don't have the skills to really do anything myself
>> about it:
>> 
>> Might there be a way to show the output without the original text
>> included? And also not have each sentence parsed individually?

There is. I could put in (yet-another) option to suppress that. In the
8 years that that function has been operating, no-one else has asked to have
the text removed, so I suspect the demand is not high   8-)}
What most people do when creating vocab lists from text is to save the
output as a text file, then edit out the bits they don't want.

>> The way it comes out now, each sentence is displayed before the list of
>> words contained in that sentence. One of the issues is that if a word
>> appears over and over again in different sentences, then it will show up
>> again and again in each list of words.

There is an option to stop a word/phrase being displayed more than once.
It is the checkbox labelled "no repeated translations".

>> For what I'm going for, it would be ideal to take the whole text and
>> create a list in such a way so that there would be no duplicates.
>> 
>> I'd also favour not showing the original sentences along with the word
>> lists, as what I'm trying to do is separate out grammar and vocabulary
>> study.

See how you go saving to a text file and hitting it with an editor.
That's what I have done. I find I have to trim out the words I know
to get it down to a usable study list.
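
The trimming step can also be scripted. The following is a rough sketch,
assuming the saved text has already been reduced to one vocabulary entry
per line, with the headword as the first whitespace-separated field. That
layout, the function name build_study_list and the sample entries are all
hypothetical; the real saved output may need more cleaning first.

```python
def build_study_list(lines, known_words=frozenset()):
    """Keep the first entry for each headword, skipping words already known.

    Assumes (hypothetically) one entry per line, headword first.
    """
    seen = set()
    study_list = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        headword = line.split()[0]
        if headword in seen or headword in known_words:
            continue  # duplicate, or a word the reader already knows
        seen.add(headword)
        study_list.append(line)
    return study_list


# Made-up sample entries, for illustration only.
sample = [
    "勉強 【べんきょう】 (n,vs) study",
    "言葉 【ことば】 (n) word; language",
    "勉強 【べんきょう】 (n,vs) study",
    "",
    "本 【ほん】 (n) book",
]

result = build_study_list(sample, known_words={"本"})
for entry in result:
    print(entry)
```

Entries stay in the order they first appear in the text, which keeps the
list roughly aligned with the reading order of the original document.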

Jim Tittsler <jwt-tlug@example.com> added:

>> If your mail client will deal with MIME encoded digests, you might
>> try visiting
>>    http://www.tlug.jp/mailman/options/tlug
>> and selecting MIME format digests about 2/3 of the way down the
>> rather option-filled page.

Thanks. I've done that now and I'll see how it goes. I've not had
trouble in the past with Japanese text in tlug digests, but it's
usually been ISO-2022-JP.
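
One plausible reason ISO-2022-JP survives plain-text digests better than
UTF-8 is that it uses only 7-bit bytes. The sketch below just demonstrates
that property on a made-up sample string; it is not a diagnosis of what
the digest software actually does.

```python
# Illustration only: the sample text is invented.
text = "日本語のテキスト"

jis = text.encode("iso2022_jp")   # escape sequences plus 7-bit bytes
utf8 = text.encode("utf-8")       # multi-byte sequences with the high bit set

seven_bit_jis = all(b < 0x80 for b in jis)
seven_bit_utf8 = all(b < 0x80 for b in utf8)

# The ISO-2022-JP bytes decode back to the original text.
round_trip = jis.decode("iso2022_jp")

print(seven_bit_jis, seven_bit_utf8, round_trip == text)
```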

The (text) Honyaku list, out of Yahoo Groups, usually gets the encoding
right, by converting each message to a common encapsulation.

Cheers

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学
