Mailing List Archive



Belated thanks (was: Re: [tlug] Mail archiving question)



On 04/08/2007, Stephen J. Turnbull <stephen@example.com> wrote:
> Jim Breen writes:
>
>  > I foolishly volunteered to help set up a searchable
>  > email archive for the Honyaku mailing list (A few
>  > TLUGers are also on that list.) My current task is to
>  > extract the essential headers (From, Subject, Date, ...)
>  > and the body of the email, convert them to UTF-8 and
>  > store them as one file per email. I am working on a collection
>  > of about 40,000 accumulated emails from the last 18 months.

[...]

> MHonArc may have an appropriate option.

MHonArc worked well, although it did more than I wanted, e.g. dressing
each email up in prettified HTML. It also turned all the Japanese and
Chinese into entity codes. My ultimate solution was to run each email
through MHonArc, then pipe the output through html2text, and then through
ascii2uni to recover the Japanese/Chinese as UTF-8.
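
For anyone without html2text and ascii2uni to hand, the last two steps
can be roughly approximated in Python. This is only a minimal sketch, not
the pipeline described above: it assumes the entity codes are standard
HTML character references (e.g. &#26085;), and the function names and the
sample string are made up for illustration.

```python
import html
import re


def strip_tags(markup: str) -> str:
    # Crude tag removal, for illustration only; a real HTML parser
    # (or html2text itself) copes far better with messy input.
    return re.sub(r"<[^>]+>", "", markup)


def entities_to_text(text: str) -> str:
    # html.unescape converts named and numeric character references
    # (decimal or hex) back into the characters they stand for.
    return html.unescape(text)


# Hypothetical sample: Japanese and Chinese written as entity codes.
sample = "<p>&#26085;&#26412;&#35486; and &#x4E2D;&#x6587;</p>"
text = entities_to_text(strip_tags(sample))

# Encoding the result gives the UTF-8 bytes to store, one file per email.
utf8_bytes = text.encode("utf-8")
print(text)
```

Doing the decoding after the tags are removed means any "<" or ">" that
was itself written as an entity in the message body is not mistaken for
markup.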

Thanks to Stephen and Josh for the suggestions.

Cheers

Jim
-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/

