
Mailing List Archive
Re: [tlug] Mail archiving question
On 04/08/07, Stephen J. Turnbull <stephen@example.com> wrote:
> Jim Breen writes:
> > I foolishly volunteered to help set up a searchable
> > email archive for the Honyaku mailing list (A few
> > TLUGers are also on that list.) My current task is to
> > extract the essential headers (From, Subject, Date, ...)
> > and the body of the email, convert them to UTF-8 and
> > store them as one file per email. I am working on a collection
> > of about 40,000 accumulated emails from the last 18 months.
>
> I would use Python's email module. Proof of concept would be about 20
> lines of code, I guess. (Hint: the email module treats mail as a
> quasi-dictionary of headers, with Unicode key-value pairs. All that's
> left is using the right codec in the flatten method after deleting the
> headers you don't want.)
Given my minimal level of Python skill, those 20 lines of POC may take
weeks.
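
For reference, a proof of concept along the lines Stephen suggests might look like the sketch below. It is an illustration only, not code from this thread: it reads the wanted headers directly instead of deleting the unwanted ones and flattening, it uses the modern `policy.default` API of Python's standard `email` package, and the names `WANTED`, `to_utf8_text` and `save_message` are hypothetical.

```python
from email import message_from_bytes, policy

# Assumed set of "essential" headers; adjust to taste.
WANTED = ("From", "Subject", "Date")

def to_utf8_text(raw: bytes) -> str:
    """Return the wanted headers and the plain-text body as one str."""
    # policy.default decodes RFC 2047 encoded words in headers to str.
    msg = message_from_bytes(raw, policy=policy.default)
    lines = [f"{name}: {msg[name]}" for name in WANTED if msg[name] is not None]
    # Pick the text/plain part; None if the message has no such part.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    # Normalise line endings so every output file looks the same.
    body = body.replace("\r\n", "\n")
    return "\n".join(lines) + "\n\n" + body

def save_message(raw: bytes, path: str) -> None:
    """Write one email to one UTF-8 file (path is chosen by the caller)."""
    with open(path, "w", encoding="utf-8", newline="") as out:
        out.write(to_utf8_text(raw))
```

Calling `save_message` once per email in the collection would give the one-file-per-email layout described above; how the 40,000 messages are split apart beforehand is left open here.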
> Simon Cozens might recommend Mail::Audit (and then again he might not;
> while he hasn't found the One True Language yet, at least he's been
> abandoning false ones at a great rate). If I recall the author
> correctly, you can trust it to be more bullet-proof than the Swiss
> internet backbone. Proof of concept would be only one line of
> noise. (White noise, that is, at 120dB.)
I looked at the package Simon mentioned, but took fright.
> MHonArc may have an appropriate option.
Thanks. I'll look into that. We actually have the archive system
already, with regex searching, etc. It's the one-off importing of
a batch of emails that's needed.
> metamail? Now that's a blast from the past.
Since it does almost all I want, I'm inclined to use it for
this once-off.
I now have the "base64" utility, so I can detect and deflect
stuff containing html, etc.
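
For comparison, the same kind of HTML check could be sketched in Python rather than with the "base64" utility. This is an assumption-laden illustration, not the method used here: `has_html` is a hypothetical name, and the test for an `<html` tag inside plain-text parts is a crude heuristic.

```python
from email import message_from_bytes, policy

def has_html(raw: bytes) -> bool:
    """Return True if any part of the message looks like HTML."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/html":
            return True
        if ctype == "text/plain":
            # decode=True undoes base64 or quoted-printable transfer encoding.
            payload = part.get_payload(decode=True) or b""
            if b"<html" in payload.lower():
                return True
    return False
```

Messages for which this returns True could be set aside for manual handling instead of being imported.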
Thanks
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/