
Mailing List Archive
[tlug] Mail archiving question
Jim Breen writes:
> I foolishly volunteered to help set up a searchable
> email archive for the Honyaku mailing list (A few
> TLUGers are also on that list.) My current task is to
> extract the essential headers (From, Subject, Date, ...)
> and the body of the email, convert them to UTF-8 and
> store them as one file per email. I am working on a collection
> of about 40,000 accumulated emails from the last 18 months.
I would use Python's email module. Proof of concept would be about 20
lines of code, I guess. (Hint: the email module treats mail as a
quasi-dictionary of headers, with Unicode key-value pairs. All that's
left is using the right codec in the flatten method after deleting the
headers you don't want.)
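
For illustration only, here is a rough sketch of that idea, assuming Python 3 and the modern email API (policy.default). It takes a slightly different route from the hint above: rather than deleting unwanted headers and flattening, it copies out the wanted ones. The KEEP tuple and the function name to_utf8_text are made up for this example, and the choice of the text/plain part as "the body" is an assumption.

```python
from email import message_from_bytes, policy

# Assumed set of "essential" headers; adjust to taste.
KEEP = ("From", "Subject", "Date")

def to_utf8_text(raw: bytes) -> str:
    """Return the kept headers plus the plain-text body as one Unicode string."""
    # policy.default decodes RFC 2047 encoded-words in headers for us.
    msg = message_from_bytes(raw, policy=policy.default)
    headers = [f"{name}: {msg[name]}" for name in KEEP if msg[name] is not None]
    # Pick the text/plain part (or the message itself if it is not multipart).
    part = msg.get_body(preferencelist=("plain",))
    # get_content() decodes the transfer encoding and the declared charset.
    body = part.get_content() if part is not None else ""
    return "\n".join(headers) + "\n\n" + body
```

Writing the result with open(path, "w", encoding="utf-8") would then give one UTF-8 file per email. Messages with a missing or wrong charset declaration would need extra handling that this sketch does not attempt.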
Simon Cozens might recommend Mail::Audit (and then again he might not;
while he hasn't found the One True Language yet, at least he's been
abandoning false ones at a great rate). If I recall the author
correctly, you can trust it to be more bullet-proof than the Swiss
internet backbone. Proof of concept would be only one line
noise. (White noise, that is, at 120dB.)
MHonArc may have an appropriate option.
metamail? Now that's a blast from the past.