Mailing List Archive



[tlug] Mail archiving question



Jim Breen writes:

 > I foolishly volunteered to help set up a searchable
 > email archive for the Honyaku mailing list (A few
 > TLUGers are also on that list.) My current task is to
 > extract the essential headers (From, Subject, Date, ...)
 > and the body of the email, convert them to UTF-8 and
 > store them as one file per email. I am working on a collection
 > of about 40,000 accumulated emails from the last 18 months.

I would use Python's email module.  Proof of concept would be about 20
lines of code, I guess.  (Hint: the email module treats mail as a
quasi-dictionary of headers, with Unicode key-value pairs.  All that's
left is using the right codec in the flatten method after deleting the
headers you don't want.)
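
A rough sketch of that idea, for what it's worth.  Instead of deleting
the unwanted headers and flattening the message, it copies out the
wanted headers and the decoded body, which seemed simpler to show.  It
assumes Python 3's policy-based email API; the names KEEP_HEADERS,
extract_essentials and archive_one are made up for illustration, and
the choice of "essential" headers is a guess.  Multipart mail without a
text/plain part, and mail that declares a bogus charset, would need
more care than this.

```python
import email
from email import policy

# Assumption: these three are the "essential" headers; extend as needed.
KEEP_HEADERS = ("From", "Subject", "Date")


def extract_essentials(raw: bytes) -> str:
    """Return the kept headers and the plain-text body as one Unicode string."""
    # policy.default decodes RFC 2047 encoded headers into ordinary str values.
    msg = email.message_from_bytes(raw, policy=policy.default)

    lines = []
    for name in KEEP_HEADERS:
        value = msg[name]
        if value is not None:
            lines.append(f"{name}: {value}")

    # Pick the text/plain part (or the message itself if it is not multipart);
    # get_content() decodes it using the charset the message declares.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    return "\n".join(lines) + "\n\n" + body


def archive_one(raw: bytes, out_path: str) -> None:
    """Write one message to its own UTF-8 file (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(extract_essentials(raw))
```

Looping archive_one over the 40,000 messages (from an mbox via the
mailbox module, say) is left as the boring part.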

Simon Cozens might recommend Mail::Audit (and then again he might not;
while he hasn't found the One True Language yet, at least he's been
abandoning false ones at a great rate).  If I recall the author
correctly, you can trust it to be more bullet-proof than the Swiss
internet backbone.  Proof of concept would be only one line of
noise.  (White noise, that is, at 120 dB.)

MHonArc may have an appropriate option.

metamail?  Now that's a blast from the past.

