Mailing List Archive
[tlug] Mail archiving question
- Date: Fri, 3 Aug 2007 22:01:22 +1000
- From: "Jim Breen" <jimbreen@example.com>
- Subject: [tlug] Mail archiving question
Some background, then some questions.

I foolishly volunteered to help set up a searchable email archive for the Honyaku mailing list. (A few TLUGers are also on that list.) My current task is to extract the essential headers (From, Subject, Date, ...) and the body of each email, convert them to UTF-8 and store them as one file per email. I am working on a collection of about 40,000 emails accumulated over the last 18 months.

My first thought was to pipe each email through metamail, as this would unpack things like Base64 and quoted-printable. I can usually work out the coding from the MIME headers, so converting to UTF-8 is not a big problem. Preliminary testing went very well.

The metamail approach has run into a snag with emails containing HTML. It goes and invokes my default browser (Firefox), which is not much use when I'm batch-processing. In many cases I can get around this by detecting that there is a second part to the email containing HTML and simply throwing it away; however, in some cases the HTML is in a Base64-coded block, so I'm not aware of it.

Another problem is the horrible Microsoft TNEF format, which ignores email rules, doesn't have MIME information, etc. Metamail simply throws in the towel on these.

Anyway, to get to my questions:

- Has anyone done this sort of thing before, and can you suggest an alternative approach?
- If I stick with metamail, is there an easy way to get it to ignore HTML rather than hitting htmlview?

Cheers

Jim

--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
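The task described in the post (pull out a few headers, undo Base64 or quoted-printable, convert to UTF-8, ignore HTML parts) can be sketched without metamail using Python's standard `email` package. This is only an illustration under stated assumptions, not something the post proposes: the function names `extract_plain_text` and `write_utf8` are hypothetical, each message is assumed to be available as raw bytes, and only `text/plain` parts are kept. TNEF attachments are not decoded here; they would simply be skipped like any other non-text part.

```python
import email
from email import policy


def extract_plain_text(raw_bytes):
    """Return (headers, body) for one raw email message.

    headers: dict with the From, Subject and Date values as str.
    body: all text/plain parts joined together, as str.
    HTML and other non-text parts are skipped, so no viewer is launched.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    headers = {name: str(msg.get(name, "")) for name in ("From", "Subject", "Date")}

    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text themselves
        if part.get_content_type() != "text/plain":
            continue  # drops text/html even when it is Base64-encoded
        payload = part.get_payload(decode=True)  # undoes Base64 / quoted-printable
        charset = part.get_content_charset() or "us-ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Charset name unknown to Python: fall back to a lossless byte mapping.
            text = payload.decode("latin-1")
        chunks.append(text)
    return headers, "\n".join(chunks)


def write_utf8(headers, body, path):
    """Store one message as a UTF-8 text file: headers, blank line, body."""
    with open(path, "w", encoding="utf-8") as out:
        for name, value in headers.items():
            out.write(f"{name}: {value}\n")
        out.write("\n")
        out.write(body)
```

Because the part's declared content type is checked before its transfer encoding is undone, an HTML part wrapped in Base64 is skipped the same way as a plain one. Messages whose senders mislabel the charset would still need manual handling.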
- Follow-Ups:
- [tlug] Mail archiving question
- From: Stephen J. Turnbull