tlug: despam - a report on a spam blocker
- To: tlug@example.com
- Subject: tlug: despam - a report on a spam blocker
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Wed, 24 Sep 1997 10:42:59 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <199709191324.WAA12806@example.com>
- References: <199709191324.WAA12806@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug
--------------------------------------------------------
tlug note from "Stephen J. Turnbull" <turnbull@example.com>
--------------------------------------------------------
>>>>> "Jason" == Jason Molenda <crash@example.com> writes:

    Jason> I installed a spam blocker here in Tokyo called 'despam'
    Jason> a week ago. It is a perl script which includes a
    Jason> large database of regular expressions to detect spam mail
    Jason> notes (it looks through the headers or body of mail notes
    Jason> for certain regular expressions). It has something like
    Jason> 1,500 or 2,000 regexps it checks against.

Yikes! Presumably a descendant of the system used by the Cancel-Moose
on Usenet?

    Jason> The merit of any of these systems is how well they block
    Jason> the spam. I kept track of things for 9 days. Over that
    Jason> period I was sent 117 spams, 79 of which despam
    Jason> caught (and 38 of which got past it). Some of these 117
    Jason> spams were duplicates; I counted all of them as individual

I lost my recent mail archives to a disk crash, so I'm just
reconstructing from memory. But at one point I had 500 or so of them,
and I'm sure I was doing better than 65%. I use _one_ procmail regexp
on _headers only_. It has wrapped around about 5 times on an 80-column
window by now, of course.

It's true that most of the ones that get through the filter are MLM
pyramid swindles. However, I'm pretty sure I know how to catch most of
those, although I haven't implemented it yet, and it may require going
outside procmail: check for a mismatch in the "Received:" chain
(especially if there's an intervening "From:"). Come to think of it,
lots of MTAs now include a "possible spoof" notice in the headers;
filtering on that will catch them in many cases (but it'll also catch
Jim Schweiz when he's fiddling with his mailer config :-).

    Jason> spams. Two messages were marked as spam, but were not
    Jason> spam.
    Jason> They were digests (the nikon-digest mailing list)
    Jason> which had spam in them, so I'm not holding that against
    Jason> despam. despam should check for digests.

That's not acceptable to me.

    Jason> So I'm pretty happy with the results of despam so far. One
    Jason> drawback of it is that it does eat some CPU time as it goes
    Jason> through the headers and body of incoming mail notes for all

If I understand your description correctly, _and_ you are already
using procmail, one thing you can do is to keep a list of your regular
individual correspondents and trusted-not-to-spam domains and put them
_ahead_ of the despam call in .procmailrc. Also digests, where the cost
of the spam may be lower and the probability of a multiple false
positive is high.

You can get the multiples sort of for free by keeping a spam-cache of
message IDs (see the procmail docs; I know it's possible but not how)
and filtering on the cache before using despam. This may require
altering despam (it would probably have to call formail, the procmail
tool which maintains message ID caches). This cache would be small
because the multiples would most likely all arrive within 24 hours,
so you can expire the cache rapidly.

    Jason> of these regexps. Another drawback is that the spam block
    Jason> patterns are tied to the releases of despam, so I'm not
    Jason> sure how frequently updated patterns will be released.

Well, that requires analysis of the spams, so it's mostly going to be
useful against pyramid swindles. There aren't any new ones :-)

You can also make a private spam-blocker like mine: look for something
suspicious in the headers and add it to the spam regexp in
.procmailrc. In Emacs I use the following procedure:

    ; mark suspicious domain or address or other feature
    ; eg "cool.out.do.you.know.where.the.delete.key.is" in Message-ID.
    M-w                          ; save it
    C-x C-f "~/.procmailrc" RET  ; cheap if the buffer already exists
    M-< M-s "abuse/newmail" RET  ; I append spam to an abuse inbox
                                 ; use mh-inc and mh-scan to check for
                                 ; non-spam, mv them to a safe place,
                                 ; then delete the spam files
    C-a C-b                      ; back up to end of regexp
    "|" C-u C-y                  ; yank, leaving point at head
    C-x n n M-x "replace-string" RET "." RET "\."  ; narrow and quote dots
    C-x n w C-x C-s              ; widen and save

Note that there are no user inputs after the region is defined; this
could easily be turned into a macro or a function (I haven't bothered,
but you might if you want to distribute it to non-hackers). I don't
know how you would do this in Eudora or MS Exchange ....

Avoid the temptation to put "prodigy", "aol", "compuserve", and "tlug"
into your regexp. The better your regexp is, the less often despam
gets called.

HTH

Steve

Next TLUG meeting is Saturday October 11, 1997
-----------------------------------------------------------------
a word from the sponsor will appear below
TWICS - Japan's First Public-Access Internet System.
www.twics.com info@example.com Tel:03-3351-5977 Fax:03-3353-6096
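The filtering order recommended in the message above (trusted correspondents first, then a Message-ID cache of known spam, then the single header-only regexp, with despam called last) can be sketched in Python. This is a minimal illustration under stated assumptions, not procmail, formail, or despam: `SpamPipeline`, `add_feature`, and `expensive_filter` are hypothetical names, `expensive_filter` stands in for the despam call, and Python's `re` module stands in for procmail's regexp engine. `add_feature` mirrors the Emacs procedure by appending `|feature` with only the dots quoted; `re.escape` would be the more thorough choice.

```python
import re
import time
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable, Dict, Iterable, Optional


def add_feature(pattern: str, feature: str) -> str:
    """Append a suspicious header feature to an alternation regexp.

    Like the Emacs procedure, only '.' is quoted; other regexp
    metacharacters in `feature` are left as they are."""
    quoted = feature.replace(".", r"\.")
    return pattern + "|" + quoted if pattern else quoted


def header_text(msg: Message) -> str:
    """Flatten the headers into one string; the body is never searched."""
    return "\n".join(f"{name}: {value}" for name, value in msg.items())


class SpamPipeline:
    """Run cheap checks first so the expensive filter runs as rarely as possible.

    `expensive_filter` is a hypothetical stand-in for the despam call."""

    def __init__(self, header_pattern: str, trusted: Iterable[str],
                 expensive_filter: Callable[[Message], bool],
                 cache_ttl: float = 24 * 3600) -> None:
        # Case-insensitive, like procmail's default matching.
        self.header_re = (re.compile(header_pattern, re.IGNORECASE)
                          if header_pattern else None)
        self.trusted = [t.lower() for t in trusted]
        self.expensive_filter = expensive_filter
        self.cache_ttl = cache_ttl
        self.spam_ids: Dict[str, float] = {}  # Message-ID -> time judged spam

    def _is_trusted(self, msg: Message) -> bool:
        # Trusted entries are full addresses or domains (subdomains included).
        addr = parseaddr(msg.get("From", ""))[1].lower()
        return any(addr == t or addr.endswith("@" + t) or addr.endswith("." + t)
                   for t in self.trusted)

    def _expire(self, now: float) -> None:
        # Duplicates usually arrive within a day, so older entries are dropped.
        self.spam_ids = {mid: seen for mid, seen in self.spam_ids.items()
                         if now - seen < self.cache_ttl}

    def is_spam(self, raw: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        msg = message_from_string(raw)
        # 1. Trusted correspondents and domains skip every spam test.
        if self._is_trusted(msg):
            return False
        # 2. Copies of spam already seen are caught by Message-ID alone.
        self._expire(now)
        mid = msg.get("Message-ID")
        if mid and mid in self.spam_ids:
            return True
        # 3. One header-only regexp; 4. only then the expensive full scan.
        header_hit = (self.header_re is not None
                      and self.header_re.search(header_text(msg)) is not None)
        spam = header_hit or bool(self.expensive_filter(msg))
        if spam and mid:
            self.spam_ids[mid] = now
        return spam
```

Putting the cheap tests first is the point of the design: every message they settle is one that never reaches the expensive body scan, which is the CPU cost Jason mentions.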
- References:
- tlug: despam - a report on a spam blocker
- From: Jason Molenda <crash@example.com>