Re: [tlug] state of the art spam filtering

Date: Thu, 18 Mar 2010 12:03:53 +0100
From: Attila Kinali <attila@example.com>
Subject: Re: [tlug] state of the art spam filtering
References: <20100316092524.c153a4a9.attila@example.com> <20100316104829.GL4400@example.com>
Organization: NERV

On Tue, 16 Mar 2010 19:48:29 +0900
Curt Sampson <cjs@example.com> wrote:

> On 2010-03-16 09:25 +0100 (Tue), Attila Kinali wrote:
> 
> > ...on both primary and secondary MX...
> 
> Can you define what you mean by "primary" and "secondary" MX? Is there
> actually any difference between these servers, besides the priority in
> the DNS?

For me, the primary is the one MX on which the mail should end up.
That can be the server that runs the mailing list software, the
one that does the local delivery, the one that puts the mail into
an IMAP server, or the one that decides to forward it to
a different address.

The secondaries are, for me, the backup MXes that store the mail in case
the primary is not reachable for a certain amount of time.

> I didn't mention it when I was talking about my configuration, but in
> that case it's perfectly reasonable to run all of one's servers at the
> same priority.

In most circumstances it doesn't really matter. If the primary is running
and the mail is sent to a secondary, the delay this causes is less
than a minute, if not just a second or two.

> > My current setup for the high-volume domains is to have strict
> > envelope-from/envelope-to checking... and reject everything with a
> > 4xx that has an invalid envelope-from, resp 5xx if the envelope-to is
> > invalid.
> 
> Well, everybody needs to reject things with an invalid envelope-to. What
> are you going to do with it if you accept it? :-)

By default, quite a few MTAs accept invalid envelope-to addresses if
they are secondary for a domain.

> But how do you define an "invalid" envelope-from? As we've seen in
> other things that have come up on the list, validity changes from
> place to place and time to time. And while there are various checks
> you can try to do, none of these guarantee that the address can
> actually be delivered. 

I do a reverse delivery check. I.e., the MTA checks whether the envelope-from
can be reached _and_ accepts mail. The result of this test is cached
in a local database.
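
A minimal sketch of the caching side of such a reverse delivery check, in
Python. It is not what the MTA actually runs: the class name, the `probe`
callback (which would do the real SMTP callout), and the one-hour lifetime
are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class SenderVerifyCache:
    """Remember the outcome of reverse delivery checks per envelope-from."""

    def __init__(self, probe: Callable[[str], bool],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._probe = probe        # hypothetical: performs the real SMTP callout
        self._ttl = ttl_seconds    # assumed lifetime of a cached result
        self._clock = clock        # injectable so the expiry can be tested
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def is_deliverable(self, envelope_from: str) -> bool:
        # Only the domain is case-insensitive, so only that part is folded.
        local, sep, domain = envelope_from.strip().rpartition("@")
        key = local + sep + domain.lower()
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]          # fresh cached answer, no remote probe
        result = self._probe(key)
        self._cache[key] = (result, now)
        return result
```

Every cache hit is one probe that the remote server never sees, so the
lifetime directly controls how much load the checks put on other people's
machines.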

> Further, much spam these days does have a
> valid envelope-from, it's just some random valid address the spammer
> "borrowed" from some poor sod who's going to have to deal with all of
> the blowback.

It actually causes problems even if it's not valid. We got a complaint
from one of the big free email providers in France that we are creating
a considerable load on its servers due to the number of checks we are
performing (a lot of spammers are using that domain with some valid/invalid
user part). I don't know yet how to solve this issue.

In case the email address is valid, we will not bounce once the mail has
been accepted. This means that everything that might cause the mail to
bounce has to be checked before the DATA command is finished, so that we
know that the mail is deliverable with 99.9% certainty. It also means
that the spam filter, which is applied much later, does not bounce. It will
tag the mail, quarantine it, eat it,... but never bounce.
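
The policy described above can be sketched roughly as follows, in Python.
The concrete reply codes 450 and 550 stand in for the "4xx" and "5xx"
mentioned earlier, and the order of the checks, the function names, and the
score thresholds are all assumptions, not the real configuration.

```python
from enum import Enum


class SpamAction(Enum):
    DELIVER = "deliver"
    TAG = "tag"
    QUARANTINE = "quarantine"
    DISCARD = "discard"     # "eat it"; note there is deliberately no BOUNCE


def smtp_reply(recipient_valid: bool, sender_valid: bool) -> int:
    """Decide the reply while the SMTP session is still open."""
    if not recipient_valid:
        return 550          # assumed 5xx code: invalid envelope-to, permanent
    if not sender_valid:
        return 450          # assumed 4xx code: invalid envelope-from, temporary
    return 250              # accepted; from here on we never bounce


def post_accept_action(spam_score: float,
                       tag_at: float = 5.0,
                       quarantine_at: float = 10.0,
                       discard_at: float = 20.0) -> SpamAction:
    """Late spam filtering (hypothetical thresholds); never produces a bounce."""
    if spam_score >= discard_at:
        return SpamAction.DISCARD
    if spam_score >= quarantine_at:
        return SpamAction.QUARANTINE
    if spam_score >= tag_at:
        return SpamAction.TAG
    return SpamAction.DELIVER
```

The point of the split is that a rejection during the session is reported by
the sending MTA, while anything decided after the 250 stays local.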

> > A nice and cheap filter that also catches quite a lot is the
> > requirement to have a valid FQDN in HELO/EHLO (though it does not have
> > to resolve).
> 
> If it doesn't resolve, how do you know that it's a valid FQDN? By the
> RFC standards, "mail.yahoo.com" is not an *F*QDN because it doesn't
> end with a period. ("mail.yahoo.com." would be an FQDN.) But most SMTP
> delivery agents don't fully qualify their HELO name with a period.
> Conversely, since "blah." is an FQDN (even though it doesn't resolve),
> and "com." is (and even does resolve, albeit only to NS records) by that
> standard you'd need to accept "HELO blah" and "HELO com".

Uh.. i let postfix decide that :-)
I think it checks whether the FQDN matches a certain regexp and whether
the TLD is valid. The reason i do not resolve the FQDN is that i
don't want to block people with home machines that have no fixed IP,
and whose names therefore do not resolve to an IP, from sending mail.
Of course, the envelope-from has to be valid :)
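
A syntax-only HELO check of the kind guessed at above could look like this
Python sketch. It is not Postfix's code: the regexp, the function name, and
the tiny `KNOWN_TLDS` sample are assumptions, and a real check would need
the full TLD list.

```python
import re

# One DNS label: letters, digits, hyphens, no hyphen at either end, max 63 chars.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# At least two labels, with an optional trailing dot.
_FQDN_RE = re.compile(r"(?:" + _LABEL + r"\.)+" + _LABEL + r"\.?")

# Hypothetical sample only; not a complete list of TLDs.
KNOWN_TLDS = {"com", "net", "org", "jp", "de", "fr"}


def helo_looks_valid(name: str) -> bool:
    """Check HELO/EHLO syntax and TLD without doing any DNS lookup."""
    if len(name) > 255 or not _FQDN_RE.fullmatch(name):
        return False
    tld = name.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in KNOWN_TLDS


print(helo_looks_valid("mail.yahoo.com"))   # True
print(helo_looks_valid("blah"))             # False
```

Under these assumptions a bare "blah" or "com." is refused because it has
only one label, while a dynamic-IP home machine with a well-formed name
still passes, since nothing is resolved.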

> I have a limited set of local access lists which are used as much for
> allowing things as denying them, a handful of header and body checks
> that are only there to get rid of the most egregious stuff, and for
> the rest I rely on the following SMTP client RBLs, which have done an
> excellent job for me:
> 
>     sbl-xbl.spamhaus.org
>     bl.spamcop.net
>     dul.dnsbl.sorbs.net
>     web.dnsbl.sorbs.net
>     socks.dnsbl.sorbs.net

As i said in my previous mail, i don't think RBLs are a good solution,
as they also catch legitimate users (especially me ;-).
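
For readers who have not looked at how these lists are queried: a DNSBL
lookup is an ordinary DNS query for the client's IPv4 address with its
octets reversed, placed under the list's zone. The Python sketch below only
builds that name and does no lookup; the function name is an assumption.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 client in a DNSBL zone."""
    addr = ipaddress.IPv4Address(client_ip)   # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone.strip(".")


print(dnsbl_query_name("192.0.2.1", "bl.spamcop.net"))
# 1.2.0.192.bl.spamcop.net
```

An A record in the answer means the address is listed. The list operator,
not the receiving site, decides which addresses end up there, which is how
legitimate senders on listed ranges get caught.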

> That still leaves me with a hundred to two hundred spams per day, all
> but a few per week of which are caught by spamprobe, which is a Bayesian
> filter.

The MTA-level checks we perform on the MPlayer/FFmpeg mail server reject
on average one mail (spam) every 10 s. SpamAssassin catches a spam mail
every 1.5 min. This is about 8000-9000 rejects and about 1000 spam catches
a day. In comparison, we receive about 5000 "legitimate" mails per day.
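
The daily totals follow directly from the two intervals. A quick check in
Python, using only the figures quoted above (the helper name is made up):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(interval_seconds: float) -> float:
    """Events per day if one event happens every interval_seconds."""
    return SECONDS_PER_DAY / interval_seconds


mta_rejects = per_day(10)         # one reject every 10 s
spam_catches = per_day(1.5 * 60)  # one catch every 1.5 min
legitimate = 5000                 # figure quoted above

print(mta_rejects)    # 8640.0
print(spam_catches)   # 960.0

# Share of all incoming mail that is stopped before the spam filter runs.
total = mta_rejects + spam_catches + legitimate
print(round(mta_rejects / total, 2))   # 0.59
```

So the MTA-level checks stop roughly nine out of ten spam mails before the
content filter has to look at anything.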

My private mail server doesn't do such rigorous MTA-level checks and hence
i get about 200-500 spam mails per day on my personal account (compared to
about 200-300 legitimate mails).

			Attila Kinali

-- 
If you want to walk fast, walk alone.
If you want to walk far, walk together.
		-- African proverb
