Mailing List Archive



Re: [tlug] Making sure people get the message



Before I get into the discussion of SMTP, Dave, I hate to tell you
this but having a server subdomain of the ".info" or ".biz" domains
is probably going to cost points with spam filters.  I don't think
I've ever received a legit mail from either one, not even by mailing
list.  Even the Chinese numerical domains (163.com, 126.com, and
263.com) are less spammy in that sense.

The sender address "info" also reeks of spam.  Mailboxes like "sales",
"info", and "service" are great for *inbound* mail, of course, but
a sender address should be a mailbox that tells you why you want to
read it, either a personal name or a topic-specific mailing list name.
Even "tokyohockey@example.com" is a better choice from this point
of view.

Regarding encodings, UTF-8 should be OK AFAIK.  Japanese spammers
mostly use Shift JIS in my experience.

Godwin Stewart writes:

 > On Sun, 01 Oct 2006 15:25:14 +0900, Dave M G <martin@example.com>
 > wrote:
 > 
 > > I don't quite understand why it says "Received: from nobody" when the 
 > > "From" header has "info@example.com".
 > 
 > That's because the SMTP envelope sender doesn't have to have anything
 > to do with the "From:" header in the data.
 > 
 > This is "SMTP 101".

C'mon, Godwin, be a little helpful.  He says he doesn't know.  (BTW, I
believe that the user in the Received header is a user, not a mailbox,
which is why it's normally only present for the initiating user and
.forwards etc.)

Dave, there are a large number of technologies and even more standards
involved in getting content from you to me by email.  As is usual in
computer technology, there is a lot of modularization, or more in line
with communications technology, layering.  Without being excessively
pedantic about the technicalities, and mixing standard terms with some
I've made up for my convenience, for the problem of running a mailing
list and dealing with spam, there are three layers: the platform (the
hardware and software that deal with the Internet), the mail transfer
agent ("MTA", eg, Sendmail, exim, Postfix) which talks to other MTAs
via the platform over the Internet, and the mail user agent ("MUA"),
which might be interactive (like Pine, Evolution, or Outlook Express),
or automated (like Mailman or your script).  We'll call the automated
ones "mailing list managers" ("MLM") since that fits your application.

Conceptually, a sending MTA accepts a fully-formatted message on
standard input (ie, if you have a message in dave.msg, you do
"sendmail stephen@example.com <dave.msg" and I'll get the mail), and
then

1. Calls up the other user's MTA on port 25 (the SMTP port).
2. The other MTA initiates the SMTP protocol by announcing itself.
3. Your MTA announces itself with

    HELO davesbox.com

4. The remote responds with a handshake, and in some variants,
   information about the extensions it can handle:

    250 OK

5. Your MTA starts an individual transaction by announcing it has
   mail:

    MAIL FROM:<dave@example.com>

   That is the "envelope sender".  Officially it exists only
   in the transaction between MTAs, although a number of unofficial
   headers may contain it.  The best known is the From pseudo-header
   in an mbox file: the line without a colon at the very beginning of
   the message headers.  Think of this as throwing away the envelope
   but scribbling some address information in the top margin of the
   letter.  The partner responds "250 OK".

6. Your MTA starts listing recipients:

    RCPT TO:<stephen@example.com>

   and the partner responds "250 OK" to each one.  Each of these is
   an "envelope address", ie, an envelope recipient.

7. Then your MTA says "DATA", and sends the entire formatted message
   it received on standard input to the partner *verbatim*, followed
   by a line containing exactly one period (no whitespace except for
   the trailing CRLF).

8. The receiving MTA then does the following:

    a. Adds an implementation-dependent "Received" header which may
       contain the Unix user which invoked the MTA as you observed.
       This is *not* an email address, except by accident (ie,
       usernames usually correspond to mailbox names).

    b1. Sends the message in the same way _with the same envelope
       sender and address as in steps 5 and 6 above_ (that's why this
       is conceptually "on the envelope," because it doesn't change
       ever along the path to destination, but it's not in the
       message).

    OR

    b2. Checks for the presence of a Return-Path header, and if there
       isn't one adds a Return-Path header containing the envelope
       sender's address.  Then it puts the resulting message in a
       local user's mailbox.

All of the above is defined in RFC 2821, as the Simple Mail Transfer
Protocol (SMTP).  There are more accurate examples in the RFC (I've
almost surely left some stuff out or gotten it wrong), but that gives
the basic flavor.
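
To make the dialogue concrete, here is a minimal Python sketch of what
the sending side says in steps 3 to 7.  The function name
smtp_client_lines is made up for illustration, the server's replies
are omitted, and a real MTA does far more (error handling, retries,
extensions).  One wrinkle not mentioned above: a message line that
begins with a period gets an extra period prepended in transit, so it
can't be mistaken for the terminator.

```python
def smtp_client_lines(helo_host, envelope_sender, recipients, message):
    """Return the lines a sending MTA transmits, in order.

    Illustrative only: server replies ("250 OK" etc.) are not modelled.
    """
    lines = [
        f"HELO {helo_host}",                 # step 3
        f"MAIL FROM:<{envelope_sender}>",    # step 5: envelope sender
    ]
    # Step 6: one RCPT TO per envelope recipient.
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")                     # step 7
    for line in message.splitlines():
        # A leading "." is doubled so the line is not read as the terminator.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")                        # end of message
    return lines


# The From: header below is part of the message data; the envelope
# sender passed to the function is a separate value.
message = (
    "From: info@example.com\n"
    "To: stephen@example.com\n"
    "Subject: Hockey schedule\n"
    "\n"
    "Practice is on Saturday.\n"
)

for line in smtp_client_lines("davesbox.com", "dave@example.com",
                              ["stephen@example.com"], message):
    print("C: " + line)
```

Note that nothing in the function looks inside the message for an
address: the envelope is supplied separately, which is the whole
point.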

The important point is that the MTAs control only the Received headers
and may set the Return-Path.  Because of the layering, they are
designed[1] to neither know nor care about anything else.  Conversely,
because they come *after* the MUA in time, the MUA *cannot* know about
or affect those headers.  Thus, the envelope information + these
headers are entirely independent of the other headers and body content.
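
Here is a rough Python sketch of the final-delivery case (step 8,
option b2), just to show how little of the message the MTA touches.
The function name deliver_locally and the simplified Received text
are my own inventions; real MTAs generate a much more detailed
Received header and write to an actual mailbox.

```python
from email import message_from_string


def deliver_locally(raw_message, envelope_sender, received_value):
    """Return the message as it would land in the local mailbox.

    Only trace headers are prepended; the MUA's headers and the body
    are passed through untouched.
    """
    parsed = message_from_string(raw_message)
    new_headers = [f"Received: {received_value}"]
    # Add Return-Path from the envelope only if none is present.
    if parsed["Return-Path"] is None:
        new_headers.insert(0, f"Return-Path: <{envelope_sender}>")
    return "\n".join(new_headers) + "\n" + raw_message


raw = (
    "From: info@example.com\n"
    "To: stephen@example.com\n"
    "Subject: Hockey schedule\n"
    "\n"
    "Practice is on Saturday.\n"
)

delivered = deliver_locally(
    raw,
    "dave@example.com",  # envelope sender from MAIL FROM
    "from davesbox.com by mail.example.com",  # simplified, illustrative
)
print(delivered)
```

The From header written by the MUA comes out exactly as it went in,
whatever the envelope sender was.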

Everything else in the message except for those very limited headers
is put there by MUAs.  The body of the message, including any
attachments, is specified briefly in RFC 2822 and more flexibly in the
many MIME RFCs.  RFC 2822 is mostly concerned with message headers,
and provides for several kinds of address fields.  There is a Sender
field, which is the (single) mailbox of the (usually human) agent
"responsible" for transmitting the message.  Think of this agent as a
"secretary" if different from the author.  Typically it will be the
same as Return-Path.  There is a From field, which contains the
mailbox(es) of the author(s).  There is a Reply-To field, containing
the correspondence address.  (This is explicitly reserved to the
author by the
RFC, which is one reason why RFC sticklers hate mailing lists like
TLUG that change the Reply-To address to point to the list.)  Finally
there are a number of recipient fields, To, Cc, and Bcc, whose
semantic differences are irrelevant here.
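
For illustration, this is roughly how an automated MUA could set
those fields using Python's standard email package.  The function
name build_list_message, the subject, and the body text are made up;
the addresses are the placeholder ones from this thread.

```python
from email.message import EmailMessage


def build_list_message():
    """Build a message whose address fields each play a different role."""
    msg = EmailMessage()
    msg["From"] = "Dave M G <martin@example.com>"  # the author
    msg["Sender"] = "tokyohockey@example.com"      # the transmitting agent
    msg["Reply-To"] = "martin@example.com"         # where replies should go
    msg["To"] = "stephen@example.com"              # a recipient field
    msg["Subject"] = "Practice schedule"
    msg.set_content("Practice is on Saturday.")
    return msg


msg = build_list_message()
print(msg.as_string())
```

None of these fields is consulted by SMTP itself; they are simply
part of the data sent after the DATA command.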

So if you've been counting carefully, you'll find 5 different
"sources" for mail: the Unix user, the envelope sender, the header
Return-Path (but the only officially-defined value other than the
envelope sender is the empty path <>), the header Sender, and the
header From (and maybe the header Reply-To).  *If* the MUA conforms to
the standards, they will have a certain relation to each other, but
pragmatically they are independent of each other.

The PHP mail module probably sets some headers (such as Date and
Message-Id) automatically, provides defaults for others if you don't
set them manually (From will usually default to the process's owner),
and finally allows you to add arbitrary headers it doesn't know about.
But in theory they are *all* under your control (eg, using the
"sendmail <msgfile" method).

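As a sketch of taking full control, here is the same idea in Python
rather than PHP (I'm not going to guess at the PHP module's details).
The function names, the /usr/sbin/sendmail path, and the example.com
domain for the Message-ID are assumptions; "-f" is the usual option
for setting the envelope sender in sendmail-compatible commands, but
check your MTA's documentation.  The second function is defined but
not called here, since it needs a working MTA.

```python
import subprocess
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_message(from_addr, to_addr, subject, body):
    """Build a message with Date and Message-ID set explicitly."""
    msg = EmailMessage()
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_sendmail(msg, envelope_sender, recipient,
                      sendmail_path="/usr/sbin/sendmail"):
    """Pipe the formatted message to a sendmail-compatible command.

    The envelope sender and recipient are passed as arguments; they
    are not read from the headers.
    """
    subprocess.run(
        [sendmail_path, "-f", envelope_sender, recipient],
        input=msg.as_bytes(),
        check=True,
    )


msg = build_message("info@example.com", "stephen@example.com",
                    "Hockey schedule", "Practice is on Saturday.")
print(msg.as_string())
```
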
Footnotes: 
[1]  In fact, historically many MTAs violate this principle.  They'll
translate body content from one encoding to another, etc.  But such
manipulations are normally very limited.


