nanog mailing list archives

Re: Internet email performance study


From: "aljuhani" <info () riyadmail com>
Date: Thu, 28 Apr 2005 23:21:07 +0300


----- Original Message -----
From: "Robert Beverly" <rbeverly () rbeverly net>
To: <nanog () merit edu>
Cc: <afergan () mit edu>
Sent: Thursday, April 28, 2005 22:21
Subject: Internet email performance study



Hi,

(we previously posted this on the e2e mail list; apologies if you are
reading it for the second time)

We're looking for operational-types lurking on the list with experience
running large mail servers.  In particular, we have collected a large
amount of data as part of an Internet email performance study that we
cannot entirely explain.  If you can help us or are simply curious about
our findings, we'd love to hear from you.

WHAT WE DID: Briefly, we used SMTP bounce-backs as the basis of an email
active measurement survey.  Using random addresses as unique identifiers,
we measure latency, loss, paths, etc. to a large set of Internet MTAs.
Approximately 1/3 of all servers we surveyed respond with bounce-backs.
We've found some interesting results, for example latencies of days (30
days in one instance).

WHAT WE DON'T UNDERSTAND:  Most servers behave as we expect, either always
replying with bounce-backs or never replying.  However, some exhibit odd
and seemingly non-deterministic behavior.  For example, a server will
respond to all emails for weeks, and then reply to only a fraction (e.g.,
25-75%) of the emails in a seemingly random pattern for some period of
time (e.g., 4 hours).  Further, we often see these patterns correlated
within a domain (e.g., a subset of the MTAs will enter and exit this loss
mode at the same time).  We are fairly certain that the loss is an
artifact of the MTA behavior or local administration.  While we can guess
reasons this might occur, we have yet to find an administrator who can
explain this behavior with an architecture used in practice.

Well, there could be many reasons for that, depending on how you probe the
SMTP servers. Some sysadmins block IP addresses that look like a spammer
probing for valid addresses; spammers always try to find a catch-all mailbox
to flood with messages addressed to anything () thatdomain com .

Another possibility is that the domains you are monitoring are on dynamic IP
addresses that change all the time, and the gaps when they become
non-responsive could be due to a delay in updating DNS with the new IP
address. They could also be non-dedicated mail servers, meaning the server
is also used for web and DNS, and when overloaded it tries to shed some
load; usually the first service to be disabled is SMTP.

Or the domain has a lower-priority mail server that happens to be down for
maintenance while your DNS server is caching the data (IP address) of that
mail server. That should not happen, since the sender has to retry the
other MX record, but it remains a possibility.

I have not yet looked at the details at your URL, but there are a number of
things to consider when doing such a survey.

1. Where is your monitoring server located in relation to the servers /
domains being monitored? You need to establish a datum for how far away
each server or domain is, using PING to see how long a packet takes on the
round trip, just to rule out networking / routing issues that may interfere
with the results you need for the responses of the MTAs.

2. Study the domain using Dig to find its MX records and DNS servers, and
whether there is a backup DNS somewhere near your network.

3. Of course, as indicated above, you need to find out whether the IP of
that domain is static or dynamic.

4. Also, you need to monitor the load on your own server and DNS responses.
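
As a rough illustration of point 2, here is a minimal Python sketch that
collects a domain's MX records and orders them by preference, which is the
order a sending MTA would normally try them. It assumes the `dig` tool is
installed and uses its `+short` output; the function names `parse_mx` and
`lookup_mx` are made up for this example.

```python
import subprocess


def parse_mx(dig_output):
    """Parse `dig +short MX <domain>` output into sorted (preference, host) pairs."""
    records = []
    for line in dig_output.splitlines():
        parts = line.split()
        # Each MX line looks like "10 mail.example.com."; skip anything else.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        records.append((int(parts[0]), parts[1].rstrip(".").lower()))
    # Lowest preference value first: that is the exchanger tried first.
    return sorted(records)


def lookup_mx(domain):
    """Run dig for the domain (assumes dig is on PATH) and parse the result."""
    result = subprocess.run(
        ["dig", "+short", "MX", domain],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_mx(result.stdout)
```

Recording this list each time a probe is sent would make it possible to
tell afterwards whether a lost bounce coincided with a change in the MX set.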

What I'd suggest is to use MRTG to monitor the round-trip time using PING to
the servers being monitored, so you have real live data that helps in
establishing your final findings.
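
If you want a baseline directly from a script rather than from MRTG graphs,
something along these lines might do. Note that this sketch does not send
ICMP PING (which needs raw sockets and usually root); as a stand-in it
times a TCP connect to the SMTP port, which is an assumption on my part and
measures roughly one round trip plus connection setup on the server. The
names `connect_rtt` and `baseline_rtt` are hypothetical.

```python
import socket
import statistics
import time


def connect_rtt(host, port=25, timeout=5.0):
    """Return the seconds taken to complete a TCP connect to host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


def baseline_rtt(host, port=25, samples=5, timeout=5.0):
    """Median connect time over several attempts, or None if all failed."""
    rtts = []
    for _ in range(samples):
        try:
            rtts.append(connect_rtt(host, port, timeout))
        except OSError:
            # Refused, timed out or unreachable: count it as a failed sample.
            pass
    return statistics.median(rtts) if rtts else None
```

The median is used instead of the mean so that a single slow attempt does
not distort the baseline. A result of None for a server that normally
answers is itself worth logging next to the bounce data.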

Also, do not forget that some MTA users have their SMTP set up with a filter
to reject SMTP traffic that does not behave normally at the SMTP greeting.
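
One reading of this (an assumption, since filters differ) is that the
server drops clients that start talking before the 220 greeting has been
fully sent. A probe can avoid that by reading the complete greeting,
including multi-line "220-" continuations, before sending anything. A
minimal Python sketch, with the hypothetical helper name `read_greeting`:

```python
import socket


def read_greeting(sock, timeout=30.0):
    """Read a complete SMTP greeting and return (code, lines)."""
    sock.settimeout(timeout)
    buf = b""
    lines = []
    while True:
        # Accumulate data until at least one full CRLF-terminated line is in.
        while b"\r\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("closed before greeting completed")
            buf += chunk
        line, buf = buf.split(b"\r\n", 1)
        text = line.decode("ascii", errors="replace")
        lines.append(text)
        # "220-..." means more lines follow; "220 ..." is the final line.
        if len(text) < 4 or text[3] != "-":
            return int(text[:3]), lines
```

Only after this returns code 220 should the probe send its EHLO or HELO.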

If you need any further information or some logs, please send me an email
at info () riyadhub com

More details on the project including our exact methodology, plausible
explanations for the loss and a FAQ are available on our web site:
   http://ana.lcs.mit.edu/emailtester

Thanks!

Rob Beverly / Mike Afergan


aljuhani

