Article 422 of meer.list.spam-z:
Path: matra.meer.net!gateway
From: timster@mo.net (Timster)
Newsgroups: meer.list.spam-z
Subject: Re: Rejecting mail from unresolvable hosts (LONG reply, hopefully not boring)
Date: 20 Jan 1998 01:35:59 -0000
Organization: Spam is Theft.   See http://www.cauce.org/
Lines: 148
Distribution: meer
Message-ID: <34C3FD5B.CB2B1BC9@mo.net>
References: <199801191956.LAA15647@kithrup.com>
NNTP-Posting-Host: matra.meer.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: matra.meer.net 885260162 25937 (None) 140.174.164.31
X-Complaints-To: usenet@matra.meer.net
Xref: matra.meer.net meer.list.spam-z:422

Sean Eric Fagan wrote:
> 
> I just added some lines in sendmail that (supposedly, anyway -- I really need
> to go over that sendmail book again ;)) reject (error 451, so transient
> rejection) hosts without a valid PTR record.
> 
> Does anyone else do this?  I haven't had one blocked since I did this (so
> amn't even sure it will work ;)).  I had been checking for a couple of weeks,
> and didn't notice any legitimate email coming from such sites, so I think the
> collateral damage will be minimal... but I'm still somewhat uneasy about it.

Yeah, I used to do this, and I stopped because I honestly found it
ineffective.  It's a good idea, and it will stop some spam, but in my
experience so far, nothing works as well as filtering (if you can't use
the RBL for political reasons where you work) and aggressive (but
polite) complaining.

As much insight as I have gotten from the wonderful people on this list,
maybe this is my opportunity to share some of my own. I hope it helps. 
This is pretty long, so bail out here if you're not interested.

The sendmail config changes I made at the time I was doing the blocking
you describe used sendmail's built-in check_* rulesets (check_mail,
check_compat and friends), which have been in place since sendmail 8.8.4
(I think).  I used the check_mail ruleset, as described on Claus
Assmann's outstanding sendmail page
(http://www.informatik.uni-kiel.de/%7Eca/email/english.html).  I
recommend this page highly to anybody who deals with sendmail, whether
spam is an issue or not.

The check_compat rule built-ins are the only location in the sendmail
code in which the envelope sender and envelope recipient are available
for viewing at the same time; sendmail normally deals with one or the
other, depending on which ruleset is being executed in the flow.  The
check_compat code was created explicitly to provide filtering
capability, though not necessarily for anti-spam activity.

The check_mail rule was ineffective for me because it did a reverse
lookup on the sender's domain name in the SMTP envelope information
(i.e., the MAIL FROM: SMTP info).  Spammers now generally forge this
envelope domain information with a legitimate domain but a bogus
account, such as friend@aol.com, 67542345@hotmail.com, specifically to
foil this type of filtering.

Some spammers are too ignorant or plain stupid to know this, and will
forge using non-existent domain names in the envelope, rather than in
the headers themselves (From:, Sender:, Reply-To:, etc.), which are
useless except as a convenience to user agents.  As a result, they get
bounced using this method.  But you'll also bounce some legitimate mail
from incompetent admins who let internal hostnames slip out in mail
headers, which cannot be looked up via DNS because they are using a
firewall.  Honestly, any site that sends out mail to a business that
cannot be replied to deserves to be bounced, and my managers backed me
on this position.
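
To make the weakness concrete, here is a rough Python sketch of the kind
of test described above.  It is not sendmail ruleset code and not what
check_mail literally does; the function names and the reply text are
made up for illustration, and a real check would also consider MX
records rather than only address lookups.

```python
import socket


def envelope_domain(mail_from):
    """Return the domain part of an SMTP envelope sender, or None."""
    addr = mail_from.strip().strip("<>")
    if "@" not in addr:
        return None
    return addr.rsplit("@", 1)[1].lower() or None


def domain_resolves(domain, lookup=socket.gethostbyname):
    """True if the lookup function finds an address for the domain."""
    try:
        lookup(domain)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def check_sender(mail_from, lookup=socket.gethostbyname):
    """Return an SMTP-style reply line for the envelope sender."""
    if mail_from.strip() == "<>":
        return "250 OK"  # null sender, used for bounce messages
    domain = envelope_domain(mail_from)
    if domain is None or not domain_resolves(domain, lookup):
        return "451 Sender domain must resolve"  # transient rejection
    return "250 OK"
```

Note that check_sender("<friend@aol.com>") passes as long as aol.com
resolves, which is exactly why forging a real domain with a bogus
account defeats this test.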

But, for political reasons at the time I was attempting this, I was not
permitted to modify the mail configuration on the corporate firewall
where I work ("...if you touch the firewall, it might break!  Maybe the
vendor won't support us!"), and was forced to implement these rules on an
internal mailhub.  As a result, the bounced messages wound up getting
queued on my internal mailhub.  Since sendmail attempts retries on soft
errors (any 4XX error), which in my case were configured for retries
every 4 hours for 5 days, I wound up getting thirty retries for every
bad address I bounced!  Things have changed now that people have calmed
down and I have full firewall access, and I put in a dedicated e-mail
firewall/hub this past weekend.  Long before that, though, I switched to
header filtering rather than reverse lookups on the internal mailhub.
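
The thirty-retries figure follows directly from the queue settings
quoted above (retry every 4 hours, give up after 5 days).  A quick
Python check of the arithmetic, with variable names invented for the
example:

```python
from datetime import timedelta

retry_interval = timedelta(hours=4)  # how often the queue is retried
queue_lifetime = timedelta(days=5)   # how long a message stays queued

# Dividing one timedelta by another with // gives a whole number.
retries_per_message = queue_lifetime // retry_interval
print(retries_per_message)  # 30
```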

Now for my personal opinions, which can be taken with a large grain of
salt by the esteemed members of this list:

I'm convinced that a judicious mixture of filtering and aggressive
complaining is the key to keeping the spam problem to a manageable level
for a given site.  I filter to protect my site, and complain to help
both my site and others.  I feel that filtering without complaining is
craven, and complaining without filtering is impractical and
idealistic.  I can't responsibly *not* filter spam at my work site;  I
know how to do it, and it helps protect my users.  I love nothing more
than to repay spammers with their own misery, and I have a large number
of kills.  But there's only so much time I can justify to my employers
chasing down spammers at work.  I get support from management for how
much spam I stop from reaching my users, not for how many spammer scalps
I get.

At home, I don't filter, or munge.  And I relentlessly chase down
anybody who spams me.  I can handle people who spam me personally, but
don't want to conscript people where I work into the same activity. 
That's just me.

I'd love to use procmail site-wide where I work, since I believe it's
the most powerful filtering tool in the hands of a capable admin.  I'd
also gladly use the RBL, but this has been turned down by my management
as too risky (I don't share their view, and fully support the RBL).  I
can't use procmail because it is really a local delivery agent with
filtering/filing capability, and a large proportion of the users at my
work site use a legacy proprietary mail system that requires an SMTP
gateway.  I needed something that would filter at the mail transfer
agent level, prior to passing off to the proprietary delivery agent for
final delivery.  I also needed something that wouldn't bounce spam, but
would quietly drop or save it.  I believe that bouncing spam (not
refusing connections, though) is a huge waste of time.

As a result, last April I started using a wonderful V8.8 sendmail patch
developed by Tim Berger called "spamcan" (See
http://consult.ml.org/~timb/spamcan/ for details).  It employs a
site-configurable blacklist file of regular expressions to scan header
information only (including Received: lines), but not message
bodies.  Messages fitting the criteria are stored in a central location,
with a header added (X-Spamcan-Reason) that describes the reason why the
message was rejected.  We took an early release of the patch and
modified it to allow a whitelist, which allowed us to do *very*
aggressive filtering against the blacklist with little risk.  Prior to
spamcan, I saw 30-50 spams a week.  I now see 3 to 5 in a bad week; most
users at my site see none.
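
For readers who have not seen this style of filter, here is a minimal
Python sketch of the idea: match regular expressions against one header
line at a time, let a whitelist override the blacklist, and tag what is
caught with the reason.  This is not spamcan's code or its file format.
The pattern lists, the function names, and the rule that the whitelist
is checked first are all assumptions made for the example; only the
X-Spamcan-Reason header name comes from the description above.

```python
import re
from email import message_from_string

# Hypothetical rules; a real site would load these from a file.
BLACKLIST = [r"^Received:.*cyberpromo\.com", r"^From:.*[\s<]\d+@"]
WHITELIST = [r"^From:.*@example\.org"]


def header_lines(msg):
    """Yield each header as one unfolded 'Name: value' line."""
    for name, value in msg.items():
        yield f"{name}: {' '.join(str(value).split())}"


def first_match(patterns, lines):
    """Return the first pattern that matches any single line, else None."""
    for pattern in patterns:
        rx = re.compile(pattern, re.IGNORECASE)
        for line in lines:
            if rx.search(line):
                return pattern
    return None


def classify(raw_message):
    """Return (message, reason); reason is None for mail that passes."""
    msg = message_from_string(raw_message)
    lines = list(header_lines(msg))  # headers only, never the body
    if first_match(WHITELIST, lines):
        return msg, None
    reason = first_match(BLACKLIST, lines)
    if reason is not None:
        msg["X-Spamcan-Reason"] = reason  # record why it was caught
    return msg, reason
```

A caller would append any message with a non-None reason to a capture
file instead of delivering it, so nothing is bounced and nothing is
lost.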

If you can't run procmail, or use the RBL, I highly recommend spamcan; 
nothing can filter all spam, but with spamcan, I can keep it to
manageable levels, never risk losing a legitimate message due to a false
positive, and I can use a perl script to quickly get a report of every
message captured.  Even if you already use the RBL, spamcan can act in
concert with it, helping find new spam sites.  Unlike procmail, spamcan
can only filter on single lines, not on patterns that span multiple
lines.

I have a number of users at my site who forward any spam that makes it
past the filter to me, so I can get the broadest possible sample to
enhance my filter file.  I can filter against any IP address, domain
name, phrase, or stupid spammer trick (e.g. -600 (EST) and -700 (EDT) are
favorites, along with all-numeric userids and domain names, and giveaway
patterns in subject lines).  I chase down what gets through my filter,
and pull random messages out of the capture file regularly to chase down
my "spammer punk of the day".  The complete saved message is all there,
headers and all, making it simple to make a case to an ISP why Johnny
Spammer was a bad boy.  It was invaluable in the old t-1net and
cyberpromo days; I could block against the IP address to can all
Cyberpromo-sent mail, and block against key header phrases to block
their customers who relayed from sites other than Cyberpromo.  It was a
lock back then.
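
As an illustration of the "spammer trick" patterns mentioned above,
here are a few single-line regular expressions in Python.  These are
guesses at what such rules might look like, not entries from the actual
blacklist.  The timezone rule works because EST is really -0500 and EDT
is really -0400, so an offset of -0600 or -0700 next to those names is a
sign of a forged Date: header.

```python
import re

# Illustrative patterns only; names and expressions are assumptions.
TRICKS = {
    "bogus timezone": re.compile(r"-0?600 \(EST\)|-0?700 \(EDT\)"),
    "all-numeric userid": re.compile(r"(?:^|[\s<])\d+@"),
    "all-numeric domain": re.compile(r"@\d+\.(?:com|net|org)\b",
                                     re.IGNORECASE),
}


def tricks_in(header_line):
    """Return the names of every trick pattern found in one header line."""
    return [name for name, rx in TRICKS.items() if rx.search(header_line)]
```

For example, tricks_in("From: 67542345@hotmail.com") reports the
all-numeric userid, while an ordinary address such as
"From: Timster <timster@mo.net>" matches nothing.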

OK, so there's my contribution to the group.  If anybody uses spamcan,
I'm happy to share my own spamcan blacklist of regular expressions
(might be useful to procmail users, too).


-- 
Timster

Theft Doesn't Scale.  See http://www.cauce.org/



