Bugtraq mailing list archives

Re: HTML email "bug", of sorts.


From: PSE-L () mail professional org (Sean Straw / PSE)
Date: Mon, 20 Aug 2001 21:41:24 -0700

At 15:33 2001-08-20 -0600, Bear Giles wrote:

> 1) run them through a simple filter for image tags.  With regex,
> the pattern could be as simple as "<img ([^>]+)>", case insensitive.
> You might need to include some backslash quotes.

.. which immediately screws up _CODE_ embedded in messages: "Here, joe, the solution to the niggling problem is to replace the code in somefunction with <img src..."

KLUNK. This method would have broken valid code - code which may be expected to be copied and pasted as-is.
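Here is a minimal Python sketch of the proposed regex applied literally. It assumes the filter simply deletes every match. The message text is a hypothetical piece of plain-text advice that happens to quote a tag.

```python
import re

# The pattern proposed in the quoted message, compiled case-insensitively.
IMG_TAG = re.compile(r"<img ([^>]+)>", re.IGNORECASE)

def strip_img_tags(body: str) -> str:
    """Naive filter: delete every <img ...> match, wherever it appears."""
    return IMG_TAG.sub("", body)

# Hypothetical message: the tag is part of advice, not rendered markup.
message = ('Here, joe, replace the code in somefunction with '
           '<img src="logo.gif" width="1" height="1"> and it works.')

print(strip_img_tags(message))
```

The tag vanishes from text the reader was supposed to paste verbatim. Nothing in the pattern can tell markup from prose.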

> For everything that matches, look for any height and width attributes
> for the image.  If it's 1, you have a web bug.  Even if it's 2-8 or so,
> it's probably still a web bug.

And for code embedded in valid pages, it may not be a web bug at all. And what about images without explicit height and width attributes? Many clients either show no preview or at least draw an outline (even around single-pixel images), so the size check wouldn't matter much in email. In fact, the 'web bug' could just as easily be a *REGULAR GRAPHIC* (such as a horizontal rule). Since you're viewing HTML email, by the time you realize an image is being loaded - whether it is visible or not - the request has already been made.
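Here is a rough Python sketch of the quoted size heuristic. It assumes sizes come only from literal width/height attributes, and it uses a made-up tracker URL (tracker.example). Both tags fetch the same tracking URL, but only the one that declares a 1x1 size gets flagged.

```python
import re

# The pattern proposed in the quoted message, case-insensitive.
IMG_TAG = re.compile(r"<img ([^>]+)>", re.IGNORECASE)
# Pulls literal width/height values such as width="1", HEIGHT=2 or width='8'.
SIZE_ATTR = re.compile(r"""\b(width|height)\s*=\s*["']?(\d+)""", re.IGNORECASE)

def looks_like_web_bug(tag_attrs: str, max_px: int = 8) -> bool:
    """Flag a tag only if it declares both dimensions and both are tiny."""
    sizes = {name.lower(): int(value) for name, value in SIZE_ATTR.findall(tag_attrs)}
    if "width" not in sizes or "height" not in sizes:
        return False  # no explicit size: the heuristic has nothing to go on
    return sizes["width"] <= max_px and sizes["height"] <= max_px

# Hypothetical message: a 1x1 bug and an unsized image,
# both pointing at the same made-up tracking URL.
body = ('<img src="http://tracker.example/bug.gif?id=42" width="1" height="1">'
        '<img src="http://tracker.example/bug.gif?id=42">')

for attrs in IMG_TAG.findall(body):
    print(looks_like_web_bug(attrs), attrs)
```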

> Either comment it out or delete it.  The latter may be preferable
> if you don't want to break scripts.

Now you're stuck needing to match brackets, which very likely will not work properly the instant you receive a quoted message:

> the tag <img src="some tag"
> height="1" width="1">

Where does the IMG SRC closing bracket appear when you're using a simple regexp? What if the second line doesn't appear?
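Here the wrapped, quoted tag from above is fed to the same proposed pattern, as a Python sketch. It shows where the match actually ends: at the reply's quote marker, not at the tag's own closing bracket.

```python
import re

# The pattern proposed in the quoted message, case-insensitive.
IMG_TAG = re.compile(r"<img ([^>]+)>", re.IGNORECASE)

# A tag that was wrapped and then quoted by a reply, as in the example above.
quoted = '> the tag <img src="some tag"\n> height="1" width="1">'

match = IMG_TAG.search(quoted)
print(repr(match.group(0)))        # the span the filter thinks is the tag
print(repr(IMG_TAG.sub("", quoted)))  # what is left after "deleting" it
```

The size attributes fall outside the match, so a width/height check never sees them. Deleting the match also leaves a mangled fragment in the reply.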

Arguably, if the message body is HTML, the MIME type should say so, there should be an opening HTML tag (though there might not be, and email HTML renderers are pretty lax about this), and any < and > characters that aren't part of the page's HTML markup should be properly escaped. Then again, what stops the spammer from obfuscating their code the same way? Try embedding ORDINALS (numeric character references such as &#64;) in your page: a good HTML renderer will display it fine, but most regexps will fail to find a match. I use ordinals to "mailfuscate" mailto URLs and even non-URL plaintext email addresses on all of my web pages, and it significantly reduces the spam that arrives from web-spidering spambots.
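Here is a small Python sketch of the ordinal trick applied to a web bug's size attributes. The ONE_PIXEL pattern and the URL are hypothetical. html.unescape is the standard-library decoder for character references; &#49; is the character '1'.

```python
import html
import re

# Hypothetical naive check for a 1x1 image, matching literal attribute values.
ONE_PIXEL = re.compile(r'width="1"[^>]*height="1"', re.IGNORECASE)

# The same attributes written with numeric character references ("ordinals").
raw = '<img src="http://tracker.example/x.gif" width="&#49;" height="&#49;">'

print(bool(ONE_PIXEL.search(raw)))                 # regex on the raw source
print(bool(ONE_PIXEL.search(html.unescape(raw))))  # after decoding references
```

This sketch decodes the whole source first for brevity, but a real filter can't do that safely: &#60; would become a literal <. The references have to be decoded per attribute after parsing, which brings you right back to needing a real parser.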

Besides BGSOUND (which fetches a remote sound file), page backgrounds and even TABLE backgrounds can reference a remote image, in which case you won't even see it as an IMG SRC tag. Suddenly, your filter needs to fully parse HTML in order to have a prayer of stripping these tags.

Which makes blocking (via RBL, etc.) and effective spam filtering a pretty darn good solution.
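To make the full-parse point concrete, here is a sketch built on Python's standard html.parser. The parser hands back tag and attribute names lowercased, with character references in attribute values already decoded. The set of resource-loading attributes (FETCHING_ATTRS) is an assumption for illustration, and the tracker URLs are made up.

```python
from html.parser import HTMLParser

# Attributes that make a mail client fetch a resource. This set is an
# illustrative assumption, not an exhaustive inventory.
FETCHING_ATTRS = {"src", "background"}

class RemoteRefCollector(HTMLParser):
    """Collect (tag, attribute, url) for every remote resource reference."""

    def __init__(self) -> None:
        super().__init__()
        self.refs: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value); value may be None for bare attributes.
        for name, value in attrs:
            if name in FETCHING_ATTRS and value and value.lower().startswith(("http:", "https:")):
                self.refs.append((tag, name, value))

body = ('<body background="http://tracker.example/bg.gif?id=42">'
        '<bgsound src="http://tracker.example/s.wav?id=42">'
        '<table background="http://tracker.example/t.gif?id=42"><tr><td>Hi</td></tr></table>'
        '</body>')

collector = RemoteRefCollector()
collector.feed(body)
collector.close()
for ref in collector.refs:
    print(ref)
```

None of the three references is an IMG tag. Inline CSS (for example a style attribute containing background-image: url(...)) would still slip past this sketch, since it only checks attribute names.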


Someone mentioned having a port-80 filter on your firewall -- what of dot trackers which reference a specific port number?

        <img src="http://www.somesite.com:110/dot_tracker.file?uniqueid">

Anyone running a firewall would probably block certain services, but all the spammer has to do is run their tracking system on the port of a standard service that a mail client is expected to reach, such as POP3 (110), and that firewalling isn't going to do you much good. The exception is a firewall that only allows outbound POP3 (110) to one specific server, but few people will set things up that way. Joe user is unlikely to configure their machine like that. Joe poweruser probably won't, because they have multiple accounts. Joe corporateadmin won't either, because too many users check their various mail accounts from the office, and limiting them in this fashion would be too grievous.
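Here is a Python sketch of why a port-80-only rule misses the tracker above. BLOCKED_PORTS stands in for a hypothetical egress policy. urllib.parse.urlsplit finds the port the client would actually connect to.

```python
from urllib.parse import urlsplit

# Hypothetical egress policy: block plain web traffic from the mail client,
# while leaving mail ports such as POP3 (110) open.
BLOCKED_PORTS = {80}

def request_port(url: str) -> int:
    """Port the client will actually connect to for an http/https URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80

for url in ("http://www.somesite.com/dot_tracker.file?uniqueid",
            "http://www.somesite.com:110/dot_tracker.file?uniqueid"):
    port = request_port(url)
    print(port, "blocked" if port in BLOCKED_PORTS else "allowed")
```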


Sorry if I've pointed out another exploit that the spammers could use to circumvent such firewall rules.

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395

