WebApp Sec mailing list archives

Re: Preventing cross site scripting


From: "Tim Greer" <chatmaster () charter net>
Date: Thu, 19 Jun 2003 19:42:33 -0700




----- Original Message -----
From: "Jeremiah Grossman" <jeremiah () whitehatsec com>
To: "Andrew Beverley" <mail () andybev com>
Cc: <webappsec () securityfocus com>
Sent: Thursday, June 19, 2003 7:23 PM
Subject: Re: Preventing cross site scripting


On Thu, 2003-06-19 at 11:28, Andrew Beverley wrote:
I am currently writing a web application that, as a small part of it,
needs to display an email message. Obviously the message may be in
HTML format, and the simplest way to display it would be to send it
straight to the browser.

I would like to know the best way of filtering out undesirable HTML. I
understand the best approach is to allow only acceptable content, in this
case all the different HTML formatting tags.

However, there are a lot of tags that are acceptable. Another approach
would be to strip out all the bad stuff such as <SCRIPT>, <OBJECT>,
<APPLET>, and <EMBED>, but this is far from ideal because new tags keep
becoming available, and so on.

Are there any functions available (for PHP) that will take an HTML page
as input and strip out all the nasty stuff? Does anyone have suggestions
as to how to do this as easily as possible?

This is a very tough problem to solve,

No, it's not, at all.

and no one to my knowledge has
done it completely effectively.

It's very simple. Pick your language and learn regular expressions.

Any HTML-aware web application faces
this dilemma, especially given web browsers' loose interpretation of
D/HTML/JavaScript.

You have this issue with any parsed page(s)/script(s) as well as
D/HTML. However, it's very simple to filter out what you don't want.
Disable all HTML tags, then put the allowed ones back together sanely and
simply, and you won't have problems, provided you use very basic regular
expression logic.
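A minimal sketch of this "disable everything, then put back what you
allow" approach, in Python rather than PHP. Here html.escape() plays the
role PHP's htmlspecialchars() would; SIMPLE_TAGS, RESTORE, and
escape_then_restore are hypothetical names, and only attribute-free tags
are re-enabled:

```python
import html
import re

# Hypothetical allow list: simple formatting tags that take no attributes.
SIMPLE_TAGS = ("b", "i", "u", "em", "strong", "p", "br")

# Matches an escaped tag such as "&lt;b&gt;" or "&lt;/b&gt;" with nothing else
# inside the brackets. Tags with attributes, or tags not on the list, do not
# match and therefore stay escaped.
RESTORE = re.compile(r"&lt;(/?)(" + "|".join(SIMPLE_TAGS) + r")&gt;", re.IGNORECASE)

def escape_then_restore(text: str) -> str:
    # Step 1: disable every tag by converting <, >, &, and quotes to entities.
    escaped = html.escape(text, quote=True)
    # Step 2: put back only the exact, attribute-free tags on the allow list.
    return RESTORE.sub(lambda m: "<" + m.group(1) + m.group(2).lower() + ">", escaped)

print(escape_then_restore("<b>bold</b> <script>alert(1)</script>"))
# <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the restore step only ever emits fixed strings like "<b>", nothing
the submitter typed can reach the browser as live markup. The sketch does
not balance unclosed tags, so a stray "<b>" can still bold the rest of the
page.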

Let me say first....

Attempting to safely allow HTML into your system is playing with fire,

Not at all. This is not true.

plain and simple.
 ^^^^^^^^^^^^^^

This is an opinion. The fact is, you can easily allow HTML to be
submitted. Blindly allowing HTML would be a problem; just don't blindly
allow it. It's not as dangerous as people make it out to be.

Taking this into account, we can move onto a decent
solution to implement.

See my previous email regarding this; it's simple.

Use a strict HTML tag and attribute allow list:
Start with a small, safe set of allowable HTML: just the tags and
attributes you feel your users need to get the job done, and that won't
allow other client-side technologies (JS/ActiveX/Java/etc.) to leak
through.
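A minimal sketch of such an allow list as a data structure, in Python.
ALLOWED and filter_attrs are hypothetical names, and the tags and
attributes are chosen only for illustration. Anything not listed,
including STYLE and every on* event handler, is rejected by default; the
values of permitted attributes such as href still need their own checks.

```python
# Hypothetical allow list: each permitted tag maps to the attributes it may carry.
# Anything absent (script, object, style, on* handlers, ...) is rejected by default.
ALLOWED = {
    "b": frozenset(),
    "i": frozenset(),
    "p": frozenset(),
    "br": frozenset(),
    "a": frozenset({"href", "title"}),
}

def filter_attrs(tag, attrs):
    """Return the allowed (name, value) pairs for tag, or None if tag is not allowed."""
    allowed = ALLOWED.get(tag.lower())
    if allowed is None:
        return None
    return [(name, value) for name, value in attrs if name.lower() in allowed]

print(filter_attrs("A", [("href", "http://example.com/"), ("onclick", "evil()")]))
# [('href', 'http://example.com/')]
print(filter_attrs("script", []))
# None
```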

Yes, and it's that simple. Just make sure people can't put potentially
dangerous values into the fields you do allow, and that is quite simple to do.

Parse your HTML content and only allow those tags and attributes to pass
unfiltered. Replace any other tags/attributes with HTML entities. Be
wary of all STYLE tags and attributes, as well as all *SRC attributes.
Also be careful about ASCII whitespace characters and their HTML-entity
equivalents; these methods have been used to bypass HTML filters.
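One way to sketch this parse-and-encode step is with Python's standard
html.parser instead of regular expressions. The allow list, the scheme
list, and the names AllowListSanitizer, is_safe_url, and sanitize are
assumptions for illustration; this is not a complete or audited
sanitizer, and it sidesteps CSS entirely by not allowing STYLE at all.
Disallowed tags are re-emitted as inert entities, and href values are
checked only after whitespace and control characters are removed
(html.parser has already decoded entities such as &#x09; in attribute
values):

```python
import html
import re
from html.parser import HTMLParser

# Hypothetical allow list for this sketch: tag -> permitted attributes.
ALLOWED = {"b": set(), "i": set(), "p": set(), "br": set(), "a": {"href", "title"}}
URL_ATTRS = {"href"}                           # attributes whose values are URLs
SAFE_SCHEMES = ("http:", "https:", "mailto:")  # assumption: everything else rejected

def is_safe_url(value):
    # Drop ASCII whitespace/control characters so "jav\tascript:" is seen as
    # "javascript:", then require an explicitly allowed scheme.
    cleaned = re.sub(r"[\x00-\x20\x7f]+", "", value).lower()
    return cleaned.startswith(SAFE_SCHEMES)

class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):  # tag and attr names arrive lowercased
        allowed = ALLOWED.get(tag)
        if allowed is None:
            # Not on the allow list: emit the original tag text as entities.
            self.out.append(html.escape(self.get_starttag_text()))
            return
        kept = []
        for name, value in attrs:
            if name not in allowed or value is None:
                continue
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br>; avoids emitting a spurious end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>" if tag in ALLOWED else html.escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text was entity-decoded by the parser, so re-escape it for output.
        self.out.append(html.escape(data))

def sanitize(fragment):
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)

print(sanitize('<a href="jav&#x09;ascript:alert(1)" onclick="x()">link</a>'))
# <a>link</a>
```

Comments, processing instructions, and doctype declarations are silently
dropped, since HTMLParser's default handlers for them do nothing.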

Yes, but if the input is filtered properly when it's put "back together",
you won't be enabling an HTML tag that doesn't match the criteria for the
regex. You can allow for whitespace simply by checking for [\r\n\s]*,
etc., but if a tag still doesn't match, it's not enabled. You should
check for whitespace to be kind to the submitter, so matches are more
logical and accurate, but not in a dangerous manner, and that is very
simple.
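A sketch of this "whitespace-tolerant but strict" matching, again in
Python with hypothetical names (OPEN_LINK, CLOSE_LINK, restore_links).
The text is escaped first, and a link is re-enabled only when the whole
escaped tag matches one narrow pattern. Extra spaces or line breaks
inside the tag are tolerated, but anything else (an extra attribute, a
javascript: URL, an entity inside the URL) fails to match and simply
stays escaped:

```python
import html
import re

# Hypothetical pattern for one attribute-bearing tag: <a href="http(s)://...">.
# \s already covers \r and \n, so it plays the role of [\r\n\s]* above.
# The URL class deliberately excludes &, quotes, < and >, so entity tricks
# and attempts to break out of the attribute cannot match.
OPEN_LINK = re.compile(
    r"&lt;\s*a\s+href\s*=\s*&quot;(https?://[A-Za-z0-9./?=_%:~+-]+)&quot;\s*&gt;",
    re.IGNORECASE,
)
CLOSE_LINK = re.compile(r"&lt;\s*/\s*a\s*&gt;", re.IGNORECASE)

def restore_links(text):
    escaped = html.escape(text, quote=True)  # disable everything first
    escaped = OPEN_LINK.sub(lambda m: '<a href="' + m.group(1) + '">', escaped)
    return CLOSE_LINK.sub("</a>", escaped)

print(restore_links('< a  href = "http://example.com/" >site</a>'))
# <a href="http://example.com/">site</a>
```

Excluding & from the URL character class means links with query strings
such as ?a=1&b=2 are left escaped as well; that is the conservative
trade-off of this sketch.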

This should get you fairly close to safety or at least make things
harder to bypass.



Hmmm.

--
Regards,
Tim Greer  chatmaster () charter net
Server administration, security, programming, consulting.

