WebApp Sec mailing list archives

Re: Preventing cross site scripting


From: "Tim Greer" <chatmaster () charter net>
Date: Fri, 20 Jun 2003 14:49:58 -0700




----- Original Message -----
From: "Laurian Gridinoc" <laur () grapefruitdesign com>
Subject: Re: Preventing cross site scripting


...

Again, interesting idea, but I don't see the advantage for me
personally.

I replied to your message but in the context of the thread starter
message - filtering html; and doing it by treating html as a language
rather than just text.

But you can't. You have to look at it as text and determine which characters
are dangerous. HTML is only a markup language; there are no dictionary-style
matches to check against. You would also need a very large index if you
attempted to determine what was valid. That is okay, and reasonable if done
properly; that's not the problem. The problem is XSS, and how someone can
insert characters or values into otherwise valid HTML tags to cause it.
The only way to determine whether a tag is valid and safe, barring a lot of
static assumptions and basically a huge whitelist, is to simply strip
out or refuse to render any HTML tag containing a character that could be
used to insert something that creates an XSS attack.
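
A minimal Python sketch of the "refuse to render any tag containing a risky character" idea. The character set, the function name neutralize_risky_tags, and the choice to escape the tag rather than delete it are all assumptions made for illustration; the post does not list the characters. It only shows the character check: tags such as <script> that are dangerous without any special character would need a separate tag-name check.

```python
import html
import re

# Assumed, deliberately strict set of "risky" characters (not from the post).
RISKY_CHARS = set("\"'`()&:;=\\")

# Anything that looks like a tag: "<", no angle brackets inside, then ">".
TAG_RE = re.compile(r"<[^<>]*>")


def neutralize_risky_tags(text: str) -> str:
    """Escape every tag whose body contains a risky character."""

    def check(match: re.Match) -> str:
        tag = match.group(0)
        # tag[1:-1] is the text between "<" and ">".
        if RISKY_CHARS & set(tag[1:-1]):
            # Refuse to render: the tag is shown as literal text instead.
            return html.escape(tag)
        return tag

    return TAG_RE.sub(check, text)


print(neutralize_risky_tags("<b>bold</b> <img src=x onerror=alert(1)>"))
```

With this strict set, any tag carrying an attribute is refused, which matches the "just simply deny it" end of the trade-off; allowing the characters "only in what must be valid places" would need per-tag rules on top.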

In reality, only so many HTML tags would allow someone to do this. For the
ones that do, any attribute and value can appear in any combination and
still be valid, so some need very specific checks and others are better
simply denied, since they would be too open to faults. Anyway, like you
said, people send emails in HTML (I personally would either not render any
HTML in email or render only safe tags, and screw the people who want to send
HTML-ized email), so it can get rather involved, unless you simply remove
those 4 or 6 vital characters from within a specific tag that could cause
the problem.

And why would someone need those characters in a tag anyway? You can check
all of this, allowing the special characters only where they must validly
appear, even in a string with multiple single or double quotes. It's just as
effective and much simpler this way. Text is what creates the markup language,
after all, so you can't treat it only as a language and be safe. You are
going to have to do a lot more work, and modify it for each newly
implemented tag in D?HTML, as well as for anything that could be an *XML,
PHP, etc. type of tag.

Whatever works, works though. Also, regexes don't have to be written on one
line. In Perl, for example, simply use the /x modifier and you can break it up
to be very readable.
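
For illustration, here is the same idea in Python, whose re.VERBOSE flag plays a role similar to Perl's /x: whitespace and comments inside the pattern are ignored. The pattern itself (a simple tag matcher named TAG_PATTERN) is an assumed example, not something from this thread.

```python
import re

# re.VERBOSE lets the pattern span several lines with inline comments.
TAG_PATTERN = re.compile(
    r"""
    <                                # opening angle bracket
    (?P<close>/?)                    # optional slash of a closing tag
    (?P<name>[A-Za-z][A-Za-z0-9]*)   # tag name
    (?P<attrs>[^<>]*)                # everything up to the closing bracket
    >                                # closing angle bracket
    """,
    re.VERBOSE,
)

match = TAG_PATTERN.search('<a href="x">')
print(match.group("name"), match.group("attrs"))
```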

I wasn't aware of it.

Nonetheless, if you develop anything along the lines you
speak, please let me know, I'd like to check it out and what you're
doing.

Yup, I'm already using stuff like I posted in production, whitelist
style, to validate/clean HTML input. I have a WYSIWYG editor (MSIE
IFRAME in edit mode) which outputs extremely ugly/bad HTML (combine it
with a copy/paste from MS Word and you get something heavily loaded
with custom MS style definitions); all of this I have to clean according to
a whitelist.
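
A rough Python sketch of whitelist-style cleaning along these lines, using the standard library's html.parser. This is not the code referred to above, which is not shown in this message; the allowed tags and attributes, the URL scheme check, and the names WhitelistCleaner and clean_html are all hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on it.
ALLOWED = {"p": set(), "b": set(), "i": set(), "a": {"href"}}
# Assumed rule: links may only use these URL schemes.
SAFE_SCHEMES = ("http://", "https://")


class WhitelistCleaner(HTMLParser):
    """Rebuilds the markup, keeping only whitelisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # e.g. style/class noise pasted from a word processor
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped so it cannot introduce new markup.
        self.out.append(escape(data, quote=False))


def clean_html(source: str) -> str:
    cleaner = WhitelistCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html('<p style="mso-x: y" class="MsoNormal">Hi <b>there</b></p>'))
```

Because the output is rebuilt from parsed pieces rather than edited in place, anything not explicitly allowed simply never reaches the result. Note that the contents of dropped elements (including script text) are kept as escaped plain text in this sketch.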

When I have more examples I'll post them to the list.


Cheers,
--
Laurian Gridinoc
Chief Developer
GRAPEFRUIT DESIGN

tel/fax: +40.232.233068
tel/fax: +1.646.349.2916
mobile: +40.745.304379
e-mail: laur () gd ro
www.grapefruitdesign.com
www.gd.ro


