WebApp Sec mailing list archives

Re: Preventing cross site scripting


From: "Tim Greer" <chatmaster () charter net>
Date: Fri, 20 Jun 2003 19:54:52 -0700

Thanks for clarifying and for being patient with my inquiries about your
ideas and what you mean. I understand better what you meant now. Yes, indeed,
there's no need (or point) to deal with anything *but* the tags, as opposed
to all of the text. This is specifically what I meant about using regexes to
solve this and render based on the outcome of the checks, though there are
other ways to deal with it, like you said. Sounds fun; do update us when you
have something formed. I look forward to seeing other ideas for dealing with
this topic. :-)
--
Regards,
Tim Greer  chatmaster () charter net
Server administration, security, programming, consulting.


----- Original Message -----
From: "Laurian Gridinoc" <laur () grapefruitdesign com>
To: "Tim Greer" <chatmaster () charter net>
Cc: <webappsec () securityfocus com>
Sent: Friday, June 20, 2003 8:36 PM
Subject: Re: Preventing cross site scripting


On Sat, 2003-06-21 at 00:49, Tim Greer wrote:
But you can't. You have to look at it as text and determine what characters
will be dangerous. HTML is only a markup language; there are no
dictionary-type matches. You would have a very large index as well if you
attempted to determine what was valid. That is okay, and is reasonable if
done properly... not the problem. The problem is XSS and how someone can
insert characters or values into otherwise valid HTML tags to cause the
problem.

You look at it as text only until you separate the markup from the rest;
then you treat the markup first and later the remaining text content. There
is no point in processing the attributes or the text content of an object
tag when I want to drop it from the start.
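
As a minimal sketch of that two-step idea (not the Tidy-based setup
discussed in this thread), the Python below uses the standard library's
html.parser to separate markup from text, decides on each tag first, and
only then escapes the remaining text. The tag lists and the names
MarkupFirstFilter and filter_html are assumptions made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for illustration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "br"}
# Elements that are dropped together with everything inside them.
DROP_WITH_CONTENT = {"object", "script", "style"}
# Allowed elements that never get a closing tag.
VOID_TAGS = {"br"}


class MarkupFirstFilter(HTMLParser):
    """Decide on the markup first, then deal with the text content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.drop_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            # Dropped from the start: attributes are never inspected.
            self.drop_depth += 1
        elif self.drop_depth == 0 and tag in ALLOWED_TAGS:
            # This sketch discards all attributes of allowed tags.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.drop_depth:
                self.drop_depth -= 1
        elif (self.drop_depth == 0 and tag in ALLOWED_TAGS
              and tag not in VOID_TAGS):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is only escaped, never searched for "dangerous" words.
        if self.drop_depth == 0:
            self.out.append(escape(data))


def filter_html(source):
    parser = MarkupFirstFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

With this sketch, an input such as
'<p>medieval <b>times</b></p><object data="x"><b>gone</b></object>' comes
back as '<p>medieval <b>times</b></p>': the object element vanishes with its
content, and the word "medieval" is left alone. An unclosed object tag makes
the sketch drop everything after it, which errs on the safe side.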

Treating the HTML as text (handling tags, values and content all in the
same step) is what Yahoo was doing last year, and they ended up replacing
`evil' stuff not only in the tags but also in the text content.
[The word "medieval" (since it contains the JavaScript command "eval") is
converted in Yahoo mail to "medireview".]
http://www.ntk.net/2002/07/12/
http://www.ntk.net/2002/07/12/yahoo.txt
This is also a nice example of how wrong a blacklist filter may be.
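
To illustrate the failure mode, here is a small Python sketch of a blacklist
applied to the whole message as plain text. It is not Yahoo's code; the
single "eval" to "review" rule is an assumption inferred from the "medieval"
example above, and blacklist_filter is a made-up name.

```python
# Assumed rule, inferred from the "medieval" -> "medireview" example.
BLACKLIST = {"eval": "review"}


def blacklist_filter(text):
    """Blindly replace blacklisted substrings anywhere in the input."""
    for bad, replacement in BLACKLIST.items():
        text = text.replace(bad, replacement)
    return text


print(blacklist_filter("a medieval castle"))  # prints: a medireview castle
```

The filter has no idea whether "eval" sits in a script, an attribute or
ordinary prose, so harmless text gets mangled.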

The only way to determine if it's valid and safe, barring a lot of static
assumptions and basically having a huge whitelist,

The huge whitelist starts with the HTML DTD, which defines what is allowed
and where. The first filtering occurs when Tidy parses the HTML document
according to the standard, which is a whitelist check after all.

would be to simply strip out or refuse to render any HTML tag that has any
character in it that could pose the potential to insert something to create
an XSS attack.

Tidy won't strip on that basis; it will just properly escape whatever isn't
allowed to remain there in that form, which resolves almost all of the XSS
attacks that are based on breaking the syntax.

Only so many HTML tags would allow for someone to do this in reality. The
ones that do, since any tag element and value can be in any combination in
a tag and be valid, require some very specific checks, and some simply have
to be denied, since they would be too open to faults.

Not any combination: the DTD restricts it by specifying what is allowed.
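
A rough Python sketch of that restriction: a table of which attributes each
element may carry, standing in for what a DTD declares. The table contents
and the name rebuild_start_tag are hypothetical and far smaller than the
real HTML DTD.

```python
from html import escape

# Hypothetical, heavily reduced stand-in for the DTD's declarations:
# element name -> attribute names that element is allowed to carry.
ALLOWED_ATTRS = {
    "a": {"href", "title"},
    "img": {"src", "alt", "width", "height"},
    "p": set(),
}


def rebuild_start_tag(tag, attrs):
    """Return a rebuilt start tag, or None if the element is not declared.

    attrs is a list of (name, value) pairs, as a parser would report them.
    """
    if tag not in ALLOWED_ATTRS:
        return None
    kept = []
    for name, value in attrs:
        if name in ALLOWED_ATTRS[tag]:
            # Re-quote and escape the value so it cannot break the syntax.
            kept.append(f' {name}="{escape(value or "", quote=True)}"')
    return f"<{tag}{''.join(kept)}>"
```

Undeclared attributes such as onerror are simply not copied, and a quote
inside a value comes out as &quot; instead of closing the attribute. This
sketch does not look inside the values themselves, so something like a
javascript: URL in href would still need a separate check.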

Anyway, like you said, people send emails in HTML (I personally would either
not render any email with HTML or only safe tags and screw the people that
want to send HTML-ized email),

Most are not aware of what the mail client is doing :) and anyway, richer
text formats may enhance communication.

so it can get rather involved, unless you simply remove those 4 or 6 vital
characters from within a specific tag that could cause the problem.
And, why would someone need the characters in a tag anyway? You can check
this all, allowing the special characters only in what must be valid places.

this is what the parser is doing.

Even a string with multiple single or double quotes. It's just as effective
and much simpler this way. Text is what creates the markup language, after
all, and thus you can't treat it as a language only and be safe. You are
going to have to do a lot more work and have to modify it for each newly
implemented tag in D?HTML, as well as for anything that could be an *XML,
PHP, etc. type of tag.

These are particular cases; you may support (allow) additional tags by
defining them in the DTD. PHP, on the other hand, uses processing
instructions (<?php ... ?>) rather than its own namespace for tags, which I
consider a bad concept.

Nonetheless, if you develop anything along the lines you speak, please let
me know, I'd like to check it out and what you're doing.

When I have more examples, I'll post them to the list.

A working example will say more than any code listing; I'll be happy to
assemble one from stuff that is already running.

Cheers,

--
Laurian Gridinoc
Chief Developer
GRAPEFRUIT DESIGN

tel/fax: +40.232.233068
tel/fax: +1.646.349.2916
mobile: +40.745.304379
e-mail: laur () gd ro
www.grapefruitdesign.com
www.gd.ro


