WebApp Sec mailing list archives

Re: Preventing cross site scripting


From: Laurian Gridinoc <laur () grapefruitdesign com>
Date: 21 Jun 2003 00:55:18 +0300

On Fri, 2003-06-20 at 20:11, Tim Greer wrote:
> Please provide some examples of this. I'd like to see your idea(s) at work
> and how it would solve this problem. I'm honestly not quite clear on the
> context in which you mean this to solve this problem and I'm interested
> knowing. I'm not sure I agree right now, so some examples illustrating it
> would be great--if you'd be so kind. Thanks.

This thread started with `how to safely export HTML mail messages to the
web'.
This may require dealing with some of the following issues:

1. broken markup (<ni <foo href="a"d"" bar='> baz> &quot no semicolon)
2. unacceptable entities
3. unacceptable tags (applet, object)
4. unacceptable attributes on acceptable tags (onmouseover, ...)
5. unacceptable attribute values (href="javascript:...", width="100000")
6. unacceptable text tokens (offensive words)

I suggest dealing with them in the stated order, and not treating the
HTML as a mere string, but dissecting it into markup and content: clean
the markup (first the elements, then the attributes of the accepted
elements), then the text.
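
To make the `markup versus content' split concrete, here is a minimal
sketch in Python (my choice of language for illustration; the thread
itself does not prescribe one). It uses the standard html.parser module;
the TokenCollector class and the tokenize helper are hypothetical names.
Note that html.parser does not repair broken markup, so it does not
replace the tidy step for issue 1.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects markup and text as separate tokens (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, already split out
        self.tokens.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Text content is delivered separately from the markup around it
        self.tokens.append(("text", data))


def tokenize(html):
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


tokens = tokenize('<p title="x">Hi <b onclick="evil()">there</b></p>')
for token in tokens:
    print(token)
```

Once the input is in this shape, element names, attribute names,
attribute values and text can each be checked by their own rule instead
of by one pattern applied to the whole string.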

[1] is wonderfully solved by filtering the input through tidy with XML
(XHTML) output; the result is the data for the next steps.

The rest of the issues can be controlled by an XSL transformation on the
XML generated above.

[2] With a proper DTD you can alter the `rendering' of any unaccepted
entity. Let's say that I want &Acirc; (capital A, circumflex accent) to
come out as a plain capital A instead; I simply define it that way in
the DTD:
<!ENTITY Acirc "A">

Note that &lt;, &gt;, &amp;, &quot; and &apos; are predefined in XML and
cannot be remapped this way.
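
A quick way to see the effect of such an entity declaration is to parse
a tiny document that carries it in an internal DTD subset. This is only
a sketch using Python's standard xml.etree.ElementTree; the sample
document is made up, and a real setup would keep the declaration in the
DTD that tidy's output refers to.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the internal subset remaps Acirc to a plain "A".
DOC = '<!DOCTYPE p [ <!ENTITY Acirc "A"> ]>\n<p>&Acirc;ge</p>'

root = ET.fromstring(DOC)
print(root.text)
```

The parser expands the entity before any later step sees the text, so
the transformation never has to deal with the unwanted character.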

[3] Unacceptable tags: it is preferable to use whitelists here, but
let's first see a blacklist solution:

<!-- drop script silently-->
<xsl:template match="script" />

<!-- or drop script and leave a note -->
<xsl:template match="script">
        <xsl:comment>here was an evil script</xsl:comment>
</xsl:template>

<!-- drop applet preserving its content (e.g. the `backup' markup for
user agents that don't understand the applet tag) -->
<xsl:template match="applet">
        <xsl:apply-templates />
</xsl:template>

<!-- and accept everything since this is a blacklist solution -->
<xsl:template match="*|@*|text()|comment()">
    <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
    </xsl:copy>
</xsl:template>

The whitelist solution would match only accepted tags:

<!-- accept only p, ul, li and the attributes on them (and text nodes and
comments too); any other element is dropped by the built-in rules, but
its content is still processed -->
<xsl:template match="p|ul|li|@*|text()|comment()">
    <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
    </xsl:copy>
</xsl:template>

[4] unacceptable attributes, blacklist version:

<!-- accept everything on `a' except on* attributes -->
<xsl:template match="a">
        <xsl:element name="a">
                <xsl:for-each select="@*">
                        <xsl:if test="not(starts-with(name(), 'on'))">
                                <!-- name is an attribute value template,
                                     hence the curly braces -->
                                <xsl:attribute name="{name()}">
                                        <xsl:value-of select="." />
                                </xsl:attribute>
                        </xsl:if>
                </xsl:for-each>
                <xsl:apply-templates />
        </xsl:element>
</xsl:template>

Whitelist version:

<!-- accept only href and title on `a' -->
<xsl:template match="a">
        <xsl:element name="a">
                <!-- copy-of emits each attribute only if it is present,
                     so no empty href="" or title="" is generated -->
                <xsl:copy-of select="@href|@title" />
                <xsl:apply-templates />
        </xsl:element>
</xsl:template>
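
For readers without an XSLT processor at hand, the same whitelist idea
for tags and attributes can be sketched in Python on an already parsed
tree. This is an illustration under assumptions, not the stylesheet
above: the ALLOWED and DROP_WITH_CONTENT policies and the render
function are hypothetical, and the input is assumed to be well-formed
XML without namespaces (tidy's XHTML output carries the XHTML namespace,
which real code would have to account for).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical policy: allowed tag -> allowed attribute names.
ALLOWED = {"p": set(), "ul": set(), "li": set(), "a": {"href", "title"}}
# Hypothetical policy: tags removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}


def render(elem):
    """Serialize elem, keeping only whitelisted tags and attributes."""
    if elem.tag in DROP_WITH_CONTENT:
        return ""
    # Text is always re-escaped; children are cleaned recursively.
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child) + escape(child.tail or "")
    if elem.tag not in ALLOWED:
        # Unknown tag: drop the markup but keep the cleaned content.
        return inner
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in elem.attrib.items()
        if name in ALLOWED[elem.tag]
    )
    return f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"


source = (
    '<div><p onclick="x()">Hi '
    '<a href="/x" onmouseover="evil()">link</a>'
    "<script>alert(1)</script> end</p></div>"
)
cleaned = render(ET.fromstring(source))
print(cleaned)
```

Attribute values are passed through unchanged here; checking them is
the separate step [5].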

[5, 6] Unacceptable attribute and text values: here it gets funny. The
string manipulation functions in XSL are few and not as powerful as
regex, but it isn't impossible to build proper value validation.

On strings (node and attribute names, attribute and text node values)
you have just concat, contains, starts-with, string-length, substring,
substring-after, substring-before, normalize-space and translate; almost
nothing compared to the power of regex, but in the end this is not a
contest of writing it all on one line.
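
Even with plain string and number operations, a value such as
width="100000" from issue 5 can be checked. The following Python sketch
deliberately avoids regex; the clean_width function and its limits
(2000 pixels, 100 percent) are assumptions made for illustration only.

```python
def clean_width(value, max_pixels=2000):
    """Return a normalised width, or None if the value should be dropped."""
    text = value.strip()
    percent = text.endswith("%")
    digits = text[:-1] if percent else text
    # Only plain ASCII digits are accepted; anything else is rejected.
    if not (digits.isascii() and digits.isdigit()):
        return None
    number = int(digits)
    limit = 100 if percent else max_pixels
    if number > limit:
        return None
    return f"{number}%" if percent else str(number)


for sample in ["640", "50%", "100000", "expression(alert(1))"]:
    print(repr(sample), "->", clean_width(sample))
```

The value is treated as a number with a range, not as a string to be
pattern-matched, which is the point made about atoms below.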

I'm not writing this to say regexes are bad; I'm just stating that not
everything that can be held in a string should be treated as one. HTML
should be represented as (parsed to) a DOM tree, where only node and
attribute names, attribute values, text nodes and comments are separate
strings. Only what cannot be divided any further into another set of
tokens (an atom) should be validated as a string or number. However, an
attribute value that should represent a URL should be validated with a
parser built specifically for this task (based on the URL grammar).
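
As a rough sketch of validating an href with a URL parser instead of a
pattern, here is a Python example built on the standard
urllib.parse.urlsplit. The SAFE_SCHEMES set and the is_safe_href
function are hypothetical, and rejecting every whitespace or control
character is my own (strict) assumption, made because browsers may
ignore some of them inside a scheme name.

```python
from urllib.parse import urlsplit

# Hypothetical policy: schemes we are willing to link to.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_href(value):
    """Accept relative URLs and URLs whose scheme is whitelisted."""
    # Strict assumption: no whitespace or control characters anywhere.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        # The parser itself refused the value (e.g. a malformed host).
        return False
    return parts.scheme == "" or parts.scheme.lower() in SAFE_SCHEMES


for href in ["http://example.com/", "/relative/path",
             "javascript:alert(1)", "JavaScript:alert(1)"]:
    print(href, "->", is_safe_href(href))
```

The decision is made on the parsed scheme component, so variations in
letter case are handled by the parser and not by guessing at patterns.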

Cheers,
-- 
Laurian Gridinoc
Chief Developer
GRAPEFRUIT DESIGN

tel/fax: +40.232.233068
tel/fax: +1.646.349.2916
mobile: +40.745.304379
e-mail: laur () gd ro
www.grapefruitdesign.com
www.gd.ro

