Bugtraq mailing list archives

RE: Webtrends HTTP Server %20 bug


From: "Eric Hacker" <hacker () vudu net>
Date: Wed, 6 Jun 2001 15:51:15 -0400

H D Moore said:
> A url-encoded character is NOT a unicode code character..

On Sunday 03 June 2001 05:41 am, Auriemma Luigi wrote:
> The bug is really simple. If the attacker insert an unicode space (%20)

Not exactly. A better way of saying it is that URL encoding is not the same
as the UTF-8 encoding of Unicode code points.

Unicode is a superset of ASCII, so every ASCII character is also a Unicode
character. UTF-8 is a way of encoding Unicode code points as sequences of
8-bit bytes for transport over the Internet. Conveniently, UTF-8 uses the
same single-byte values as ASCII for the ASCII range. For code points above
127, the top of the ASCII range, UTF-8 uses multi-byte sequences whose lead
byte is 0xC2 or higher.
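To make the byte layout concrete, here is a minimal Python sketch. The
helper name utf8_hex and the sample characters are my own choices for
illustration, not anything from the Webtrends report.

```python
# Show the UTF-8 bytes for two ASCII characters and two non-ASCII ones.
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of one character as spaced hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

for ch in (" ", "A", "\u00e9", "\u20ac"):
    # ASCII characters stay one byte; the others become multi-byte sequences.
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

The space and "A" come out as the single bytes 20 and 41, identical to
ASCII, while U+00E9 becomes C3 A9 and U+20AC becomes E2 82 AC.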

A URL cannot contain spaces or certain other special characters, so URL
encoding (%XX, where XX is the byte value in hex) is used to transport them.
Likewise, every byte of a UTF-8 sequence above the ASCII range is supposed
to be URL encoded before it is sent. The original Unicode code point is
therefore first UTF-8 encoded and then URL encoded.
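The two layers can be seen with Python's standard urllib.parse module. This
is only a sketch; the wrapper name url_encode is hypothetical, and safe=""
is my choice so that every reserved character gets escaped.

```python
from urllib.parse import quote, unquote

def url_encode(text: str) -> str:
    """UTF-8 encode the text, then percent-encode every unsafe byte."""
    # quote() defaults to UTF-8; safe="" means "/" is escaped as well.
    return quote(text, safe="")

print(url_encode(" "))        # a plain ASCII space, one byte
print(url_encode("\u00e9"))   # a non-ASCII character, two UTF-8 bytes
print(unquote("%C3%A9") == "\u00e9")  # decoding reverses both layers
```

The space yields %20, which is plain URL encoding of an ASCII byte with no
multi-byte UTF-8 involved, while U+00E9 yields %C3%A9, one %XX per byte.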

Hopefully this has clarified some of the confusion around the terminology.
This is, of course, a summary. For the real deal, check out
http://www.unicode.org.

As an aside, yes, I know that Microsoft's IIS will accept non-URL-encoded
UTF-8 bytes, as well as overlong UTF-8 sequences beginning with 0xC0 (now
deprecated). At least that was the case the last time I checked.
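For anyone wondering why sequences beginning with 0xC0 matter, the sketch
below contrasts a deliberately lenient two-byte decoder with Python's strict
one. The function lenient_two_byte is hypothetical and written only for
illustration; it is not how IIS or any other server is implemented.

```python
def lenient_two_byte(seq: bytes) -> str:
    """Hypothetical lenient decoder for a two-byte UTF-8 sequence.

    It merges the payload bits without rejecting overlong forms.
    """
    lead, cont = seq  # unpacking bytes yields two integers
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))

overlong_space = b"\xC0\xA0"  # overlong two-byte form of U+0020
print(repr(lenient_two_byte(overlong_space)))

try:
    overlong_space.decode("utf-8")
except UnicodeDecodeError as exc:
    # A strict decoder refuses 0xC0 as a lead byte.
    print("strict decoder rejects it:", exc.reason)
```

The lenient decoder turns C0 A0 into an ordinary space, so a filter that
only looks for the byte 0x20 (or %20) would miss it.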

Eric Hacker, CISSP, GCIA, MCSE, CCSE
Network Security Consultant
Lucent Technologies Worldwide Services
Phone: 781-848-5500 x485
Email: ehacker () lucent com
PGP key:
http://keyserver.pgp.com/pks/lookup?op=get&search=ehacker () lucent com
PGP Fingerprint: FADB 793E E98A 97BB 04D6  5973 7864 93A1 222B E0C7

"Long gone are the days when one's surname referred to the role
one had in the community."

