WebApp Sec mailing list archives

Re: Canonicalization


From: Andrew van der Stock <vanderaj () greebo net>
Date: Wed, 12 Apr 2006 22:50:13 +1000

Susan,

I am the lead OWASP Guide author so I hope I can answer your query.

The gist of that paragraph is that there are many ways to encode text in web apps, and if you're going to make decisions about that text, accept it for persistent storage, or re-display it, it's vital that you make it "canonical", i.e. reduce it to its simplest form, before you act on it.

For example, if you get:

select%20*%20from%20...

from the user and you write code to tokenize input based upon spaces, it will not see any spaces.

So you must decode that string properly (so it becomes "select * from ...") and only then can you process it "safely".
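
As a minimal sketch of that step (in Python rather than the C# used further down, and assuming the input is URL percent-encoded), decoding first lets a space-based tokenizer see the real tokens. The variable names are illustrative only.

```python
from urllib.parse import unquote

raw = "select%20*%20from%20..."

# Tokenizing the raw input finds no spaces, so it stays one opaque token.
tokens_before = raw.split(" ")

# Decode the percent-encoding first, then tokenize the canonical form.
canonical = unquote(raw)
tokens_after = canonical.split(" ")

print(tokens_before)
print(tokens_after)
```

The same ordering applies to any other check: canonicalize first, then inspect.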

Be aware of double and n-deep encodings - they can occur, and obviously there are many encodings you've never seen or considered. That's why I strongly advocate positive validation.
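
One possible way to cope with n-deep percent-encoding, sketched in Python under stated assumptions: decode repeatedly until the value stops changing, and give up if it is still changing after a fixed number of layers. The `canonicalize` helper and the `MAX_DEPTH` limit are hypothetical names chosen for illustration, and some applications may prefer to reject multiply-encoded input outright instead of decoding it.

```python
from urllib.parse import unquote

MAX_DEPTH = 5  # assumed limit on encoding layers, chosen only for illustration


def canonicalize(value: str, max_depth: int = MAX_DEPTH) -> str:
    """Percent-decode repeatedly until the value stops changing.

    Accepts at most max_depth layers of encoding; raises ValueError otherwise.
    """
    for _ in range(max_depth + 1):
        decoded = unquote(value)
        if decoded == value:
            return value  # stable: no encoding layers remain
        value = decoded
    # Still changing after max_depth layers: treat the input as hostile.
    raise ValueError("input is encoded too deeply")


# "%2520" is a double-encoded space: "%25" decodes to "%", leaving "%20".
print(canonicalize("select%2520*%2520from"))
```

This only covers percent-encoding; it says nothing about the encodings you have never seen, which is where positive validation earns its keep.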

An example of positive validation (in C# and .NET, but applicable to most languages):

// Requires: using System.Collections; using System.Text.RegularExpressions;
Hashtable clean = new Hashtable();

// Ensure that if the statement fails for any reason,
// the collection has a safe value for our field
clean.Add("field", "");

// Is the data a single word no more than 20 characters long,
// using only a-z and 0 to 9?
if ( Regex.IsMatch(Request.Form["field"], "^[a-z0-9]{1,20}$", RegexOptions.IgnoreCase) )
{
        // it's safe to take the value of the string as there's most likely no nasties
        clean["field"] = Request.Form["field"].ToString();

        // now ensure that the business rules make sense
        processFieldBusinessRules(clean["field"]);
}
else
{
        throw ...
}

// Now it's moderately safe to use or store the data in clean[]

...
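
To illustrate why positive validation holds up better against encodings than a negative filter, here is a rough Python sketch (not the C# above). The `naive_blacklist_ok` and `whitelist_ok` helpers and the sample input are hypothetical; the whitelist pattern mirrors the one in the C# example.

```python
import re
from urllib.parse import unquote

# Same rule as the C# example: 1 to 20 characters, a-z and 0-9 only.
FIELD_PATTERN = re.compile(r"[a-z0-9]{1,20}", re.IGNORECASE)


def naive_blacklist_ok(value: str) -> bool:
    """Negative validation: reject only if a known-bad string appears."""
    return "<script" not in value.lower()


def whitelist_ok(value: str) -> bool:
    """Positive validation: accept only values matching the allowed pattern."""
    return FIELD_PATTERN.fullmatch(value) is not None


masked = "%3Cscript%3Ealert(1)%3C/script%3E"

print(naive_blacklist_ok(masked))           # the filter misses the encoded form
print(naive_blacklist_ok(unquote(masked)))  # the canonical form is caught
print(whitelist_ok(masked))                 # rejected without knowing the encoding
print(whitelist_ok("user42"))               # ordinary value is accepted
```

The negative filter only works if every encoding has been undone first, whereas the whitelist rejects anything it does not recognise, encoded or not.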

thanks,
Andrew

On 11/04/2006, at 11:12 PM, susam_pal () yahoo co in wrote:
I found the following paragraph in owasp.org. Can someone please elaborate on this?

Parameters must be converted to the simplest form before they are validated, otherwise, malicious input can be masked and it can slip past filters. The process of simplifying these encodings is called “canonicalization.”
