WebApp Sec mailing list archives

Re: SQL injection (no single quotes used)


From: "Sverre H. Huseby" <shh () thathost com>
Date: Mon, 20 Dec 2004 22:34:26 +0100

[Mike Andrews]

|   Not all SQL attacks are on strings - integers used (maybe ids?
|   are also susceptible.  For example, at
|   http://crash.se.fit.edu/hackerland/

|   http://[...]/hackerland/showdetails.php?type=flowers&id=2
|   uses a string and an integer to retrieve the description.  Both
|   are vulnerable, but if I just change the id number to...

|   http://[...]/hackerland/showdetails.php?type=flowers&id=2 or 2>1

|   ...there's no escaping of quotes needed, or the usual comment
|   trick - it just works with the SQL that is generated.

Exactly.  And there's no magic to it at all, if people are just
willing to think about what the "next layer" they pass the constructed
string to will do.  In the SQL Injection case, the "next layer" is the
SQL parser of the database server.

I wrote about this on vuln-dev on my birthday 2.5 years ago [1], but I
feel like sort of repeating some of it, this time a little bit more in
(technical) detail than the last time.  All injection problems, like
SQL Injection, Shell Command Injection, XPath Injection, LDAP
Injection and even Cross-site Scripting (HTML Injection) are the
result of programmers building a statement for some parser to read,
and mixing data with language constructs.  If the programmer would
bother to play the role of the parser she passes the data to, the
injection problems would lose all their magic.

A parser will, based on some syntax (or lexer) rules, scan an incoming
string character by character in order to build a list of "tokens".
Tokens may be command words, variable names, operands, literals
(strings, numbers, ...)  and more, depending heavily on what kind of
parser we're dealing with.

The process of combining characters to make a token, is steered by
certain "metacharacters".  When looking for a new token, the parser
will combine characters until it reaches a metacharacter, and then
consider the seen characters as a token.

What is (and what is not) a metacharacter depends on a "context".  The
context describes what the parser expects to see, based on what it has
already seen (a simplification, but I hope parser people will forgive
me).

Let's say an SQL parser has read the following (it has not finished
processing the entire statement):

  SELECT * FROM Users WHERE Name='sve

The parser has collected the following tokens already:

  SELECT
  *
  FROM
  Users
  WHERE
  Name
  =

And it's currently working on a string literal, since it has seen a
single quote in a position where it expects an "expression".  The
single quote indicates that the expression is a string literal.  As
the parser has not yet seen the ending single quote, it is in a
"string literal context", and it will continue being in that context
until it reaches a terminating single quote (i.e. an unescaped single
quote).  Inside a string literal context, the single quote is a
metacharacter, since it will make the parser switch out of the string
literal context, and start looking for new tokens.

Now, let's instead say that the parser has read the following (again,
we don't see the entire statement, just the part the parser has seen):

  SELECT * FROM Users WHERE Id=12

Now the parser is currently parsing a numeric literal.  It started
looking for an "expression" after the equals sign, as that's what the
syntax of the language says it should do, and stumbled across a
numeric digit ("1").  It thus knows that it is in a "numeric literal
context", and will continue to be there until it reaches the magic
metacharacter that ends that particular context.  In the numeric
literal context, this happens to be any character that may not be a
valid part of the number being parsed, meaning that all non-numeric
characters make up the large set of metacharacters for this context.
Most of these metacharacters will make the parser choke, as they do
not follow the syntax.  However, some of them may be just what the
parser is hoping to find, for instance a space, a newline, a tab, a
minus, a plus or any other character that may legally appear between
the tokens.

As soon as we (the programmers) allow outside input to make the parser
switch context, we're in trouble.  The parser may start looking for
tokens that give our initial statement a totally new meaning.  In
general, when including data (e.g. user input) in statements, we want
the parser to treat the data as a _single_ token.  To be able to do
that, we need to understand how the parser will treat our pasted data
in the exact position and format we pass it.  We need to pay attention
to metacharacters, meaning that we need to know what makes up the set
of metacharacters for the position and format we paste our data in.
Metacharacters should lose their special meaning in our single-token
data, either by being escaped, or by being removed.

It's actually quite simple, but we need to look one step further than
we usually do when programming, and we need to carefully study the
documentation of the parser we pass our statements to.  (Some vendors
add metacharacters beyond the standard they implement.  As an example,
the backslash is a metacharacter in string literals in MySQL and
PostgreSQL, even though the SQL standards do not say it is.  Failure
to pay attention to every single metacharacter may give us serious
problems.)

Geez, why do I always end up writing this much after a couple of
glasses of red wine?  Feel free to E-mail me directly if I made you
even more confused than you already were. :-) I certainly am at the
moment.


Sverre.


1: http://shh.thathost.com/text/passing-data-03.txt

-- 
shh () thathost com               My web security book: Innocent Code
http://shh.thathost.com/       http://innocentcode.thathost.com/


Current thread: