WebApp Sec mailing list archives

Re: Controlling access to pdf/doc files (db "better" than filesystem?)


From: Ido Rosen <ido () cs uchicago edu>
Date: Sat, 28 Feb 2004 14:54:57 -0600

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 28 Feb 2004 11:13:21 -0800
"David Wall @ Yozons, Inc." <dwall () yozons com> wrote:

> > that in SQL Server is that all data in SQL Server is split over ~8k
> > pages. When you add a BLOB it needs to be split into 8k chunks. When you
>
> But filesystems also store data into pages, often much smaller than 8k
> chunk.

I agree that, for a solution like this, storing files along with their metadata in a database is better than storing
them directly on the filesystem.  It's also probably more secure, since the web developer is less likely to botch
permissions, access control, or sanity checks, and since most database systems already have some of those checks
built in.  Your reasoning in that last sentence is a bit off, though: database systems (such as MySQL, PgSQL,
ThinkSQL, and MSSQL) all sit on top of the filesystem, so their 8k pages may not line up with the filesystem's own
pages, and the two storage layers can end up out of phase.  That is just a consequence of overlaying one file-storage
paradigm on another and shouldn't cause much trouble speed-wise, but by adding a layer on top of the filesystem you
do increase the likelihood of some inefficiency.
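
As a rough illustration (my own sketch, not from either message): assume a hypothetical "documents" table with id,
owner, filename, and content (BLOB) columns.  I'm using Python and SQLite here only because they're compact; any of
the databases above would look much the same through its own driver.

import sqlite3

def fetch_document(db_path, doc_id, requesting_user):
    """Return (filename, content) if requesting_user may read doc_id, else None."""
    conn = sqlite3.connect(db_path)
    try:
        # The access check and the lookup happen in one parameterized query, so
        # no filesystem path is ever built from user input (no "../" traversal,
        # no world-readable upload directory to get wrong).
        row = conn.execute(
            "SELECT filename, content FROM documents WHERE id = ? AND owner = ?",
            (doc_id, requesting_user),
        ).fetchone()
        return (row[0], row[1]) if row else None
    finally:
        conn.close()

The request handler then just streams the returned bytes back with the right Content-Type; user input never touches
the filesystem at all.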

That said, there's a counterargument: databases, or at least smart ones, are built to cache data efficiently in
memory.  If your database server has enough memory, it may even become faster than serving the file off the
filesystem directly.  The reasoning is that the filesystem cache (if there is one at all) also holds shared libraries
and other files that are currently executing, and those are given priority over plain data caching.  That cache is
also limited in size in most implementations, so as not to take up too much precious RAM.  Databases, however, are
generally built on the assumption that if you are running a database server for anything that benefits from
significant caching, or for major resource-intensive tasks (like serving hundreds of thousands of users), then the
database will be the primary service on the machine and may therefore claim a large share of its resources
(specifically, cache more data in memory).  So, in some situations I'd imagine database file storage would in fact be
_faster_ for retrieval than filesystem storage.  Still, this rests on too many assumptions about the database
server's design, the underlying operating system, and the server hardware, so I don't give it much credit.

Then again, I may be wrong...
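
If someone actually wanted to measure this rather than argue it, a quick-and-dirty comparison is easy to sketch
(again my own illustration, using the same hypothetical "documents" table as above; the numbers will swing wildly
with the engine, the OS cache state, and the hardware, which is exactly the problem):

import sqlite3
import time

def time_db_read(db_path, doc_id, repeats=1000):
    # Average seconds to pull one BLOB out of the database.
    conn = sqlite3.connect(db_path)
    start = time.time()
    for _ in range(repeats):
        conn.execute("SELECT content FROM documents WHERE id = ?", (doc_id,)).fetchone()
    elapsed = time.time() - start
    conn.close()
    return elapsed / repeats

def time_fs_read(path, repeats=1000):
    # Average seconds to read the same payload straight off the filesystem.
    start = time.time()
    for _ in range(repeats):
        f = open(path, "rb")
        f.read()
        f.close()
    elapsed = time.time() - start
    return elapsed / repeats

# e.g.: print(time_db_read("appdata.db", 42), time_fs_read("report.pdf"))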


> Our Signed & Secured application stores all files as BLOBs in a database for
> the transactional and backup capabilities, but we've never run tests of
> 100+ concurrent web users downloading files to see if the database or the
> filesystem would be faster.  In general, raw speed was less important to us
> than being able to support lots of concurrent requests, because the speed of
> retrieval from the db was always assumed to be faster than it could be
> streamed back across typically slower Internet links.  After all, the data
> has to be sent back to a user's web browser, so the speed of the transfer is
> limited by the slowest link between the browser and the web server.

This is the right attitude.  Speed where it is useful, administrative efficiency whenever possible.

Ido


> David


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAQQAhmhQsAkXAJP0RAtsIAJ0YEU2nqXhbrrEEbjuJ6ENNPnBuGwCgo1gS
z2SccYIaCJwsvmk2bnpgZmw=
=0tLv
-----END PGP SIGNATURE-----

