WebApp Sec mailing list archives
Re: Controlling access to pdf/doc files (db "better" than filesystem?)
From: David Cameron <david () uberconcept com>
Date: Tue, 02 Mar 2004 07:42:19 +1100
My original point on the topic was the I fail to see what you gain byusing BLOBS. Returning to that, I'll leave a fuller explanation on SQL Server performance to the end.
By using BLOBs you do gain transactional support. However you must weigh that against the disadvantages. Retrieving and storing those files is much more difficult. Secondly it performs badly (at least in SQL Server, can anyone else comment on Oracle/DB2/MYSQL?). Thirdly the increase in size of the database due to storing files in the database can cause performance of the database to decrease (once again SQL Server). I would be surprised if other databases perform well when working with BLOBs. I might add that doing this does not necessarily add further security (this is after all a security mailing list). If you want to make the files less accessible, move them to a directory that is not accessible from the web, and deny permissions for the account under which the web application runs. When you want to access the files, change to another account that has permissions to see that directory (create a specialised account for this purpose) and then access the files. As for transactional security, you could roll your own. SQL Server Performance (ignore this if you aren't interested, please correct me if I go offtrack): The reason I am saying all this is that BLOB performance in SQL Server is bad. SQL Server is optimised to insert, update and retrieve data thatis limited to 8k pages (the maximum size of a row is 8k), rather than data that is split over multiple pages. LOBs (text, nText, image) are split over a number of data pages as a BTree, to go to a specific location inside the LOB is a quick operation, but slow to add the LOB. Add to that the fact that SQL Server needs to allocate a new page for each 8k segment. This could force a demand for more space for the database, or may force a reorganisation internally. The impact of the space allocation issue could be limited by a good DBA, who ensured that a sufficient amount of space was free at all times. Even so my suspicion (it is a little hard to prove) is that this would not perform as well as adding files to a filesystem.
Recently I ran some tests against a SQL Server database to test INSERT performance of adding BLOBs. The performance was 2-4 times worse when using BLOBs. Performance tends to drop significantly with larger files.I can't comment on the performance of SELECT/UPDATE statements, as I haven't tested against that. I suspect (once again untested) that it would be less than great, although the performance losses might be less.
I have looked high and low for test results/best practices recommendations on BLOBs, without success. Most of the recommendations come through comments from DBAs. The comment on the increase in size (due to BLOBs) causing generally poor performance in the database camethrough a comment from a DBA who had implemented BLOB storage and wished he hadn't after the effect.
There is an occasion to use BLOBs, and that is when either the files aresmall or when there are other side benefits that outweigh the losses, for example storing .txt files in nText of text so that LIKE queries can be run against them. The closest I have come to a genuine piece of information coming from Microsoft on this relates to their terraserver project. Under the section titled "Learning from the TerraServer Design and Implementation", there are some comments on how Microsoft implemented their terraserver project. The comments suggest that performance is best when small files are being stored. In this case Microsoft split the image files into 9.5k chunks, ie it would take only two data pages per image.
http://www.microsoft.com/resources/documentation/sql/2000/all/reskit/en-us/part3/c1161.mspx regards David Cameron Jannie Hanekom wrote:
I'm by no means an expert on the topic, but I'd like to point out that in my Windows/SQL/IIS experience, database-stored files tend to be significantly slower than their file system counterparts. Databases introduce significant overhead; in addition to the layer of translation between the database itself and the file system, there is also the layer of processing that the client database driver introduces. There's also the fact that not the entire 8K page is used for data storage - some overhead is introduced by the database system in use as well. With regards to caching of files by the database in stead of the file system, it will introduce quite a bit of overhead, as the process rendering the file has to traverse the database driver to retrieve the file; also, your dba may not appreciate you flushing out a couple of MB of crucial indexes every time you retrieve a large blob from the database. (How do databases deal with this? Do they cache blobs at all? I've not done any testing on that...) Also keep in mind that in most larger implementations, the database is stored on a server separate from the actual web server, introducing a network layer with its inherent latency into the mix. Add a firewall between the database and web server, and you can guess where this is going... Lastly, this approach wreaks havoc with the performance of solutions where web-server clustering is used to add scalability. Granted, this is less of an issue with the comparatively low volume of traffic that pdf/doc files are likely to produce, but it's pretty effective at bringing a site to a crawl if high-frequency files (such as images) are stored in a database (yes, it's actually done more frequently than you might guess...) The transactional integrity argument is a pretty strong one, but I have reservations about the argument that storing files in the database is more secure than on the web server itself. As of late, there have been far more SQL injection vulnerabilities than web server directory traversal vulnerabilities, so my money would be on dynamically creating and deleting the files or placing them in a secure area outside of the web root. It's much easier to audit file permissions than it is to find injection vulnerabilities in a large amount of ASP/PHP code. As always, each situation is unique and has its own merits. From a dba/server admin perspective, I'd prefer it if the dev team stored their files on the file system and their data in the database, the way it was intended. Jannie -----Original Message-----From: Ido Rosen [mailto:ido () cs uchicago edu] Sent: 28 February 2004 20:55To: David Wall @ Yozons, Inc. Cc: webappsec () securityfocus com Subject: Re: Controlling access to pdf/doc files (db "better" than filesystem?) -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, 28 Feb 2004 11:13:21 -0800 "David Wall @ Yozons, Inc." <dwall () yozons com> wrote:that in SQL Server is that all data in SQL Server is split over ~8kpages. When you add a BLOB it needs to be split into 8k chunks. WhenyouBut filesystems also store data into pages, often much smaller than 8kchunk.I agree that storing files with their metadata for such a solution in a database is a better solution than storing files. It's also probably more secure, since the web developer is less likely to botch some permissions, security, or sanity checks and since most database systems already have some sanity checks built in. Your reasoning in that last sentence is a bit off, though: Database systems (such as MySQL, PgSQL, ThinkSQL, and MSSQL) all must use the filesystem, so their 8k chunks may not match, and the storage may be out of phase. This is just a result of overlaying one file storage paradigm over another, and shouldn't cause too much trouble speed-wise. By adding a layer on top of the filesystem, you do increase the likelihood of inefficiency.
Current thread:
- RE: Controlling access to pdf/doc files (db "better" than filesystem?) Jannie Hanekom (Feb 28)
- Re: Controlling access to pdf/doc files (db "better" than filesystem?) David Cameron (Mar 01)