
Oracle Sponge -- Now Moved To Wordpress

Please use http://oraclesponge.wordpress.com

Monday, July 25, 2005

Time Slicing of Disk I/O

It is a well-known phenomenon that when a single process makes multiple multiblock read requests for physically contiguous data on a disk, the performance far exceeds what is available when multiple requests are made for different data concurrently. Right? I leverage this on my FSK development machine, where a serial full table scan can be arranged to exceed 40MB/sec on a single fast (10,000rpm SATA) disk. Incidentally, it is not only fast but also very quiet, due to the lack of excessive and inefficient head movement.

Now I don't propose to write a long essay on the mechanical details of why this is so, and I think it suffices to say that time spent moving the drive heads from one part of the disk to another is time wasted.

So consider how this affects a data warehouse, where large scans of contiguous sets of data are two-a-penny. When you have multiple queries accessing the same disks, the layers between the database and the disk may perform some optimisation of the access pattern by rescheduling disparate requests for contiguous data into a single request, but they consider only very small time windows when doing so. There are presumably two objectives here -- to optimise and speed up disk access, but also to avoid delaying any single request by more than some small amount of time (measured in milliseconds or tens of milliseconds, perhaps).

Let us suppose that we have a request for a 100MB set of data to be read from disk, in chunks of 512KB. If the data were contiguous on the disk(s) then this data set could be read with no latency due to unnecessary head movement. Just grabbing some raw numbers from http://www.storagereview.com/articles/200411/20041116ST3146754LW_2.html, this could be accomplished in around 1.2 seconds (at a transfer rate averaged by eyeball).

However, if we introduce another similar query, concurrent with this one, of the same size but for different data, then the heads start flitting around like flies in a jam jar -- the transfer of the 200MB now required takes far in excess of 2x1.2 seconds, because there are now something like 400 head movements to include. At, say, 3.7msec per head movement, we've added around 1.5 seconds ... so each query now has to wait 3.9 seconds for the complete return of its result set. How much better would it be if one query's disk access were deferred until the other's was completed, so that the preferred query's data was returned in 1.2 seconds and the delayed query's data was returned in 2.4 seconds? That average of 1.8 seconds seems to me far preferable to the average of 3.9 seconds, and even the disadvantaged query benefits.
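The arithmetic above can be put in a back-of-envelope script. All the figures here are the assumptions already stated in the text (100MB per query in 512KB chunks, a transfer rate implied by the 1.2 second figure, 3.7msec per head movement); this is a sketch of the model, not a disk benchmark.

```python
# Back-of-envelope model of the two concurrent 100MB scans described above.
MB = 1024 * 1024
query_size = 100 * MB
chunk_size = 512 * 1024
transfer_rate = query_size / 1.2   # bytes/sec implied by the 1.2 sec figure
seek_time = 0.0037                 # assumed 3.7 msec per head movement

chunks_per_query = query_size // chunk_size   # 200 chunks per query

# Interleaved: every chunk of both queries costs a seek (~400 seeks total),
# and both queries finish together at the end.
interleaved_total = 2 * query_size / transfer_rate + 2 * chunks_per_query * seek_time

# Serialized: one query runs to completion with no extra seeks, then the other.
serialized_first = query_size / transfer_rate        # ~1.2 sec
serialized_second = 2 * query_size / transfer_rate   # ~2.4 sec
serialized_avg = (serialized_first + serialized_second) / 2

print(f"interleaved: each query waits {interleaved_total:.1f} sec")
print(f"serialized: average wait {serialized_avg:.1f} sec")
```

Running this reproduces the 3.9-second and 1.8-second figures quoted above.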

Well, the Aldridge Theory Of The Evils Of Concurrent I/O Access is only partly thought out so far, and there is probably some much more sophisticated processing required:
  • temporarily deferring access for a 10GB read to allow a couple of 100MB reads to get their feet in the door
  • completing the scanning of indexes before starting any table access, and possibly storing indexes on different devices
  • a heavy reliance upon the physical clustering of rows likely to be requested for the same query
... but I wonder, is there any really fatal flaw in such a scheme?
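The first bullet is essentially shortest-job-first ordering of scans, and its benefit can be sketched with the same sort of arithmetic. The 83MB/sec throughput figure is borrowed from the comments below; seek costs are ignored, since serialised scans need almost none.

```python
# Toy illustration: average completion time when a 10GB scan either
# runs first or defers to two 100MB reads. Throughput of 83MB/sec assumed.
rate = 83.0  # MB/sec

jobs_mb = [10 * 1024, 100, 100]   # one 10GB scan, two 100MB scans

def avg_completion(order):
    """Average completion time when jobs run strictly one after another."""
    t, total = 0.0, 0.0
    for size in order:
        t += size / rate          # this job finishes at time t
        total += t
    return total / len(order)

big_first = avg_completion(jobs_mb)            # 10GB scan hogs the disk first
small_first = avg_completion(sorted(jobs_mb))  # small reads get in the door first

print(f"big scan first: avg {big_first:.1f} sec")
print(f"small reads first: avg {small_first:.1f} sec")
```

The 10GB scan finishes at almost the same time either way, but deferring it cuts the average completion time by roughly a factor of three in this toy case.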


At 12:59 AM, Blogger Tim... said...

Why not make the whole database single threaded so you don't compete with any other processes :)

If you're after reducing head movements you could try using only the outer edge of your disks:


Or wear a neck brace ;)



At 7:06 AM, Blogger David Aldridge said...

Heh, that may be going too far even for my tastes, but it's essentially what I'm thinking about at the I/O level -- the cost of the "thread switch" is too high when it requires milliseconds of head movement each time.

Now another issue is where you are reading and writing to the same disks, say "insert into my_table select * from my_table" .... a lot of head movement there, causing a delay in the task completion.

Yes, I like the outer-edge of disk practice, and use it on my foreign script kiddy machine. On my little two-disk RAID0 array I get about 95MB/sec at the outer edge(s) and 83MB/sec at the inner.

At 5:58 PM, Blogger Noons said...

Remember the Oracle benchmarks on Sequent? Those invariably had hundreds of disk drives with just the outer third used, for the redo logs and any write intensive t/s.

Effective? Heck, yeah!
Practical? How deep is your pocket?

