Which is Faster: SAN or Directly-Attached Storage?
Should I place my database files on SAN or directly-attached storage? This is a frequently asked question. It comes up repeatedly in public newsgroups, email discussion lists, and private meetings with customers who are concerned with database performance.
What SAN?
A Storage Area Network (SAN) may mean two different things. To the storage professionals, it could narrowly mean the switched fabric between the host servers and the sophisticated disk arrays that actually store data. To most other folks, however, the distinction between the switched fabric and the disk arrays means very little. Rather, they view the fabric and the disk arrays -- practically everything behind the drive and beyond the host server -- as comprising the SAN. This article assumes this most common view.
To simplify discussions in the rest of this article, a drive from a directly-attached storage will be referred to as a DAS drive, and a drive presented from a SAN as a SAN drive.
What is meant by "faster"?
To address the question of which is faster, you must be clear about what you mean by "faster". In general, you can use three key metrics to quantify the performance of a disk I/O path:
- I/Os per second (IOps)
- Megabytes per second (MBps)
- Latency
IOps measures how many I/O requests can be satisfied by the disk I/O path in a second. For a given disk I/O path, this metric is generally in reverse proportion to the size of the I/O requests. That is, the larger the I/O requests, the lower the IOps. This is intuitive. After all, it takes more time to process a 256KB I/O request than it does an 8KB request.
MBps measures how much data can be pumped through the disk I/O path. If you view the I/O path as a pipeline, MBps measures how big the pipeline is and how much pressure it can sustain. So, the bigger the pipeline is and the more pressure it can handle, the more megabytes of data it can push through. For a given I/O path, MBps is in direct proportion to the size of the I/O requests. That is, the larger the I/O requests, the higher the MBps. Larger I/Os give you better throughput because they incur less disk seek time penalty than smaller I/Os.
Note that for I/O requests of a fixed block size, MBps is simply IOps times the block size in megabytes.
I/O latency--also known as I/O response time--measures how fast an I/O request can be processed by the disk I/O subsystem. For a given I/O path, it is in reserve proportion to the size of the I/O request. As mentioned previously, a larger I/O request takes longer to complete.
Generally speaking, you should pay more attention to IOps and I/O latency when it comes to small-sized I/O requests, and be more concerned with MBps when dealing with large-sized I/O requests. But whether you report IOps or MBps, you should always keep I/O latency in the picture. Note that as you push a disk I/O subsystem to sustain higher IOps or MBps, the average I/O latency will continue to go up. After a certain point, the latency will increase exponentially as you put even more I/O requests on the system. If you obtain a good IOps or MBps number at the expense of a very high I/O latency, that IOps or MBps number may not be very useful in practice.
What does all this have to do with the question of which is faster, SAN or directly-attached storage? Well, these three metrics define precisely what is meant by "faster". So when you say one is faster than the other, you have to be explicit about what measure you are referring to.
Let us examine the question along these three performance measures, and let's start with the I/O latency measure first.
Disk I/O latency
SANs almost always come out a loser when it competes with DAS on I/O latency under light load. If you don't believe me, you can measure it yourself, and you don't need any fancy tool to obtain convincing data points.
To measure the latency of an I/O path under the best possible condition, you should issue single-threaded I/O requests, i.e. synchronous I/Os. If you are using an I/O benchmark tool such as IOMeter or sqlio.exe, for instance, you can configure a single worker thread to issue small sequential writes with a queue depth of one.
Alternatively, you can use a simple SQL Server script to measure the latency of an I/O path. The following T-SQL script is an example that can help you ascertain the best possible disk I/O latency (or response time) when committing a SQL Server database transaction. The script assumes that D is a directly-attached internal drive and E is a drive presented to the same server from a SAN.
-- create a small database on the local internal drive D
CREATE DATABASE io_test_D ON PRIMARY
(NAME=N'io_test_d_data', FILENAME=N'd:\io_test_d.mdf', SIZE=255)
LOG ON
(NAME=N'io_test_d_log', FILENAME=N'd:\io_test_d.ldf', SIZE=2000);
go
ALTER DATABASE io_test_D SET RECOVERY SIMPLE;
-- create a small database on the SAN drive E
CREATE DATABASE io_test_E ON PRIMARY
(NAME=N'io_test_e_data', FILENAME=N'e:\io_test_e.mdf', SIZE=255)
LOG ON
(NAME=N'io_test_e_log', FILENAME=N'e:\io_test_e.ldf', SIZE=2000);
go
ALTER DATABASE io_test_E SET RECOVERY SIMPLE;
go
-- Create two identical tables, one in each database
use io_test_D
go
CREATE TABLE test (i CHAR(200) NOT NULL);
INSERT test VALUES('ABC');
go
use io_test_E
go
CREATE TABLE test (i CHAR(200) NOT NULL);
INSERT test VALUES('ABC');
go
-- Run the following script on database io_test_D and io_test_E.
-- Run the script multiple times in each database, and alternate
-- the test runs to get consistent measures
SET NOCOUNT ON;
DECLARE @i int,
@start datetime;
SET @i = 1;
SET @start = getdate();
WHILE @i < 10
BEGIN
UPDATE test
SET i = cast(@i as CHAR(200));
SET @i = @i + 1;
END
SELECT 'Duration' = datediff(ms, @start, getdate()),
'Latency' = datediff(ms, @start, getdate())/10000.0;
What you will find is that the best I/O response time on your D drive is in the 100-microsecond range, whereas the best I/O response time on the E drive may be in the 300~500-microsecond range and could be higher.
This is intuitive because a typical I/O path between the host and the SAN media (or the SAN cache) is significantly longer than a typical I/O path between the host and a directly attached storage media (or its cache). In other words, an I/O request would have to travel through more components before it can be considered hardened on a SAN drive than it would on a directly attached storage. No matter how small a latency each of these components may incur, they add up.
IOps and MBps
IOps is really a measure of the multi-tasking capability of the disk subsystem, whereas MBps is a measure of how big the disk I/O pipeline is. SANs -- especially higher end SANs -- shine on these measures, although from a specific host server you may only have access to a fraction of what the SAN has to offer.
Higher end SANs are often deployed through a Fibre-Channel switch based fabric. It offers point-to-point links through the fabric, and a host is often configured to have multiple links through the fabric. Other components on the I/O path can be similarly configured to offer I/O path parallelism. This enables higher end SANs to scale their ability to process I/O requests and to scale the size of the I/O pipeline.
MBps, on the other hand, may leave much to be desired on a drive presented to a specific host, especially in a large shared environment where one has to distribute the I/O resources among as many hosts as possible while still deliver adequate performance for each host. If you view a SAN drive as consisting of one or more paths of many I/O components, it is then a question of whether enough I/O path parallelism is configured for the drive. There can be a host of reasons why the MBps throughput of a particular SAN drive is limited.
Configuring DAS, on the other hand, is a lot simpler as you most likely have full control of all the key decision factors: the choice of disk controllers (i.e. the maximum throughput of each controller and the number of controllers), the choice of drives (i.e. the maximum throughput of each drive and its size), and how many drives to hang off each controller.
For instance, if you need 1000MBps throughput, you can plug in four 300MBps SATA RAID controllers to your host server, and hook up enough SATA drives to each controller.
Two simple examples
So how does a SAN drive stack up against a DAS drive? Well, I hope the discussions so far--though scratching only the surface of the topic--have helped to convince you that there is no absolute winner, and it all depends on how the drives are configured. In general and given sufficient resources, you can configure a SAN drive to outperform a DAS drive in terms of throughput. Similarly, with sufficient resources, you can configure a DAS drive to outperform a SAN drive.
If that's not convincing enough, let me give you two simple examples.
Example 1 - SAN beating DAS
I had a server with a SAN drive and an internal local drive which was a mirrored set of two drives. To assess their performance, I ran an identical series of disk I/O benchmark tests on both the SAN drive and the DAS drive, respectively. The following table summarizes some of the results:
SAN Drive
DAS Drive
IOps MBps IOps MBps
8K sequential writes 3951 31 313 2
8K sequential reads 9542 75 552 4
128K sequential writes 1240 155 149 19
128K sequential reads 1656 207 488 61
The SAN drive far outperformed the DAS drive on all four I/O workloads. In fact, on the I/O throughput, I'd expect any decently configured SAN drive to beat an internal mirrored set--that you typically find on a server--by a large margin.
Example 2 - DAS beating SAN
It is an easy job to show that you can configure a DAS drive to beat a SAN drive. Take the SAN drive in the previous example. The best throughput it could achieve was around 207MB per second. Now, take a look at the DAS drives that came with the Sun Fire X4500 server as described in one of Jim Gray's articles. With six SATA controllers and 48 7K-rpm drives, you could get about 2.6GB per second, more than 10 times better in throughput than the SAN drive in the previous example could offer in throughput.
- imran's blog
- Login or register to post comments



