Server Storage
When configuring the main hard drives for your
data capture server, there are many different options to consider.
The first and most crucial consideration is fault tolerance. As it is
impractical in most cases to run full backups of TeleForm servers (see the TeleForm
backup FAQ), it is essential that the most common causes of data loss
are planned for.
Modern hard drives rotate at anything from 5,400RPM for desktop ATA
(otherwise known as IDE) hard drives up to 15,000RPM for server SCSI hard
drives. This means that in every hard drive there are several metal platters
spinning at incredible speeds 24 hours a day, 7 days a week, for years on end
with little interruption. To put this into perspective, imagine putting your
foot to the floor on the accelerator pedal of any modern car in neutral and
holding it there for three years. This will cause the engine to rotate at
approximately 6,000RPM, which is similar to many low-end hard drives. The
question is not so much “if” the engine will break, but “when”. Hard
drives are exactly the same, and although manufacturers have engineered the
drives to last reliably for the few years they are typically in service,
they are still the one component in a server most likely to fail.
When a drive breaks, it usually fails completely within seconds or minutes,
which means that all data stored on the drive is lost. Even if you
have an up-to-date backup, it will typically take hours or even days to
repair the server, reinstall the operating system and restore the backup.
For this reason, we consider a fault tolerant configuration of hard drives
an essential part of any server.
Fault Tolerance?
It is possible to configure hard drives in a server (or even a workstation)
so that if a drive fails the system is able to continue operation
unhindered. This is commonly referred to as a RAID configuration.
RAID 1
The most basic type of RAID is RAID 1, in which two hard drives of equal
capacity are mirrored so that each drive contains exactly the same
information. In a RAID 1 configuration, if either drive fails the other one
will continue to work, allowing the server to keep running. The drawback to
RAID 1 is that you have to buy two drives, doubling the cost, and lose 50%
of the total storage capacity.
When a drive fails, you simply replace it as soon as you can source another
drive, and with most modern hot-swap systems you don’t even need to restart
the server. The server will then re-create the mirror on the new drive and
the system is fault tolerant once more.
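To make the mirroring and rebuild steps concrete, here is a minimal Python sketch that models each drive as a list of blocks. It is a toy illustration under our own assumptions, not how any particular RAID controller works; the `Mirror` class, its method names and the sample data are hypothetical.

```python
from typing import Optional


class Mirror:
    """Toy model of a two-drive RAID 1 mirror (hypothetical, illustration only)."""

    def __init__(self, num_blocks: int) -> None:
        # Each drive is a list of blocks; None marks a failed drive.
        self.drives: list[Optional[list[bytes]]] = [
            [b""] * num_blocks,
            [b""] * num_blocks,
        ]

    def write(self, index: int, data: bytes) -> None:
        # Every write goes to each working drive, so both hold the same data.
        for drive in self.drives:
            if drive is not None:
                drive[index] = data

    def read(self, index: int) -> bytes:
        # Either surviving drive can answer a read.
        for drive in self.drives:
            if drive is not None:
                return drive[index]
        raise IOError("both drives have failed: data lost")

    def fail(self, which: int) -> None:
        self.drives[which] = None

    def replace_and_rebuild(self, which: int) -> None:
        # Fit a blank drive and copy every block across from the survivor.
        survivor = self.drives[1 - which]
        if survivor is None:
            raise IOError("no surviving drive to rebuild from")
        self.drives[which] = list(survivor)


mirror = Mirror(num_blocks=4)
mirror.write(0, b"batch-0001")
mirror.fail(0)                 # drive one dies...
print(mirror.read(0))          # ...but drive two still serves the data
mirror.replace_and_rebuild(0)  # new drive fitted, mirror re-created
mirror.fail(1)                 # losing the other drive is now survivable too
print(mirror.read(0))
```

The key point the model shows is that after the rebuild the array can again tolerate the loss of either drive.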
RAID 5
The much more complex RAID 5 requires three or more drives. It works by
splitting the data across all but one of the drives and then writing parity
information to the remaining drive. If, for example, you have three drives, it will
write the data to drives one and two and then write the parity information
to drive three. The parity information is written to a different drive each
time, so the next set of data will be split across drives two and three with
the parity information sent to drive one.
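The rotating parity placement described above can be sketched in a few lines of Python. The rotation order (parity on the last drive first, then drive one, then drive two, and so on) is an assumption chosen to match the three-drive example; real controllers use several different layouts, and `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_drives: int, num_stripes: int) -> list[dict]:
    """For each stripe, list which drives (numbered from 1) hold data
    and which one holds parity. Assumed rotation: parity starts on the
    last drive and moves on by one drive per stripe."""
    layout = []
    for stripe in range(num_stripes):
        parity = (num_drives - 1 + stripe) % num_drives  # 0-based drive index
        # The data drives are the others, taken in order after the parity drive.
        data = [(parity + offset) % num_drives for offset in range(1, num_drives)]
        layout.append({"data": [d + 1 for d in data], "parity": parity + 1})
    return layout


for stripe in raid5_layout(num_drives=3, num_stripes=3):
    print(stripe)
```

For three drives, the first stripe puts data on drives 1 and 2 with parity on drive 3, and the second puts data on drives 2 and 3 with parity on drive 1, as in the description above.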
The parity information is essentially the key to calculating a missing piece of
data. For example, with 1 + 3 = 4 it is possible to calculate any one of the
numbers if only one is missing. If you consider each number (or chunk of
data) to be stored on one of the three drives in a RAID 5 array, the server
can rebuild the data if any one of the three drives fails.
| Number of Disks | Data | Missing Data |
| Three disk RAID 5 | | 3 |
| Six disk RAID 5 | | 7 |
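The 1 + 3 = 4 example uses addition to keep things simple. In practice, RAID 5 parity is normally calculated with the bitwise exclusive OR (XOR) operation, which has the same useful property: if any one block in a stripe is missing, XOR-ing the others recovers it. A minimal Python sketch of this, using hypothetical sample data:

```python
from functools import reduce
from operator import xor


def parity_of(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# A three-drive stripe: two data chunks plus one parity chunk.
chunk_on_drive_1 = b"FORM"
chunk_on_drive_2 = b"DATA"
parity_on_drive_3 = parity_of(chunk_on_drive_1, chunk_on_drive_2)

# Drive 2 fails. Because x ^ x == 0, XOR-ing everything that survives
# cancels the chunks we still have and leaves only the missing one.
rebuilt = parity_of(chunk_on_drive_1, parity_on_drive_3)
assert rebuilt == chunk_on_drive_2
```

The same function rebuilds whichever chunk is lost, including the parity chunk itself.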
One main advantage of RAID 5 is that you lose a much smaller amount of
space to fault tolerance. RAID 1 loses 50%, whereas RAID 5 loses one
drive's worth of capacity: 33% with three drives, and less as more drives
are added. Also, you can create
RAID 5 arrays of much larger capacity than would be possible with one drive
by simply adding more and more drives to the array.
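The space lost to fault tolerance can be worked out directly: RAID 1 always gives up half, while RAID 5 gives up one drive's worth of capacity, so the fraction lost is 1/n for n drives. A small Python sketch, assuming all drives are the same size (the function name is our own):

```python
def fraction_lost(level: str, num_drives: int) -> float:
    """Fraction of raw capacity given up to fault tolerance (equal-size drives)."""
    if level == "RAID 1":
        if num_drives != 2:
            raise ValueError("RAID 1 as described here uses exactly two drives")
        return 1 / 2  # one whole drive holds a copy of the other
    if level == "RAID 5":
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return 1 / num_drives  # one drive's worth of parity, spread across all
    raise ValueError(f"unsupported RAID level: {level}")


for n in (3, 4, 6, 10):
    print(f"RAID 5, {n} drives: {fraction_lost('RAID 5', n):.0%} lost")
```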
The drawback to RAID 5 is that there is an overhead for the server to
calculate the parity information every time data is written to the drives.
Due to the way the data is split, there is also a performance loss,
especially for large files, when reading the data back from the array.
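The parity overhead is easiest to see for a small write that changes only one chunk of a stripe. One common approach, sketched below in Python under the assumption of XOR parity, is a read-modify-write: the new parity is derived from the old parity, the old data and the new data. The function names and sample bytes are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity for a stripe after one data chunk is overwritten.

    XOR-ing the old data out of the old parity removes its contribution;
    XOR-ing the new data in adds the replacement. To do this the controller
    must first READ the old data and old parity, then WRITE the new data and
    new parity: four disk operations for one logical write.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


d1, d2 = b"\x0f\xf0", b"\x33\x55"
parity = xor_bytes(d1, d2)
new_d1 = b"\xaa\xbb"
new_parity = updated_parity(d1, parity, new_d1)
assert new_parity == xor_bytes(new_d1, d2)  # matches recomputing from scratch
```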
RAID 1 vs RAID 5
A few years ago the largest drives available were only 36GB in size. RAID 5
was ideal for creating large volumes, as you could use four 36GB drives to
create a 108GB RAID 5 array. Now that 500GB drives are commonplace, RAID 5 is
much less useful. Two 500GB drives in a RAID 1 array would be cheaper,
simpler to manage and faster than three 250GB drives in a RAID 5 array,
while providing the same 500GB of usable space.
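The capacity figures above can be checked with a short Python sketch, assuming equal-size drives (RAID 1 keeps half the raw capacity; RAID 5 loses one drive's worth). `usable_gb` is a hypothetical helper, not part of any tool.

```python
def usable_gb(level: str, drive_gb: int, num_drives: int) -> int:
    """Usable capacity of an array of equal-size drives."""
    if level == "RAID 1":
        return drive_gb * num_drives // 2   # half the drives hold copies
    if level == "RAID 5":
        return drive_gb * (num_drives - 1)  # one drive's worth holds parity
    raise ValueError(f"unsupported RAID level: {level}")


print(usable_gb("RAID 5", 36, 4))   # 108: the older four-drive array
print(usable_gb("RAID 1", 500, 2))  # 500: two 500GB drives mirrored
print(usable_gb("RAID 5", 250, 3))  # 500: three 250GB drives in RAID 5
```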
Pushing the Performance Boundaries with RAID 0
In order to increase performance, it is possible to use RAID 0, which
splits the data across two or more disks. With two disks, half the data is
written to disk one whilst, simultaneously, the other half is being
written to disk two. This is also commonly referred to as striping. The
theoretical speed of a two-disk array is double that of the component drives,
and although the real increase in speed might not reach the theoretical
maximum, it is still a significant performance boost.
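Striping can be illustrated with a short Python sketch that deals fixed-size chunks out to the disks in turn and reads them back in the same order. The 4-byte chunk size is purely illustrative (real arrays use much larger stripe units), the helper names are hypothetical, and the sketch runs sequentially, whereas a real controller writes to both disks at the same time.

```python
def stripe(data: bytes, num_disks: int = 2, chunk: int = 4) -> list[bytearray]:
    """Split data into fixed-size chunks and deal them out round-robin."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks] += data[i:i + chunk]
    return disks


def unstripe(disks: list[bytearray], chunk: int = 4) -> bytes:
    """Reassemble by reading one chunk from each disk in turn."""
    out = bytearray()
    offsets = [0] * len(disks)
    disk = 0
    # Chunks were dealt in order, so the first disk to run out marks the end.
    while offsets[disk] < len(disks[disk]):
        out += disks[disk][offsets[disk]:offsets[disk] + chunk]
        offsets[disk] += chunk
        disk = (disk + 1) % len(disks)
    return bytes(out)


disks = stripe(b"ABCDEFGHIJ")
print(disks)            # each disk holds roughly half the data
print(unstripe(disks))  # the original bytes, in order
```

Note that neither disk on its own holds a usable copy of the data, which is why losing either one loses everything.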
To measure the actual gain, we used 10,000RPM Western Digital Raptor SATA drives and 8GB of data consisting of varying file sizes to simulate real-world usage:
| RAID 1 (MB/s) | RAID 0 (MB/s) | Speed Increase |
| 40.8 | 64.3 | 58% |
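The speed increase in the table is simply the RAID 0 throughput relative to the RAID 1 throughput, which a quick Python calculation confirms:

```python
raid1_mb_per_s = 40.8  # RAID 1 throughput from the table above
raid0_mb_per_s = 64.3  # RAID 0 throughput from the table above

speed_increase = (raid0_mb_per_s - raid1_mb_per_s) / raid1_mb_per_s
print(f"RAID 0 was {speed_increase:.0%} faster")  # RAID 0 was 58% faster
```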
Whilst RAID 0 doesn’t lose any space to redundancy, unlike RAID 1 and 5, its huge
drawback is that it doesn’t provide any fault tolerance and it roughly doubles the
chance of a failure occurring! It is putting your eggs into two baskets, but
if either is dropped you lose the whole lot! Whilst this might be
acceptable for a high performance workstation, it is less than acceptable in
a server.
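The "doubles the chance" claim follows from basic probability. Assuming each drive independently fails with probability p over some period, a two-drive RAID 0 is lost if either drive fails, 1 - (1 - p)^2, which is just under 2p for small p; a RAID 1 mirror is only lost if both fail, p^2, ignoring replacement of the first failed drive. The value of p below is hypothetical, chosen only to illustrate the comparison.

```python
def p_array_fails(level: str, p: float) -> float:
    """Probability of losing data, assuming each drive independently
    fails with probability p over the same period (and, pessimistically,
    that a failed mirror drive is never replaced)."""
    if level == "single":
        return p
    if level == "RAID 0":
        return 1 - (1 - p) ** 2  # either drive failing loses everything
    if level == "RAID 1":
        return p ** 2            # both drives must fail
    raise ValueError(f"unsupported level: {level}")


p = 0.05  # hypothetical per-drive failure probability, illustration only
for level in ("single", "RAID 0", "RAID 1"):
    print(f"{level}: {p_array_fails(level, p):.4f}")
```

With p = 0.05 this gives 0.0975 for RAID 0, just under double the single-drive figure, against 0.0025 for RAID 1.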
RAID 10 to the Rescue
By combining RAID 1 and RAID 0 you can create a volume that is fault tolerant
AND faster than its component drives; this is called RAID 10 (or RAID 1+0).
To do this you need four or more drives (always an even number), and you lose
50% of your storage capacity, as with RAID 1. You start by creating two (or more)
RAID 1 mirrored volumes and then use RAID 0 to stripe them. This provides the
best combination of fault tolerance and speed of the RAID levels covered here.
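A short Python sketch of which drive failures a four-drive RAID 10 can survive. It assumes drives 0 and 1 form one mirror and drives 2 and 3 the other; the numbering and function names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed: set[int], num_drives: int = 4) -> bool:
    """Drives are assumed to be paired into mirrors (0, 1), (2, 3), ...
    The striped volume survives while every mirror keeps one drive."""
    if num_drives < 4 or num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    for first in range(0, num_drives, 2):
        if {first, first + 1} <= failed:
            return False  # both halves of one mirror are gone
    return True


def raid10_usable_gb(drive_gb: int, num_drives: int) -> int:
    return drive_gb * num_drives // 2  # half the raw capacity, as with RAID 1


two_drive_failures = list(combinations(range(4), 2))
survivable = [pair for pair in two_drive_failures if raid10_survives(set(pair))]
print(survivable)
```

Any single drive can fail, and four of the six possible two-drive failures are also survivable, as long as they hit different mirrors.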
Glossary
ATA - Advanced Technology Attachment (AT Attachment). The most common type of hard
drive interface, commonly used today in workstations and entry-level
servers.
IDE - Integrated Drive Electronics. Another name for the ATA
interface.
SCSI - Small Computer System Interface. A very high
performance interface usually used in high-end servers.
RAID - Redundant Array of Inexpensive (or Independent) Disks. This provides a
way of grouping two or more disks in varying configurations to improve
performance, to provide tolerance should one of the drives fail, or both.
Hot Swap - The ability to remove and replace a component while
the server is still switched on.
SATA - Serial ATA. A new, higher performance, hot-swappable drive
interface now common on workstations and low-end servers.


