Down Time and Data Loss
The truth about hard drives and managing the risk.
Monitoring and More About RAID
What are the disadvantages of RAID technology?
1. Cost. The cost is considerable. A quality server, properly specified, can cost 3 to 5 times as much as a desktop machine used "as" a server. As you would expect, our opinion is that if you value your data and need your systems to be up all the time, this cost is amply justified.
2. Performance. Because the data has to be distributed across the drives, RAID requires additional processing power. If your RAID system is not properly specified and configured, this can lead to an enormous performance hit: write speeds can be a fraction of the speed of writing to a single drive, which in turn means poor performance for your users. Our servers do not suffer this problem.
RAID on the cheap
Some suppliers have a grasp of the risk associated with hard drive failure, and offer some form of RAID to reassure the customer. Sadly, not all RAID systems are created equal, as we shall see.
A note about RAID1 (aka mirroring) and onboard RAID
Some cheap motherboards in generic PCs can be configured to mirror 2 of the drives. This is not RAID5 with an additional hot spare, but simply a live copy of the data from one drive to the other. The idea, of course, is that in the event of hard drive failure, the system will still boot up on the functional drive, and the faulty drive can be replaced. Sounds like a good idea? Well, these mirrors are better than no RAID at all, but we have worked with and tested these systems extensively, and have found 3 major problems with them:
1. Corruption. If the hard drive fails suddenly, the mirror works as expected, the faulty drive is marked as failed, and the system can continue to operate. However, hard drives do not always fail suddenly - sometimes they fail gradually, causing file system corruption or gradual data loss. A basic mirror setup on cheap systems can simply copy this corruption to the other drive, resulting in 2 degraded hard drives. RAID5 does not suffer this problem in the same way, as it protects the data with parity rather than a straight copy - a topic beyond the scope of this document.
2. Downtime. Because the drives cannot be swapped live, the system has to be shut down to do the repair, which is not the case with hot-swappable RAID5 systems. In addition, it is not always straightforward to identify which of the drives has failed. As these systems typically have the drives in caddies inside the machine, and there is no external signal that a drive has failed, it takes some time to identify and swap out the correct drive.
3. Performance. These systems do not have dedicated RAID controllers with onboard cache memory, so they rely on the host system's resources to do the extra work. We find that read and write performance can be very poor indeed, creating a bottleneck on the network and dreadful performance for the user. Many IT suppliers do not bother testing their systems' suitability for purpose, but create a system that offers false reassurance to the customer, while increasing their profit margins.
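Although parity is outside the scope of this document, a rough illustration may help show how it differs from a straight copy. The sketch below is a simplification, not how any particular controller works: it treats three short byte strings as the data blocks on three drives, computes one XOR parity block, rebuilds a lost block from the survivors, and shows that a silently altered block no longer agrees with the stored parity. The function names and sample data are invented for the example, and real RAID5 also stripes data and rotates parity across the drives, which is not modelled here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


def parity_is_consistent(blocks, parity):
    """Return True when the stored parity still matches the data blocks."""
    return xor_blocks(blocks) == parity


# Hypothetical data blocks, one per drive (all the same length).
data = [b"ACCT", b"2024", b"DATA"]
parity = xor_blocks(data)

# Drive 2 fails outright: its block is rebuilt from the other two plus parity.
rebuilt = rebuild_missing([data[0], data[2]], parity)

# Drive 2 degrades silently: the block changes but the stored parity does not,
# so a consistency check can flag the mismatch.
corrupted = [data[0], b"2O24", data[2]]
mismatch_detected = not parity_is_consistent(corrupted, parity)
```

Note that a parity mismatch on its own only shows that something is inconsistent; it does not say which block is wrong, which is one reason regular monitoring still matters.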
The importance of monitoring and maintenance
It is vital with any system, and especially one with RAID, that the server is monitored, locally or remotely. If a hard drive in a RAID array does fail, it needs to be replaced promptly. Until it is replaced, the array is running with reduced or no redundancy: you will suffer performance degradation, and if more hard drives fail in the meantime you may lose data. Net Therapy monitors all our customer networks remotely across broadband. We know within minutes if any of our customers' servers have a hard drive failure, or indeed if any other hardware on the system fails.
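As a minimal sketch of what an automated drive check might look like, the example below scans status text in the format that Linux software RAID (md) reports in /proc/mdstat and lists any array running with a missing member. This is an assumption made for illustration only: the document does not say which tools Net Therapy uses, hardware RAID controllers report status through their own vendor utilities, and the function name and sample text are invented.

```python
import re

# An array header line such as "md0 : active raid1 sdb1[1] sda1[0]".
# Only plain "mdN" names are handled in this sketch.
ARRAY_LINE = re.compile(r"^(md\d+)\s*:")
# The member-status field on the following line, e.g. "[2/1] [U_]":
# expected members / active members, then one U (up) or _ (missing) per slot.
STATUS_FIELD = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of arrays that have fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_FIELD.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample: md0 is healthy, md1 has lost one of its two drives.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      1048576 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0]
      2097152 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

failed = degraded_arrays(SAMPLE)
```

In practice a check like this would read the live status file on a schedule and raise an alert whenever the returned list is not empty, so that a failed drive is replaced before a second one fails.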
