RAID is dying?

There are a bunch of posts on Planet MySQL this week about RAID. This comment from Kevin Burton really made me go “huh?”.

You’re thinking too low level. Who cares if the disk fails. The entire shard is setup for high availability. Each server is redundant with 1-2 other boxes (depends on the number of replicas). If you have automated master promotion you’ll never notice any downtime. All the disks can fail in the server and a slave will be promoted to a new master.

Monitoring then catches that you have a failed server and you have operations repair it and put it back into production as a new slave.

Someone has to think low level. The key phrase in there is “you have operations repair it and put it back into production as a new slave.” This tells me all I need to know. Kevin later states that his company does not in fact operate its own equipment, but uses a provider for all its hosting.

At this point, I think this is a philosophy argument and not a real-world application argument. Sure, I guess if I am Google or Yahoo I can do this. But for the vast majority of web sites running out there, having 4 data centers and “operations” at your beck and call is not a reality. For real people, having a server go down is a pain in the ass. Why should I want to spend a full day of labor rebuilding a server because a $200 part broke or just got corrupted? With RAID, it takes 10 minutes to start a rebuild and maybe another 10 minutes to install a new drive if the rebuild fails.
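To put rough numbers on that, here is a minimal sketch of the comparison. The two 10-minute figures are the ones above; treating “a full day of labor” as 8 hours is my assumption, and the function names are just for illustration.

```python
# Rough hands-on repair time: RAID rebuild vs. rebuilding the whole server.
RAID_START_REBUILD_MIN = 10  # from the post: time to start a rebuild
RAID_SWAP_DRIVE_MIN = 10     # from the post: install a new drive if the rebuild fails
WORKDAY_HOURS = 8            # assumption: "a full day of labor" taken as 8 hours


def raid_repair_minutes(rebuild_fails: bool) -> int:
    """Hands-on minutes to recover a RAID array after a disk problem."""
    minutes = RAID_START_REBUILD_MIN
    if rebuild_fails:
        minutes += RAID_SWAP_DRIVE_MIN
    return minutes


def full_rebuild_minutes() -> int:
    """Hands-on minutes to rebuild a server from scratch (assumed one workday)."""
    return WORKDAY_HOURS * 60


worst_case = raid_repair_minutes(rebuild_fails=True)
ratio = full_rebuild_minutes() / worst_case
print(f"RAID worst case: {worst_case} min, full rebuild: {full_rebuild_minutes()} min")
print(f"A full rebuild costs about {ratio:.0f}x the labor")
```

Change `WORKDAY_HOURS` to match your own shop. The point is the order of magnitude, not the exact figure.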

His other argument is about performance. Sure, it’s debatable whether RAID is faster or slower. It probably depends on the application. If your RAID is a bottleneck for your application, then you need to address that. For us, it’s far from the bottleneck, so why bother with the downtime of having one of our servers (we have 30, not 1000) down?

BTW, would you rather admin 30 servers or 1000?  I think 30.

I should add that we only use RAID on servers that are used for data storage. Losing data sucks. For web servers we don’t use RAID. They do fit the model that Kevin describes. We have a lot of them. If one goes down, it’s OK. Maybe Kevin’s application can fit all its data on one web node. Don’t know. I just know it’s right for us and I don’t see a future where I won’t want it on our servers. We are even using RAID in our MySQL Cluster servers. Why? Because I don’t want to have to wait a day to get a storage node back up and running for a $200 part.


2 Responses to RAID is dying?

  1. […] Brian Moon responded to the exchange also puzzled by the assertion that “RAID is dying” in RAID is dying? […]

  2. john allspaw says:

    Thanks for posting this, Brian. Scaling web ops doesn’t just mean machines, it also means scaling people and time.
