Out with cluster, hello replication

Well, I have written a good bit about MySQL Cluster this year.  We had been using it as a sort of pregenerated cache for our forward facing web servers.  However, we have decided on a different route.

Why the change

With normal MySQL, configuration can make big performance differences.  With cluster, it can make the cluster shut down. We woke up one morning to errors about REDO logs being overloaded.  It had been overloaded for about 8 hours.  We had made some changes the day before, but they all worked fine on our test cluster.  So, we shut down the processes that were new and even shut off all other processes that were loading data into the servers.  4 hours later, the simplest insert would still complain about the REDO logs.  The only thing that cleared it up was a full rolling restart of the storage nodes.  That took 5 hours.  Luckily, we were still operating with a single server as a backup that had the same data stored in InnoDB tables.  We switched to it before we started the rolling restart. So, after all that was done, we really became worried about the ups and downs of cluster.  We started thinking that maybe cluster was not ready for us and the use we had in mind.

Enter Replication

We had tried replication years ago with not a lot of success.  We found we had to baby sit it weekly as queries would fail on the slave that did not fail on the master or indexes would get corrupted on the slave.  Just annoying things that would take up time.  But, that was MySQL 4.0 and we were trying to replicate our main database.  Faced with massive down time with cluster, an annoyance every few weeks seems like an acceptable risk.

The new setup

So, with the new  structure, we have two masters and two slaves.  We had already written code that would synchronously write data to two (or more) MySQL servers using transactions to ensure the data was written to all the hosts.  Its what was used to write the data to the cluster and InnoDB servers already.  So, now, it  just writes data to the two masters using InnoDB and then the two slaves read from them.  So far, there have been no outages whatsoever.

Scaling: Cluster vs. Replication

One of the big things we liked about MySQL Cluster was that it scaled very well.  As connecitons increased, query speed and response time did not degrade at the same rate as connections.  InnoDB on the other hand linearly degrades.  Basically, an InnoDB server with 128 concurent connections returns data twice as slow as it does with 64 connections.  Cluster on the other hand would only see about a 10% reduction in performance.  But, with InnoDB and replication, we can more quickly add new nodes and scale out in an emergency.  Once the hardware is in place, we could have new nodes online in about 30 minutes.  If we hit a wall with cluster, it would take us a day or two to bring the server online after the hardware was in place.  Adding storage nodes would mean a full dump and reload of the database.  So, the solution we have now will not scale as well as cluster with the same equipment, but it will be easier to scale out once we need to.


6 Responses to Out with cluster, hello replication

  1. Paul M says:

    Sounds like you are already on the path to database sharding with multiple independent master-slave groups.
    If you can put up with the extra work, sticking something like MySQL proxy in front of those groups might relieve some of the application pain in coding which SQL goes where.

    If you are at all worried about availability, adding another additional slave of each slave, even to one other box located remotely might make the difference in a site power outage event.

    Have Fun


  2. Xaprb says:

    I’ve written several tools to make replication easier to manage (especially verifying integrity and repairing out-of-sync slaves): http://mysqltoolkit.sourceforge.net/

    I also wrote some stuff about making replication reliable at http://www.xaprb.com/blog/2007/01/20/how-to-make-mysql-replication-reliable/

    Most of it is still true, but some of it I’ve learned more about ;-)

  3. doughboy says:

    We have a very nice F5 BIG-IP load balancer for dealing with select queries. We did some testing with MySQL Proxy and it seemed to die at about 1024 concurrent connections. I was not planning on using it in production so I did not dig any further. We are not currently doing any sharding. its just two masters with the same data.

  4. Erik says:

    I have been looking at various setups as well, but it seems clustering isnt that mature yet . I am looking now at database sharding with master and read slaves where the masters are also held available with heartbeat+DRBD setups.

  5. doughboy says:

    Luckily, our F5 BIG-IP takes care of health monitoring for us. They are not cheap, but the solve a lot of problems for us in that area.

  6. […] other way that we use MySQL is for serving content to our readers.  I have written about this before.  For this purpose, we avoid joins, don’t use any advanced features.  We do use […]

%d bloggers like this: