HTML Purifier and Phorum
There have been several posts about HTML Purifier 2.0 lately. I did not look to closely at it until I saw this post on our Phorum support forum. Seems the creator of HTML Purifier has chosen Phorum for his site. I hope that means it met his standards for HTML and security. He has posted some questions about the Phorum core. We always welcome a fresh mind.
He is writing a module for Phorum to allow straight HTML in Phorum posts. We have an HTML module already, but its quite basic compared to what you can do with his library. Several people have wanted to use the WYSIWYG text editors that are out there. This should/could open that up to people. I don’t see the Phorum core ever having one, but that is what modules are for.
Caching and patience
I am subscribed to the memcached list. Memcached use (and interest in caching in general) has exploded it seems in the last couple of years. There are a lot of people that join the list that not only don’t have a lot of experience with memcached, but they don’t have experience with caching. We have been caching at dealnews since the beginning. We started using memcached at dealnews in early 2005.
One common concern that new users have is data not being updated the second it goes in the database. This is how it worked before they were caching. They are used to it working that way. So, when they start caching, they miss that instant gratification. We went through this at dealnews. Our content team could write a deal, go to the front page and see it right then. They could then move on with their lives. As we grew and it became apparent that we could do that anymore, we had to make changes. Does the front page really have to be updated the second we write a deal? We discovered the answer was no.
We use three primary techniques to keep our cache as up to date as it needs to be.
Sane TTLs
With tools like memcached, you can provide a TTL (time to live) for each cached item. This ensures that a particular piece of cache will not longer be used after a given time. We cache data for our front page for 2 minutes. It does not sound like a lot I know. But, we have a 84% hit rate on that cache. So, the data is never really that old, but the cache does a wonderful job. For other content that hardly changes, we use a ttl of a day or even an hour. You have to decide per object for your application if this is the right thing to do for you. TTLs are best, IMO, because you know what to expect. A surge in traffic can not force the cache to expire more quickly.
Forcibly updated
If you can’t use a TTL, removing, or better yet updating the cache via code is another option. If we have objects that need to be updated, we will usually update the cache rather than simply expiring it. For us, we usually have a function that will return an object from cache, but if its not there, it will make the queries and create the cache. The function will generally have a force option that will recreate the cache for the item even if the cache is found. We gave a talk at Apachecon and wrote a paper that covered this topic in 2001 (see Caching in the Real World on that page). The basics in that paper still hold true for caching today. WARNING! There are a couple actually. If your data is updated constantly and you are doing this on every single insert/update to your database, you are wasting your time. You have to use your cache wisely. Ask yourself, “Does this data have to be real time?” The second warning is that when you come under high load, one expiring item on your page can cause thousands of queries to be run. We experienced a little of this when Yahoo linked us.
Pushed Cache
We have been using this method for a while in our ad serving software. We are now using it more and more. IMO, its the most sure fire way to handle increased load. Basically, you don’t have the pages of your web site make SQL requests to the live SQL data in the event no cache is found. That is what I call a pulled cache. Instead, you push the data from your primary database into some caching (or even another, optimized SQL server) for your web site to use. We are actually using MySQL Cluster for this purpose on our web site. The forward facing web site hits only the MySQL Cluster. If the data is not there, its just not there. We have processes on our backend that gather data from our primary database, assemble it for presentation and populate the cluster. The queries that the web site uses to access the cluster are highly optimized. You could do the same with memcached, but memcached is volatile. With cluster, we have high availability and get about the same performance as we did with a fully cached paged.
Exceptions
Of course, there are exceptions. Forums are a good example. For them to be useful, its kind of hard to cache a lot of stuff. With Phorum, we do cache things like user profiles, forum settings and other slow changing items. But, caching messages for any amount of time usually has a low ROI. They update so fast and if users don’t see updates they lose interest.
MySQL Cluster SQL Tips
So, I mentioned in my MySQL Cluster post that I found out that cluster and joins don’t get a long too well. There are a couple of tips I have for using joins or replacing them with other techniques that I thought I would share.
Here are some things I keep in mind when using the NDB engine. These may apply to the other engines as well.
- Test using the NDB engine. Testing against another engine is not adequate.
- Avoid JOIN if you can, test often if you can’t
- Avoid subqueries always
- Select using primary keys when at all possible. Always select using a key.
- Be sure to include both your where and order by columns in your keys.
Try your join from several directions.
Sometimes, the way you join can affect the engine. This is more apparent with NDB than MyISAM or InnoDB. Have a look at these two queries:
select pub_articles.article_id
from pub_articles
inner join article_edition_lookup on
pub_articles.article_id=article_edition_lookup.article_id and
pub_articles.publication_id=article_edition_lookup.publication_id and
publication_date=’2007-06-07′
where pub_articles.publication_id=2;
select article_id
from article_edition_lookup
inner join pub_articles using (article_id, publication_id)
where publication_id=2 and publication_date=’2007-06-07′;
They look similar. Same tables. Same basic criteria. In fact the explain output looks nearly identical. But, the first takes 20 seconds on cluster and the second takes .02 seconds. Now, some MySQL internals person may be able to look at this and know why, but to me, I just have to test them to see. (If anyone cares, the primary key on pub_articles is publication_id, article_id and the PK on article_edition_lookup is publication_id, publication_date, article_id).
Try using temporary tables.
If you can’t get the join to be fast no matter what, try temporary tables. I have had some success using the HEAP engine to store data and then joining the two tables together. I don’t have an example of that from any code I have currently written. I have mostly just used it at the mysql prompt to get data. Here is an example that uses the same tables from above and retrieves the same data using a temporary table.
create temporary table if not exists foo engine=heap
select article_id from article_edition_lookup where publication_id=2 and publication_date=’2007-06-08′;
select article_id
from pub_articles use key (primary)
inner join foo using (article_id)
where publication_id=2 order by first_publish_time desc;
Together that all happens in about .06 seconds. You need to realize that this does create a table, in memory, on the API node. So, this data will only be good for this one connection. If your temporary table has lots of rows, you may need to add a key to it when you create it. I am not sure I would use this method for a query that is going to be run a lot. Chances are there is a better solution if you have to resort to this technique.
Subqueries
I have not found a way to make subqueries fast. Even simple queries like the following are slow.
select article_id from pub_articles where publication_id=2 and article_id in (select article_id from article_edition_lookup where publication_id=2 and publication_date='2007-06-07' );
That is a primary key lookup on both of those tables. Yet the query takes 30 seconds to run. Just stay away from them altogether. For what it’s worth, the same query takes 2 seconds on the InnoDB engine. I don’t think subqueries are optimized in 5.0. This may be a known issue. I got used to not having them for so long that I never think to use them now.
Unions
I have not used UNION much with cluster. When I have it has worked well. I would use caution when using it. Be sure to test the queries before putting them in a production environment.
RAID is dying?
There is a bunch of posts on Planet MySQL this week about RAID. This comment from Kevin Burton really kind of made me go “huh?”.
You’re thinking too low level. Who cares if the disk fails. The entire shard is setup for high availability. Each server is redundant with 1-2 other boxes (depends on the number of replicas). If you have automated master promotion you’ll never notice any downtime. All the disks can fail in the server and a slave will be promoted to a new master.
Monitoring then catches that you have a failed server and you have operations repair it and put it back into production as a new slave.
Someone has to think low level. The key phrase in there is you have operations repair it and put it back into production as a new slave. This tells me all I need to know. Kevin later states that his company does in fact not operate their own equipment, but uses a provider for all their hosting.
At this point, I think this is a philosophy argument and not a real world application argument at this point. Sure, I guess if I am Google or Yahoo I can do this. But, for the mass majority of web sites running out there, having 4 data centers and “operations” at your beck and call is not a reality. For real people, having a server go down is pain in the ass. Why should I want to spend a full day of labor rebuilding a server because a $200 part broke or just got corrupted. It takes 10 minutes to start a rebuild and maybe another 10 minutes to install a new drive if the rebuild fails.
His other argument is about performance. Sure, its debatable whether RAID is faster or slower. It probably depends on the application. If your RAID is a bottle neck for your application, then you need to address that. For us, its far from the bottleneck so why bother with the downtime of having one (of our 30, not 1000) servers down.
BTW, would you rather admin 30 servers or 1000? I think 30.
I should add that we only use RAID on servers that are used for data storage. Losing data sucks. For web servers we don’t use RAID. They do fit the model that Kevin describes. We have a lot of them. If one goes down, its ok. Maybe Kevin’s application can fit all its data on one web node. Don’t know. I just know its right for us and I don’t see a future where I won’t want it on our servers. We are even using RAID in our MySQL Cluster servers. Why? Because I don’t want to have to wait a day to get a storage node back up and running for a $200 part.
Five months with MySQL Cluster
So, the whole world changed at dealnews when Yahoo! linked us. We realized that our current infrastructure was not scaling very well. We had to make a change.
The Problem
Even though we were using all sorts of cool techniques, the server architecture was really still just a bunch of web servers all serving the same content. In addition to that, our existing systems as the time used a pull method. When a request came in, memcache was checked, if the data was not there, it was fetched from our main MySQL server. So, when there is no data in the cache or when it expires, this was very bad. Like when Yahoo! hit us. Some cache item would expire and 60,000 users would hit a page and each page would try and create the cache item.
The Solution
I was tasked with two things. Find a way to handle something like the Yahoo burst and finding a way to store the data we need to generate our web pages that was highly available and would scale. For bursting, I wrote a proxy using apache, mod_rewrite, php and memcached. I have reasons I did it this way that are not relevent to this post. Maybe more on that later.
For the data solution, I considered several things: MySQL replication, writing my own replicating memcached client, and other exotic ideas. One of the semi-exotic ideas for us was MySQL Cluster. We had not used it at all. Some things about it made us gun shy. But, we tested it and were very happy with the results.
Initial Test
With the help of Gentoo, getting a cluster up and running was really, really easy. In fact, it seemed too easy. We ran a cluster on some dev boxes at first. We did some generic testing using the PHPTestSuite from the guys at MySQL Performance Blog. What we found was that while the cluster appeared slower at low concurrent connections, it scaled much better than InnoDB (our prefered storage engine) when the concurrent connections grew.
Application Testing
So, we moved to the next step, testing our application. We discovered early on with cluster that we would have to redesign our application. Our DB was highly relational. Almost no data could be put on the site without data from other tables. We used a lot of joins. We learned (later) that joins in the cluster are not a good idea. Neither are sub-selects. So, we wrote some proof of concept scripts for our application. We were very happy. Very few issues were found. Nothing anywhere near show stopping.
Installation
We ordered our servers. Six new Dell dual-core, dual processor Opterons with a lot of memory. Two would become SQL nodes and the other four would be storage nodes. Our data set is not that large compared to a lot of companies. So, we configured the cluster with 4 replicas. Our main goal is high availability and scalability. I could find nothing in my tests or in the manual that indicated this would be bad for scalability and it should be great for HA.
We rewrote our application (basically, our public web site) to use the new cluster and its new table design. We hit our first snag when we tried to seed the data in the cluster. We got errors from the cluster about its transaction logs not being big enough to handle the inserts. Through the manual, forum posts, the mailing list archives and some blogs I was able to find the correct settings for our needs. I remembered back when I first installed the cluster thinking it was too easy. I now realize that getting a cluster running is easy. Making it run well, is a whole other story.
The second snag was with joins. Our test bed for the cluster was not a cluster. We used a group of servers using InnoDB to test against. That was a mistake. Joins did not work at all with the cluster. We had to back up, rewrite some code and redo some tables. In the end, the design is probably faster on InnoDB or cluster.
Everyday Use
We started using the cluster for every day use about a month ago. I guess 5 months is not bad for starting from nothing to live in production. We have been slowly moving applications to it. We take care each time to monitor the cluster and see that its not throwing new errors. So far, so good. We have about 80% of our page views (40% of our page views are our front page) and about 50% of our end user applications using the cluster now. We are doing caching at the proxy level for a lot of this. But, when tested, the new architecture is much more reliable even without the caching proxies. Some things like our forums will never translate to the cluster. But, they have their own dedicated systems already and are non-critical for our business. They could be shut down if there was a problem with them.
Administration
MySQL Cluster is a whole new animal. Its not like monitoring mysqld, apache or other stuff we already use. It took me a while to get the hang of rolling restarts, brining nodes up and down after crashes, etc. We have had just one crashed node since we switched over to production use. The cluster stayed up and kept serving content. We have written a Nagios monitor to keep track of the nodes’ status. It uses ndb_mgm and reports any problems to us.
Feedback
Now, as the title says, I have only been using MySQL Cluster for 5 months. If you are reading this and have more experience and are thinking “What a moron!”, please tell me. We are still learning.
Update:
Ronald Bradford had some questions on his blog for me. I figured I would just answer them here.
You didn’t mention any specific sizes for data, I’d be interested to know, particularly growth and how you will manage that?
We currently have a DataMemory of 4GB and IndexMemory of 2GB. Based on the crude methods we have to monitor it, I think we are at about 40% capacity. We are using MySQL Cluster purely as a data store for content on our web site. So, we can trim the data store down significantly. If it does not appear on the site, its not in cluster.
You also didn’t mention anything about Disk? MySQL Cluster may be an in-memory database but it does a lot of disk work, and having appropriate disk is important. People overlook that.
Yes, we have U320 15k SCSI drives. We do use RAID 1 on our servers contrary to some opinions. We see a lot of drive failures. About one every 4 months. Sucks to lose a whole machine just because a $200 drive failed.
You didn’t mention anything about timings? Like how does backups for example compare now to previously.
Well, we don’t currently back up the cluster data as it is being copied from our main database already. Maybe that is a mistake, I don’t know. But, I can’t come up with a reason to backup data that is just a copy of another database server. Also, I have written a PHP class that does parallel writing to multiple servers using transactions. Everything we write to the cluster also gets written to an “oh shit” mysql server that users InnoDB. So, in the event we have a total cluster failure, F5 BIG-IP load balancers will send mysql traffic to the InnoDB server.
You didn’t mention version? 5.1 (not GA) is significant improvement in memory utilization due to true varchar support, saving a lot of memory, but as I said not yet production software.
Yeah, I am drooling over 5.1. But, we are using current Gentoo stable, 5.0.38 I believe. 5.1 looks superior in many many ways. I can’t wait to upgrade.