Database nightmare

August 5, 2008

It's 12:30 AM (00:30 for you Euros).  I am watching The Daily Show on TiVo.  All is well.  Then the phone beeps.

MySQL Main is critical

SSH?  no.

Digi console?  no.

About a week ago, we had a mysqldump file that was corrupt.  We cleaned it up.  My worst fears came to mind.  We tried power cycling the box.  It did not come back.

While my coworker was dealing with the facility people, I worked on the backup server.  I had to ensure the last full backup was in place and apply the incremental data.  Suddenly, my SSH connection dies.  OMG.  THAT DUMB A** GUY POWER CYCLED THE WRONG BOX!!!  The file system is corrupted.  Damn you ReiserFS!

By now, it’s 4AM.  Tech took an hour to get to the rack.  It is 20 feet from his cubicle.  I get in the car.  I am two hours away from the facility.  No sleep.  It is still dark.  I play loud music.  I talk to myself.  I curse the guy that power cycled the wrong box. The sun comes up and it is easier to drive.

I sit here, waiting on the OS to finish installing so I can restore the backup and incremental data again.  Hours of content lost.  The content team is hand-writing HTML that other developers are rsyncing around to the servers.

The good news?  All the work done in 2007 to separate our front end and backend worked.  The front end works fine (99%).  Just no new content.  Well, except for the hand done HTML.

Note to self: Get that main database replication working again.  ASAP.


Where Drizzle fits in for me

July 24, 2008

So, most of you have heard about Drizzle by now.  For those that have not, you can check out many, many blog posts or the Launchpad page.

The thread on Slashdot about Drizzle was quite negative.  Most misunderstand what Drizzle is about.  SQLite is not a good solution when you have 100 web servers.  Let me describe how I would use it and maybe that will help some understand it.

When it comes to MySQL use, dealnews has two very different use cases.  The first is an enterprise storage system that involves content creation, reporting and data warehousing.  For that layer of our business, we are using more and more advanced features as they become available.  We use triggers and stored procedures.  We use complex data types for specific use cases.  All those features are a big gain.

The other way that we use MySQL is for serving content to our readers.  I have written about this before.  For this purpose, we avoid joins and don’t use any advanced features.  We do use replication, indexes and intelligent queries.  We don’t (as one Slashdot reader claimed) do all of our processing in the code.  That would be stupid.  If you do that you are ignorant.  I will stop talking about that before this becomes a rant.  I do believe in letting MySQL do my work for me.

This is where Drizzle fits in.  To serve content, I don’t need stored procedures, triggers, views or any of that other stuff.  The whole database that the front end web servers use is basically a view.  It is a denormalized, prepared version of the real data.  I store objects.  But, I have to be able to sort and filter the data in a way that SQL allows me to do.  CouchDB sounds interesting.  Maybe one day it will be there.  It is still in the optimization phase.

Now, some say that this is just MySQL 3.x all over again.  Well, you clearly have not been listening to the really smart people that are working on Drizzle.  They are doing more than just removing the 4.1 and 5.x features from MySQL.  They are removing things that don’t make sense for this use case.  They are adding things that do make sense.  They are replacing parts of the code base where there is a better library or way of doing it.  At this point, they have no feature requirements to meet.  They have no deadlines.  They are making what they think the high volume web world and/or cloud computing needs.  They are making it pluggable:  think Apache modules or PHP extensions.  So, if you need feature XYZ that was yanked out, you can add it back in (hopefully) via the internal API.  There is a lot more going on here than just removing “features”.

So, I am cheering on the folks working on Drizzle.  I have joined their community and will provide what feedback I can from userland.  I am no C++ coder.  I can read it.  I can debug it.  But, writing it or doing heavy lifting is not in my skill set.  Hopefully I can contribute.

Usability FAIL

July 22, 2008

I can’t be at OSCON this year.  But my colleague Rob is and he just posted a usability post about, of all things, the Double Tree hotel where I am sure a lot of you are staying.  Great stuff.

Spreading Open Source (sort of)

July 12, 2008

So, I know I just had a kid, but I am at a friend’s house helping with some computer issues.  This is the friend that took my wife to the hospital and sat with her until I got there, after all.  She is also the friend that took my other five kids in while we were at the hospital.  So, I owe her big time.

First a little backstory.  A few years ago, I started installing Firefox and Thunderbird on my non-technical friends’ computers.  I would label the Firefox icon “Internet” and the Thunderbird icon “Email”.  This made it simple for them.  I would also install OpenOffice on those machines that did not have the full Microsoft Office package and show them that it could do all the same things that they needed MS Office for.

Anyhow, I am helping this friend by installing XP Service Pack 3 and removing some malware that somehow got on here.  While waiting on Windows, I notice that my usual pattern of installing FF, TB and OpenOffice is already done on this machine.  What is cool is that I have never used this computer before.  This one is new to me.  So, that means this friend sought out Firefox, Thunderbird and OpenOffice all on her own and installed them for her family the same way I always have.

Now, I am not naive enough to think that my friend suddenly understands Open Source.  She is not using it because she wants to be a part of the open source movement.  But, it does make me feel good to help spread open source if even from the user perspective.  It is also a testament to those applications and how far they have come.

Hudson is born!

July 10, 2008

Well, it was not as planned, but he is here.  Our 6th child, Hudson Bennett Moon, came into the world this morning at 8:53AM.  All of our children have been born via C section.  The plan was to come in at around 10AM on the 16th to deliver Hudson the same way.  Well, yesterday my wife started having some pains that the doctors did not like.  They watched her overnight and decided this morning to go ahead and perform the C section.  It was a whirlwind.  We waited all night to see what was going to happen.  Then, at 8:25AM, the nurse came in and said that we were doing it at 8:45 and that my wife was to be wheeled in to the OR in 5 minutes.  We frantically called friends and family.  None of them were there.  We assumed (you know how that goes) that we would have an hour or something.  No such luck.  So, by 8:53 he was here and at 9:40 we were all in a room with baby and family.  Everyone is healthy.  Mom feels better than she did the last 3 weeks.

Hudson in blue

Caching and TTL behavior

July 3, 2008

So, I am working on MemProxy some.  Mainly, I am trying to implement more of the Cache-Control header’s many options.  The one that has me a bit perplexed is s-maxage, particularly when combined with max-age.

s-maxage is the maximum time in seconds an item should remain in a shared cache.  So, if s-maxage is set by the application server, my proxy should keep it for that amount of time at the most.  Up until now, I have just been looking at max-age.  But, s-maxage is the proper one for a proxy to use if it is present.  I do not send the s-maxage through because this is a reverse proxy and, IMO, that is proper behavior for an application accelerating proxy.  However, I do send forward the max-age value that is set by the application servers.  If no max-age is set, I send a default as defined in the script.  Also, if no-cache or no-store is set, I send those and a max-age of 0.
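To make those rules concrete, here is a rough sketch of them in Python (MemProxy itself is PHP, and this is not its actual code).  The function names and the default of 300 seconds are made up for illustration.  Treating no-cache and no-store as a proxy TTL of 0 is also my assumption; the rules above only say what gets sent on to the client.

```python
DEFAULT_MAX_AGE = 300  # hypothetical default; the real script defines its own


def parse_cache_control(header):
    """Split a Cache-Control header into a dict of lowercase directives."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def proxy_ttl(directives, default=DEFAULT_MAX_AGE):
    """Seconds the shared (proxy) cache should keep the item."""
    # Assumption: no-cache / no-store means the proxy does not keep it at all.
    if "no-cache" in directives or "no-store" in directives:
        return 0
    # s-maxage is checked first, so it wins over max-age when both are set.
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return default


def client_header(directives, default=DEFAULT_MAX_AGE):
    """Cache-Control value sent on to the client; s-maxage is never included."""
    blocked = [d for d in ("no-cache", "no-store") if d in directives]
    if blocked:
        return ", ".join(blocked + ["max-age=0"])
    value = directives.get("max-age", "")
    max_age = int(value) if value.isdigit() else default
    return "max-age=%d" % max_age
```

With a response header of `max-age=300, s-maxage=600`, `proxy_ttl` picks 600 for the proxy cache while `client_header` forwards only `max-age=300`.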

My problem arises when max-age is less than s-maxage.  Up until now, I have sent a max-age back to the client that represents the time left for the cached item in my proxy’s cache.  So, if the app server sent back max-age=300 and a request comes in for an item that was cached 100 seconds ago, I send max-age=200 back to the client.  But, I was only using max-age before.  Now, in cases where s-maxage is longer than max-age, the item can sit in my cache past its max-age, and I would come up with negative numbers.  That is not cool.  The easiest solution would be to always send the original max-age back to the client.  But, that seems kind of lame.

So, my question is, if you are using an application (HTTP or otherwise) accelerator, what would you expect?  If your application set a max-age of 300, would you always expect the end client to receive a max-age of 300?  Or should it count down over time?  The only experience I have is a CDN.  If you watch CDN traffic, the max-age gets smaller and smaller over time until it hits 0.  I have not tried sending an s-maxage to my CDN.  I don’t know what they would do with that.  Maybe that is a good test.

UPDATE: Writing this gave me an idea.  If the item will be in the proxy cache longer than the max-age ttl, send the full max-age ttl.  Otherwise, send the time left in the proxy cache.  Thoughts on that?
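Here is that idea as a small Python sketch, next to the old countdown for comparison.  It is not MemProxy code (that is PHP); the function and parameter names are invented for the example.  `proxy_ttl` is how long the proxy keeps the item (s-maxage when set) and `age` is how many seconds ago it was cached.

```python
def countdown_max_age(max_age, age):
    """Old behavior: count max-age down from the moment the item was cached."""
    return max_age - age  # goes negative once age exceeds max_age


def client_max_age(max_age, proxy_ttl, age):
    """Idea from the update: full max-age, capped by time left in the proxy."""
    remaining = max(proxy_ttl - age, 0)  # seconds left in the proxy cache
    # If the item outlives max-age in the proxy, send the full max-age.
    # Otherwise send whatever time is left in the proxy cache.
    return min(max_age, remaining)
```

Take max-age=300 and s-maxage=600.  At 100 seconds old there are 500 seconds left in the proxy, so the client gets the full 300.  At 400 seconds old there are 200 left, so the client gets 200, where the old countdown would have produced 300 - 400 = -100.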

(thanks for being my teddy bear blogosphere)

Velocity Conference Roundup

July 1, 2008

As I said before, I was invited to be on a panel at Velocity Conference.  I was delighted to go.  I had never been to San Francisco.  I have been to Portland and Santa Clara several times.  The panel was great.  It was the Brian and photo sharing sites show.  Seriously, it was me, John Allspaw of Flickr, Don MacAskill of SmugMug and Farhan Mashraqi of Fotolog.  Oh, there was also Shayan Zadeh of Zoosk, a social dating network, and Michael Halligan, a consultant from BitPusher.  We all had similar ideas.  I told my Yahoo story.  I told everyone that they should denormalize (or optimize, as Farhan preferred) their data to improve performance.  Others agreed.  I have written about my methods for denormalizing normalized data before.  (See pushed cache)  Fun was had by all.

I mentioned John Allspaw above.  He gave a talk on his own as well.  It was good.  The slides are on SlideShare.  He and I see eye to eye on a lot of things.  One thing he says in there that may shock a lot of people is to test using production.  I agree fully.  We could have never been sure our infrastructure was ready last year without testing the production servers.

I also learned about Varnish at the conference. It is a super fast reverse proxy.  It uses the virtual memory systems of recent kernels to store its cache.  The OS worries about moving things from memory to disk based on usage.  The claim is that the OSes are better at this than any programmer could do (without copying them of course).  It is fast.  The developers are proud.  And by proud I mean cocky.  I have been playing with it.  As you know, I have my own little caching proxy solution.  Varnish is much faster, as I expected.  However, storing cache in memcached is very attractive to me.  Varnish can’t do that.  It would likely slow it down a great deal.  MemProxy does do that.  Also, because MemProxy is written in PHP and my application layer is PHP, I can do things at the proxy layer to inspect the request and take action.  Works well for my use.  But, if you are using squid or mod_cache or something, you may want to give Varnish a look.

There was a good bit of information about the client side of performance.  There were folks from Microsoft there talking about IE8.  It looks like IE8 will catch up with the other browsers in a lot of ways.  Yahoo talked about image optimization.  Good stuff in there.  I use Fireworks and it does a pretty good job of making small images.  I am looking more into combining images and making image maps that use CSS.  We use a CDN, but fewer connections is better for users.

There was also a lot of great debate.  SANs rock!  SANs suck!  Rails Scales!  Rails Sucks!  The Cloud is awesome!  The Cloud is a lie!  (lots of cloud)

I had dinner both nights with guys from Six Apart.  Good conversations were had.  I don’t know if I am a big vegan fan though.  I mean, the food was good, but it all kinda tasted the same.  Perhaps I ordered poorly.  At dinner on Tuesday I met a guy going to work for Twitter soon.  He is an engineer that hopefully will be another step toward getting them back to 100% again.  Let’s keep our fingers crossed.

They did announce that the conference would be held again next year.  I am definitely going back.  Probably two of us from dealnews will go.  OSCON is fun.  MySQL conference is too.  But, more and more, capacity planning and scaling is what I do.  And this conference is all about those topics.