PHP Appalachia Corrections

October 15, 2008


Finally got home from PHP Appalachia.  I enjoyed meeting all the great people.

I presented what I learned about importing large amounts of CSV data into MySQL and how we deal with it.  I threw my idea onto the wiki at the last minute and made the slides while everyone ate breakfast.  I had planned on researching it all (it has been a few years since I wrote it), but we had no reliable internet.  Here are some claims I made and their corrections.

  1. I said our largest file is about 1.8 million lines.  WRONG.  It is actually about 4.6 million.  I was correct, however, that it finishes importing and indexing in about 5 minutes.
  2. I claimed I use LOAD DATA INFILE to load into MyISAM first and then “insert into … select from” into an InnoDB table for speed reasons.  WRONG.  In fact, I do that because I sometimes need to merge several fields from the file into one field in the database, and I could not find a way to do that with LOAD DATA INFILE.  As for speed, I can’t say either way, as I have no solid data.  Sounds like a good test.  Based on my experience, MyISAM probably still wins on a LOAD DATA INFILE into a blank, fresh table.
  3. Total rows currently indexed is 7.2 million.  I did not make a claim about this, but I wanted to include it in the talk and did not have Internet.  (Damn you, Hughes)
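To make item 2 concrete, here is a minimal sketch of the two-step staging idea, written as a PHP function that returns the SQL statements in the order they would run. Everything specific in it is hypothetical: the table names (`import_staging`, `products`), the columns, the CSV layout, and the helper names. It also assumes the InnoDB `products` table already exists. It illustrates the technique and is not the actual import code.

```php
<?php
// Hypothetical sketch: LOAD DATA INFILE into a blank MyISAM staging table,
// then INSERT ... SELECT into an InnoDB table, merging fields on the way.
// Table names, columns and the CSV layout are made up for illustration.

// Minimal string-literal quoting for this sketch only; with a live
// connection, prefer mysqli::real_escape_string() or prepared statements.
function sql_quote(string $value): string
{
    return "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], $value) . "'";
}

function build_import_sql(string $csvPath): array
{
    return [
        // 1. Start from a blank, fresh MyISAM table, one column per CSV field.
        "DROP TABLE IF EXISTS import_staging",
        "CREATE TABLE import_staging (
            sku   VARCHAR(32)   NOT NULL,
            brand VARCHAR(64)   NOT NULL,
            model VARCHAR(64)   NOT NULL,
            price DECIMAL(10,2) NOT NULL
        ) ENGINE=MyISAM",
        // 2. Bulk-load the raw file as-is.
        "LOAD DATA INFILE " . sql_quote($csvPath) . "
            INTO TABLE import_staging
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'",
        // 3. Copy into the (assumed existing) InnoDB table, merging two
        //    fields from the file into a single title column.
        "INSERT INTO products (sku, title, price)
            SELECT sku, CONCAT(brand, ' ', model), price
            FROM import_staging",
    ];
}
```

With a live mysqli connection, each statement would simply run in order, e.g. `foreach (build_import_sql($path) as $sql) { $db->query($sql); }`. The merge lives in the SELECT list, which is also where any other per-row reshaping could go.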

Deploying Scalable Websites with Memcached

October 3, 2008


I spoke at the MySQL Conference and Expo this year about the architecture we have here.  After my talk, Jimmy Guerrero of Sun/MySQL invited me to give a webinar on how dealnews uses memcached.  It takes place next week, Thursday, October 9, 2008, and it is free.  We have used memcached in a variety of ways as we have grown, so I will be talking about how dealnews has used memcached in the past and how we use it now.

For more information, visit the MySQL web site.

strtotime() – The PHP date Swiss Army knife

September 20, 2008


Man, what did I do before strtotime()?  Oh, I know: I had a 482-line function to parse date formats and return timestamps.  And I still could not do really cool stuff.  Like tonight, I needed to figure out when Thanksgiving was in the US.  I knew it was the 4th Thursday in November, so I started with some math and checked what day of the week Nov. 1 would fall on.  All that was making my head hurt.  So, I just tried this for fun.

strtotime("thursday, november ".date("Y")." + 3 weeks")

That gives me Thanksgiving.  Awesome.  It is cool for other stuff too.  At its most basic, it can take a MySQL datetime field and turn it into a timestamp.  Very handy for date calculations.  It also understands RFC 2822 and ISO 8601 date formats.  These are common in HTTP headers and some XML documents like RSS and Atom feeds.  Also, PHP can output those two standard formats with the date() function.  That makes them a good, standards-compliant way to pass full, timezone-specific dates around.
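Here is a small sketch of the tricks above, pinned to UTC so the output does not depend on the server’s timezone setting. The function names are made up for illustration. The `fourth thursday of november` form is an alternative phrasing that PHP only accepts from 5.3 on.

```php
<?php
// Pin the timezone so the timestamps and formatted strings are predictable.
date_default_timezone_set('UTC');

// US Thanksgiving is the 4th Thursday in November. "thursday" moves to the
// first Thursday on or after November 1 (Nov 1 itself if it is a Thursday),
// then "+ 3 weeks" jumps ahead to the fourth one.
function thanksgiving(int $year): int
{
    return strtotime("thursday, november " . $year . " + 3 weeks");
}

// PHP 5.3+ alternative: spell out the ordinal directly.
function thanksgiving_ordinal(int $year): int
{
    return strtotime("fourth thursday of november " . $year);
}

echo date('Y-m-d', thanksgiving(2008)), "\n"; // 2008-11-27

// A MySQL DATETIME string converts straight to a Unix timestamp.
$ts = strtotime('2008-09-20 21:15:00');

// date('r') produces RFC 2822 and date('c') produces ISO 8601;
// strtotime() parses both back to the same timestamp.
$rfc2822 = date('r', $ts); // Sat, 20 Sep 2008 21:15:00 +0000
$iso8601 = date('c', $ts); // 2008-09-20T21:15:00+00:00
```

The ordinal form reads more clearly. The relative-weeks form also works on older PHP versions.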

Caching and TTL behavior

July 3, 2008

So, I am working on MemProxy some.  Mainly, I am trying to implement more of the Cache-Control header’s many options.  The one that has me a bit perplexed is s-maxage, particularly when combined with max-age.

s-maxage is the maximum time in seconds an item should remain in a shared cache.  So, if s-maxage is set by the application server, my proxy should keep it for that amount of time at the most.  Up until now, I have just been looking at max-age.  But, s-maxage is the proper one for a proxy to use if it is present.  I do not send the s-maxage through because this is a reverse proxy and, IMO, that is proper behavior for an application accelerating proxy.  However, I do send forward the max-age value that is set by the application servers.  If no max-age is set, I send a default as defined in the script.  Also, if no-cache or no-store is set, I send those and a max-age of 0.
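The header handling described above can be sketched roughly as follows. This is not MemProxy’s code. The function names and the `CC_DEFAULT_MAX_AGE` value are hypothetical. It also makes two assumptions the text leaves open: the proxy falls back to the same default TTL when neither s-maxage nor max-age is present, and it does not keep no-cache/no-store responses at all (a TTL of 0).

```php
<?php
// Hypothetical stand-in for "a default as defined in the script".
const CC_DEFAULT_MAX_AGE = 300;

// Parse "public, max-age=300, s-maxage=600" into
// ['public' => true, 'max-age' => '300', 's-maxage' => '600'].
function parse_cache_control(string $header): array
{
    $directives = [];
    foreach (explode(',', $header) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $pieces = explode('=', $part, 2);
        $name = strtolower(trim($pieces[0]));
        $directives[$name] = isset($pieces[1]) ? trim($pieces[1], " \t\"") : true;
    }
    return $directives;
}

// How long the proxy keeps the item: s-maxage wins for a shared cache,
// then max-age, then the default. no-cache/no-store means do not keep it
// (an assumption of this sketch).
function proxy_ttl(array $cc): int
{
    if (isset($cc['no-cache']) || isset($cc['no-store'])) {
        return 0;
    }
    if (isset($cc['s-maxage'])) {
        return (int) $cc['s-maxage'];
    }
    if (isset($cc['max-age'])) {
        return (int) $cc['max-age'];
    }
    return CC_DEFAULT_MAX_AGE;
}

// Header sent to the client: s-maxage is never passed through;
// no-cache/no-store are forwarded along with max-age=0.
function client_cache_control(array $cc): string
{
    $out = [];
    foreach (['no-cache', 'no-store'] as $flag) {
        if (isset($cc[$flag])) {
            $out[] = $flag;
        }
    }
    if ($out) {
        $out[] = 'max-age=0';
    } else {
        $maxAge = isset($cc['max-age']) ? (int) $cc['max-age'] : CC_DEFAULT_MAX_AGE;
        $out[] = 'max-age=' . $maxAge;
    }
    return implode(', ', $out);
}
```

For example, an app response of `public, max-age=300, s-maxage=600` would be kept by the proxy for 600 seconds while the client is told `max-age=300`.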

My problem arises when max-age is less than s-maxage.  Up until now, I have sent a max-age back to the client that represents the time left for the cached item in my proxy’s cache.  So, if the app server sent back max-age=300, and a request comes in and the cache was created 100 seconds ago, I send max-age=200 back to the client.  But I was only using max-age before.  Now, in cases where s-maxage is longer than max-age, the item can stay in my cache past its max-age, so subtracting its age from max-age comes up with negative numbers.  That is not cool.  The easiest solution would be to always send the original max-age back to the client.  But that seems kind of lame.

So, my question is: if you are using an application (HTTP or otherwise) accelerator, what would you expect?  If your application set a max-age of 300, would you always expect the end client to receive a max-age of 300?  Or should it count down over time?  My only experience is with a CDN.  If you watch CDN traffic, the max-age gets smaller and smaller over time until it hits 0.  I have not tried sending an s-maxage to my CDN, so I don’t know what they would do with it.  Maybe that is a good test.

UPDATE: Writing this gave me an idea.  If the item will be in the proxy cache longer than the max-age ttl, send the full max-age ttl.  Otherwise, send the time left in the proxy cache.  Thoughts on that?
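The rule in the UPDATE can be sketched as a small PHP function, next to the old countdown for comparison. The names are hypothetical, and `$proxyTtl` is assumed to be s-maxage when present and max-age otherwise.

```php
<?php
// $maxAge:   max-age the application server sent
// $proxyTtl: how long the proxy keeps the item (s-maxage if present, else max-age)
// $age:      seconds since the proxy cached the item
function countdown_max_age(int $maxAge, int $proxyTtl, int $age): int
{
    // Time the item has left in the proxy cache, never below zero.
    $remaining = max(0, $proxyTtl - $age);

    // If the item will stay in the proxy longer than max-age, send the full
    // max-age; otherwise send the time left in the proxy cache.
    return $remaining > $maxAge ? $maxAge : $remaining;
}

// The old behavior: max-age minus age. This goes negative once the item
// outlives max-age in the proxy because s-maxage is longer.
function naive_max_age(int $maxAge, int $age): int
{
    return $maxAge - $age;
}
```

The rule works out to the smaller of max-age and the time left in the proxy, so the client never sees a negative value. With no s-maxage, `$proxyTtl` equals max-age and the rule reduces to the old countdown. For example, `countdown_max_age(300, 300, 100)` is 200, just as before. With s-maxage=600, the client gets the full 300 until the proxy copy has fewer than 300 seconds left.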

(thanks for being my teddy bear blogosphere)

Velocity Conference Roundup

July 1, 2008

As I said before, I was invited to be on a panel at Velocity Conference, and I was delighted to go.  I had never been to San Francisco, though I have been to Portland and Santa Clara several times.  The panel was great.  It was the Brian and photo sharing sites show.  Seriously, it was me, John Allspaw of Flickr, Don MacAskill of SmugMug and Farhan Mashraqi of Fotolog.  Oh, there was also Shayan Zadeh of Zoosk, a social dating network, and Michael Halligan, a consultant from BitPusher.  We all had similar ideas.  I told my Yahoo story.  I told everyone that they should denormalize (or optimize, as Farhan preferred) their data to improve performance.  Others agreed.  I have written about my methods for denormalizing normalized data before.  (See pushed cache.)  Fun was had by all.

I mentioned John Allspaw above.  He gave a talk of his own as well, and it was good.  The slides are on SlideShare.  He and I see eye to eye on a lot of things.  One thing he says in there that may shock a lot of people is to test using production.  I agree fully.  We could never have been sure our infrastructure was ready last year without testing the production servers.

I also learned about Varnish at the conference.  It is a super fast reverse proxy.  It uses the virtual memory system of recent kernels to store its cache, so the OS worries about moving things between memory and disk based on usage.  The claim is that the OS is better at this than any programmer could be (without copying it, of course).  It is fast.  The developers are proud.  And by proud I mean cocky.  I have been playing with it.  As you know, I have my own little caching proxy solution.  Varnish is much faster, as I expected.  However, storing cache in memcached is very attractive to me.  Varnish can’t do that, and adding it would likely slow Varnish down a great deal.  MemProxy does do that.  Also, because MemProxy is written in PHP and my application layer is PHP, I can inspect the request at the proxy layer and take action.  That works well for my use.  But if you are using Squid or mod_cache or something similar, you may want to give Varnish a look.

There was a good bit of information about the client side of performance.  There were folks from Microsoft there talking about IE8.  It looks like IE8 will catch up with the other browsers in a lot of ways.  Yahoo talked about image optimization.  Good stuff in there.  I use Fireworks, and it does a pretty good job of making small images.  I am looking more into combining images and making image maps that use CSS.  We use a CDN, but fewer connections are still better for users.

There was also a lot of great debate.  SANs rock!  SANs suck!  Rails Scales!  Rails Sucks!  The Cloud is awesome!  The Cloud is a lie!  (lots of cloud)

I had dinner both nights with guys from Six Apart.  Good conversations were had.  I don’t know if I am a big vegan fan, though.  I mean, the food was good, but it all kinda tasted the same.  Perhaps I ordered poorly.  At dinner on Tuesday, I met a guy who is going to work for Twitter soon.  He is an engineer who will hopefully be another step toward getting them back to 100%.  Let’s keep our fingers crossed.

They did announce that the conference would be held again next year.  I am definitely going back, and probably two of us from dealnews will go.  OSCON is fun.  The MySQL conference is too.  But, more and more, capacity planning and scaling are what I do, and this conference is all about those topics.

Did you know I am going to be at Velocity?

June 18, 2008

Well, neither did I until today. HA!

Velocity is a new O’Reilly conference dedicated to “Optimizing Web Performance and Scalability”.  It starts next Monday.  Yesterday I was contacted by Adam Jacob of HJK Solutions about taking part in a panel discussion about what happens when success comes suddenly to a web site.  I think he thought I was in the Bay Area.  Little did he know I am in Alabama.  But, amazingly, I was able to work it all out so I can be there.  I wish I had known about this conference ahead of time; it sounds really awesome.  Performance has always been something I focus on.  I hope to share some and learn at the same time.

So, if you are going to be there, come see our panel.

P.S. Thanks to John Allspaw of Flickr for recommending me to Adam.

An Introduction to MySQL – Birmingham, AL

June 17, 2008

I am giving a talk titled “An Introduction to MySQL” here in Birmingham, AL on June 21, 2008 at 3PM.

I love living in Alabama.  I was born and raised in Huntsville.  However, Birmingham has always seemed a bit behind in technology compared to what I do for a living.  There is good reason.  The industry here is medical, banking, industrial and utilities.  I don’t really want my doctors keeping my medical records in an alpha release of anything.  Same goes for my banking and utilities.  But, as this page shows, the companies here are catching up.  So, I am happy to present MySQL to as many people as I can in this town.  Hopefully I will help some folks that have not been exposed to MySQL or any open source for that matter.

The event is part of our local Linux user group’s (BALU) planned events.