PHP Appalachia Corrections

October 15, 2008


Just got home finally from PHP Appalachia.  I enjoyed meeting all the great people.

I presented about what I learned and how we deal with importing large amounts of CSV data into MySQL.  I threw my idea onto the wiki at the last minute, made the slides while everyone ate breakfast and I had planned on researching it all (been a few years since I wrote it), but we had no reliable internet.  Some claims I made and their corrections.

  1. I said our largest file is about 1.8 million lines.  WRONG.  Actually it is about 4.6 million.  I was correct however that it does finish importing and indexing in about 5 minutes.
  2. I claimed I LOAD DATA INFILE to MyISAM first and then “insert into … select from” into an InnoDB table for speed reasons.  WRONG.  In fact, I do that because I need to merge fields from the file sometimes into one field in the databaes.  I could not find a way to do that with LOAD DATA INFILE.  As to speed.  I can’t say either way as I have no solid data.  Sounds like a good test.  MyISAM probably still wins on a LOAD DATA INFILE into a blank, fresh table based on my experience.
  3. Total rows currently indexed is 7.2 million.  I did not make a claim, but I thought I would just mention that.  I wanted to include that, but did not have Internet.  (Damn you Hughes)

Deploying Scalable Websites with Memcached

October 3, 2008


I spoke at the MySQL Conference and Expo this year about the architecture we have here at  After my talk, Jimmy Guerrero of Sun/MySQL invited me to give a webinar on how dealnews uses memcached.  That is taking place next week, Thursday, October 09, 2008.  It is a free webinar.  We have used memcached in a variety of ways as we have grown. So, I will be talking about how dealnews used memcached in the past and present.

For more information, visit the MySQL web site.

strtotime() – The PHP, date swiss army knife

September 20, 2008


Man, what did I do before strtotime().  Oh, I know, I had a 482 line function to parse date formats and return timestamps.  And I still could not do really cool stuff.  Like tonight I needed to figure out when Thanksgiving was in the US.  I knew it was the 4th Thursday in November.  So, I started with some math stuff and checking what day of the week Nov. 1 would fall on.  All that was making my head hurt.  So, I just tried this for fun.

strtotime("thursday, november ".date("Y")." + 3 weeks")

That gives me Thanksgiving.  Awesome.  It is cool for other stuff too.  At its very basic, it can take a MySQL datetime field and turn it into a timestamp.  Very handy for date calculations.  It also understands RFC 2822 and ISO 8601 date formats.  These are common in HTTP headers and some XML documents like RSS and Atom feeds.  Also, PHP can output those two standard formats with the date() function.  So, this makes them a good standards compliant way to pass full, timezone specific dates around.

Google Chrome and privacy

September 2, 2008

So, Google Chrome is out. If you don’t know, it’s Google’s new browser. I downloaded it on my Windows XP machine and tried it out. I found this curious thing in the options.

Google Chrome Spying on you?

Google Chrome Spying on you?

So, I thought, I will click “Learn more” to see what they are watching. I get this.

Uh OH!  404!

Uh OH! 404!

So, I unchecked the box. Let’s hope the premature launch is the reason there is no more information out there.

UPDATE: The page comes up now and says:

Information that’s sent to Google includes crash reports and statistics on how often you use Google Chrome features. When you choose to accept a suggested query or URL in the address bar, the text you typed and the corresponding suggestion is sent to Google. Google Chrome doesn’t send other personal information, such as name, email address, or Google Account information.

So, if you use their suggestions, they know it.  And it tracks what features you use.  Hmm, I think I will disable.

Shrinking ibdata files after innodb_file_per_table

August 22, 2008

Patrick Galbraith wrote in his blog about switching to innodb_file_per_table.  For those that don’t know about this setting, it places the data for the tables into .ibd files within the database dir instead of storing it in the ibdata files in the main datadir.  This is useful if you don’t want to babysit your innodb tablespace.  At least, that was my main reason for wanting to use it.  There is still dictionary data stored in the ibdata file(s) so you can’t just remove them.

Anyhow, the delima Patrick wrote about is recovering the space used by the ibdata files in the datadir after you have converted the tables to one file per table.  I commented on his blog, but thought it worth a full post to be sure others could find my solution.

  1. Backup your data (cuz, you never know)
  2. Convert all tables to MyISAM
  3. Stop MySQL.
  4. Delete ib* in the datadir
  5. Restart MySQL.  MySQL will recreate the files.
  6. Convert all tables to InnoDB

It worked for me.  Your mileage may vary.  No warranty that you won’t lose all your data.  Try it on a dev server first.

The benefits of this were that the data was online while we converted the tables.  The only downtime was while we shut down MySQL, removed the files and waited on MySQL to recreate a small 10MB ibdata file and the ib_logfiles.  I am not 100% sure you have to remove the ib_logfiles, but I did for good measure.  I just run with a single 10MB autoextend ibdata file.  I think it is at 34MB or so on our main database server now.

dealnews is hiring a Systems Administrator

August 20, 2008

We are hiring!  dealnews is looking for a full time systems administrator.  The developers have been sharing the sys admin load for over 10 years now.  But, we really need a dedicated person now.  If you are interested, see our jobs page.

Replication is much better than cold backups

August 17, 2008

So, I wrote about the begining of our wild database issues. Since then, I have been fighting a cold, coaching little league football and trying to help out in getting our backup solutions working in top shape.  That does not leave much time for blogging.

Never again will we have ONLY a cold backup of anything.  We were moving nightly full database dumps and hourly backups of critical tables over to that box all day long.  Well, when the filesystem fails on both the primary database server and your cold backup server, you question everything.  A day after my marathon drive to fix the backup server and get it up and running, the backup mysql server died again with RAID errors.  I guess that was the problem all along.  In the end, we had to have a whole new RAID subsystem in our backup database server.  So, my coworker headed over to the data center to pull the all nighter to get the original, main database server up and running.  The filesystem was completely shot.  ReiserFS failed us miserably.  It is no longer to be used at dealnews.

Well, today at 6:12PM, the main database server stops responding again.  ARGH!!  Input/Ouput errors.  That means RAID based on last weeks experience.  We reboot it.  It reports memory or battery errors on the RAID card.  So, I call Dell.  Our warranty on these servers includes 4 hour, onsite service.  They are important.  While on the phone with Dell, I run the Dell diagnostic tool on the box.  During the diagnostic test, the box shuts down.  Luckily, the Dell service tech had heard enough.  He orders a whole new RAID subsystem for this one as well.

There is one cool thing about the PERC4 (aka, LSI Megaraid) RAID cards in these boxes.  They write the RAID configuration to the drives as well as on the card.  So, when a new blank RAID card is installed, it finds the RAID config on the drives and boots the box up.  Neato.  I am sure all the latest cards do it.  It was just nice to see it work.

So, box came up, but this time we had Innodb corruption.  XFS did a fine job in keeping the filesystem in tact.  So, we had to go from backups.  But, this time we had a live replicated database that we could just dump and restore.  We should have had it all along, but in the past (i.e. before widespread Innodb) we were gun shy about replication.  We had large MyISAM tables that would constantly get corrupted on the master or slave and would halt replication on a weekly basis.  It was just not worth the hassle.  But, we have used it for over a year now in our front end database servers with an all Innodb data set.  As of now, only two tables in our main database are not Innodb.  And I am trying to drop the need for a Full-Text index on those right now.

So, here is to hoping our database problems are behind us.  We have replaced almost everything in one except the chassis.  The other has had all internal parts but a motherboard.  Kudos to Dell’s service.  The tech was done with the repair in under 4 hours.  Glad to have that service.  I recommend it to anyone that needs it.

Database nightmare

August 5, 2008

Its 12:30AM (00:30 for you Euros).  I am watching The Daily Show on Tivo.  All is well.  Then the phone beeps.

MySQL Main is critical

SSH?  no.

Digi console?  no.

About a week ago, we had a mysqldump file that was corrupt.  We cleaned it up.  My worst fears came to my mind.  We tried power cycling it.  It did not come back.

While my coworker was dealing with the facilily people, I worked on the backup server.  Had to ensure the last full backup was in place and apply the incremental data.  Suddenly, my SSH connection dies.  OMG.  THAT DUMB A** GUY POWER CYCLED THE WRONG BOX!!! — FS corrupted.  Damn you ReiserFS!

By now, it’s 4AM.  Tech took an hour to get to the rack.  It is 20 feet from his cubicle.  I get in the car.  I am two hours away from the facility.  No sleep.  It is still dark.  I play loud music.  I talk to myself.  I curse the guy that power cycled the wrong box. The sun comes up and it is easier to drive.

I sit here, waiting on the OS to finish installing so I can restore the backup and incremental data again.  Hours of content lost.  The content team is hand writing HTML that other developers are rsyncing around to the servers.

The good news?  All the work done in 2007 to separate our front end and backend worked.  The front end works fine (99%).  Just no new content.  Well, except for the hand done HTML.

Note to self: Get that main database replication working again.  ASAP.

Where Drizzle fits in for me

July 24, 2008

So, most of you have heard about Drizzle by now.  For those that have not, you can check out many, many blog posts or the Launchpad page.

The thread on Slashdot about Drizzle was quite negative.  Most misunderstand what Drizzle is about.  SQLite is not a good solution when you have 100 web servers.  Let me describe how it I would use it and maybe that will help some understand it.

When it comes to MySQL use, dealnews has two very different use cases.  The first is an enterprise storage system that involves content creation, reporting and data warehousing.  For that layer of our business, we are using more and more advanced features as they become available.  We use triggers and stored procedures.  We use complex data types for specific use cases.  All those features are a big gain.

The other way that we use MySQL is for serving content to our readers.  I have written about this before.  For this purpose, we avoid joins, don’t use any advanced features.  We do use replication, indexes and intelligent queries.  We don’t (as one slashdot reader claimed) do all of our processing in the code.  That would be stupid.  If you do that you are ignorant.  I will stop talking about that before this becomes a rant.  I do believe in letting MySQL do my work for me.

This is where Drizzle fits in.  To serve content, I don’t need stored procedures, triggers, views or any of that other stuff.  The whole database that the front end web servers use is basically a view.  It is a denormalized, prepared version of the real data.  I store objects. But, I have to be able to sort and filter the data in a way that SQL allows me to do.  CouchDB sounds interesting.  Maybe one day it will be there.  It is sill in the optimization phase.

Now, some say that this is just MySQL 3.x all over again.  Well, you clearly have not been listening to the really smart people that are working on Drizzle.  They are doing more than just removing the 4.1 and 5.x features from MySQL.  They are removing things that don’t make sense for this use case.  They are adding things that do make sense.  They are replacing parts of the code base where there is a better library or way of doing it.  At this point, they have no feature requirements to meet.  They have no deadlines.  They are making what they think the high volume web world and/or cloud computing needs.  They are making it plugable:  think Apache modules or PHP extensions.  So, if you need feature XYZ that was yanked out, you can add it back in (hopefully) via the internal API.  There is a lot more going on here than just removing “features”.

So, I am cheering on the folks working on Drizzle.  I have joined their community and will provide what feedback I can from userland.  I am no C++ coder.  I can read it.  I can debug it.  But, writing it or doing heavy lifting is not in my skill set.  Hopefully I can contribute

Usability FAIL

July 22, 2008

I can’t be at OSCON this year.  But my colleague Rob is and he just posted a usability post about, of all things, the Double Tree hotel where I am sure a lot of you are staying.  Great stuff.