Big arrays in PHP

This blog has moved to http://brian.moonspot.net/.

Update: Terry Chay has answered my question about why this is happening. In a nutshell, PHP is using 33,160 opcodes and 33,157 registers for the verbose code. In comparison, the serialized array only uses 5 opcodes and 2 registers. He used something called VLD, which I had not heard of, to figure all this out.

So, at dealnews, we have a category tree. To make life easy, we dump it to an array in a file that we can include on any page. It has 420 entries. Expanded, one entry may look like:

$CATEGORIES[202]['id'] = "202";
$CATEGORIES[202]['name'] = "clothing & accessories";
$CATEGORIES[202]['parent'] = "0";
$CATEGORIES[202]['standalone'] = "";
$CATEGORIES[202]['description'] = "clothing";
$CATEGORIES[202]['precedence'] = "0";
$CATEGORIES[202]['preferred'] = "0";
$CATEGORIES[202]['searchable'] = "1";
$CATEGORIES[202]['product'] = "1";
$CATEGORIES[202]['aliased_id'] = "0";
$CATEGORIES[202]['path'] = "clothing & accessories";
$CATEGORIES[202]['url_safe_name'] = "clothing-accessories";
$CATEGORIES[202]['child_count'] = "6";
$CATEGORIES[202]['childlist'][0] = 202;
$CATEGORIES[202]['childlist'][1] = 2;
$CATEGORIES[202]['childlist'][2] = 275;
$CATEGORIES[202]['childlist'][3] = 4;
$CATEGORIES[202]['childlist'][4] = 481;
$CATEGORIES[202]['childlist'][5] = 446;
$CATEGORIES[202]['childlist'][6] = 454;
$CATEGORIES[202]['childlist'][7] = 436;
$CATEGORIES[202]['childlist'][8] = 205;
$CATEGORIES[202]['childlist'][9] = 227;
$CATEGORIES[202]['childlist'][10] = 203;
$CATEGORIES[202]['childlist'][11] = 280;
$CATEGORIES[202]['childlist'][12] = 204;
$CATEGORIES[202]['children'][2] = &$CATEGORIES[2];
$CATEGORIES[202]['children'][275] = &$CATEGORIES[275];
$CATEGORIES[202]['children'][4] = &$CATEGORIES[4];
$CATEGORIES[202]['children'][481] = &$CATEGORIES[481];
$CATEGORIES[202]['children'][446] = &$CATEGORIES[446];
$CATEGORIES[202]['children'][454] = &$CATEGORIES[454];
$CATEGORIES[202]['children'][436] = &$CATEGORIES[436];
$CATEGORIES[202]['children'][205] = &$CATEGORIES[205];
$CATEGORIES[202]['children'][227] = &$CATEGORIES[227];
$CATEGORIES[202]['children'][203] = &$CATEGORIES[203];
$CATEGORIES[202]['children'][280] = &$CATEGORIES[280];
$CATEGORIES[202]['children'][204] = &$CATEGORIES[204];

So, I was curious how efficient this was. I noticed that some code that was using this array was jumping in memory usage as soon as I ran the script. So, I devised a little piece of code:

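<?php
// Baseline memory use before the category file is loaded.
echo "Memory used: ".number_format(memory_get_usage())." bytes\n\n";

include_once "./cat_code.php";

echo "type: ".gettype($CATEGORIES)."\n";

echo "count: ".count($CATEGORIES)."\n\n";

// Memory use after the include; the difference is the cost of the array.
echo "Memory used: ".number_format(memory_get_usage())." bytes\n\n";

?>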

The output was very surprising:

Memory used: 41,772 bytes
type: array
count: 420

Memory used: 4,951,248 bytes

Um, whoa. 5MB of memory just for including this file. The file itself is just 326k. Needless to say, that is bad. We include that file quite liberally. I decided to see if other methods of storing that would be better. First I tried var_export format.

Memory used: 41,784 bytes
type: array
count: 420

Memory used: 1,212,076 bytes

Well, that is much better. But it took some fiddling to get it right. var_export does not export reference notation, so the children arrays were being fully expanded instead of being made references. Without the references, the code was using 8MB of memory, which was much worse than the original. Also, this format is not nearly as readable as the raw code version. If we can’t read it, it may as well be serialized. So, I tried it serialized.
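For illustration, here is one way the var_export fiddling could look. This is only a sketch, not the code we actually use: the function name, the temp file path and the two-entry sample array are made up, and it assumes that an entry's children are every id in its childlist other than its own id, as in the sample entry above. It writes the flat data with var_export() and appends a small loop that rebuilds the references when the file is included.

```php
<?php
// Sketch only: export_categories(), the file path and the sample data
// are hypothetical and not taken from the real dealnews code.

function export_categories(array $categories, $file)
{
    // Copy each entry without its reference-based 'children' array, so
    // var_export() does not expand every child in full.
    $flat = array();
    foreach ($categories as $id => $cat) {
        unset($cat['children']); // $cat is a copy; the caller's array is untouched
        $flat[$id] = $cat;
    }

    // Loop appended to the generated file. It re-creates the references
    // from 'childlist', skipping the entry's own id.
    $rebuild = <<<'PHP'
foreach (array_keys($CATEGORIES) as $id) {
    if (empty($CATEGORIES[$id]['childlist'])) {
        continue;
    }
    foreach ($CATEGORIES[$id]['childlist'] as $child) {
        if ($child != $id && isset($CATEGORIES[$child])) {
            $CATEGORIES[$id]['children'][$child] = &$CATEGORIES[$child];
        }
    }
}
PHP;

    $code = "<?php\n\$CATEGORIES = " . var_export($flat, true) . ";\n"
          . $rebuild . "\n";
    file_put_contents($file, $code, LOCK_EX);
}

// Tiny stand-in for the real 420-entry tree.
$CATEGORIES = array();
$CATEGORIES[2] = array('id' => '2', 'name' => 'shoes', 'childlist' => array(2));
$CATEGORIES[202] = array(
    'id' => '202',
    'name' => 'clothing & accessories',
    'childlist' => array(202, 2),
);
$CATEGORIES[202]['children'][2] = &$CATEGORIES[2];

$file = sys_get_temp_dir() . '/cat_export_example.php';
export_categories($CATEGORIES, $file);

// Load it back the way a page would.
unset($CATEGORIES);
include $file;
```

After the include, $CATEGORIES[202]['children'][2] is a reference to $CATEGORIES[2] again rather than a full copy.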

Memory used: 41,764 bytes
type: array
count: 420

Memory used: 907,668 bytes

That is by far the best result. FWIW, timing the code showed that the var_export format was fastest and serializing was slowest. However, it was just .04 seconds faster (.047 vs. .089) including the PHP start up time. I will take that for the memory savings and ease of creation.
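As a rough sketch of the serialized approach, the following writes an array with serialize() and reads it back with unserialize(). The function names, the temp file path and the two-entry sample array are invented for illustration and are not the real dealnews code. Unlike var_export(), serialize() records references inside the array it is given, so the children links survive the round trip without extra work.

```php
<?php
// Sketch only: save_categories(), load_categories(), the file path and
// the sample data are hypothetical names used for illustration.

// Write the category tree to $file in serialize() format.
function save_categories(array $categories, $file)
{
    file_put_contents($file, serialize($categories), LOCK_EX);
}

// Read the category tree back; returns an empty array on any failure.
function load_categories($file)
{
    if (!is_readable($file)) {
        return array();
    }
    $data = unserialize(file_get_contents($file));
    return is_array($data) ? $data : array();
}

// Tiny stand-in for the real 420-entry tree.
$CATEGORIES = array();
$CATEGORIES[2] = array('id' => '2', 'name' => 'shoes', 'parent' => '202');
$CATEGORIES[202] = array(
    'id' => '202',
    'name' => 'clothing & accessories',
    'parent' => '0',
);
$CATEGORIES[202]['children'][2] = &$CATEGORIES[2];

$file = sys_get_temp_dir() . '/cat_serialized_example.dat';
save_categories($CATEGORIES, $file);

// On any page that needs the tree:
$loaded = load_categories($file);
```

Only unserialize() files that your own code wrote; it should not be pointed at data that users can modify.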

I am going to pose a question about this on the internals list and see if this is expected behavior or if it is a shocker to them as well.


12 Responses to Big arrays in PHP

  1. Evert says:

    interesting! just saw the post on internals.. i’m wondering what the reply will be.. maybe its just because the entire parsed php-tree remains in memory for opcode caching and stuff..

    With serialize() this is not the case.. I bet the data just gets thrown out..

  2. doughboy says:

    Even still, the whole file is only 320k. That would be a lot of wasted memory IMO. And, if I include the file over and over, the memory usage does not grow. So, I think the opcode idea is not the problem.

  3. Daniel says:

    You should make one more test – with simple objects (they are treated like references so the structure should go). How does that affect startup time and memory consumption.

  4. Brian Moon’s Blog: Big arrays in PHP

  5. […] his latest blog entry, Brian Moon takes a look at using big arrays in PHP – how efficient it is and what can be done to […]

  6. […] don’t read php-internals anymore because I’m partial to getting work done, but there was an interesting question the dealmac developer posted. Basically dealmac, like my current employer, has a large array structure in a PHP file somewhere […]

  7. joel says:

    Which version of PHP are you using? I think you’ll find different results with PHP 5.2. If the array is static then you don’t really have to assign it by reference, since in PHP 5 all (non-object) variables are copy-on-write. So as long as you don’t change the value then a copy isn’t made.

    I also tried this out some time ago and couldn’t duplicate the results. I usually look at memory_get_peak_usage(true) rather than memory_get_usage(), since it doesn’t seem to be as accurate.

    My testing found no difference at all in the amount of memory used when including a manually generated array and a var_export() generated one. I also had a multi-dimensional array of about the same size.

  8. terry chay says:

    Well my tests were done in PHP 4.3.9 w/XDebug on which I’ve mentioned many times is currently my dev platform at work. I don’t know what Brian is using. I was wanting to try this in other environments (Zend Optimizer on, PHP5, etc) but it was like 3 AM when I did this.

    I’ll look at it later.

  9. doughboy says:

    brianm$ php -v
    PHP 5.2.0 (cli) (built: Feb 20 2007 17:14:30)
    Copyright (c) 1997-2006 The PHP Group
    Zend Engine v2.2.0, Copyright (c) 1998-2006 Zend Technologies

  10. Joel says:

    I should mention that I’m using APC, so that might make a difference as well.

    I’ll try a cli version and repost results then :)

  11. erdtek says:

    Running on XAMP with Apache 2.2.4 and PHP 5.2.1 I get the following result:

    Memory used: 61,688 bytesnn
    type: arrayn
    count: 13nn
    Memory used: 68,312 bytesnn

    Not much difference.

  12. Izrul says:

    Thanks for sharing this php scripts. Good stuff.
