Tuesday, October 27, 2015

PHP array de-duplication trick that'll save you lots of CPU (* YMMV)

Always code as if the person who ends up maintaining your code is a violent psychopath who knows where you live.

Today, I discovered this curious gem in legacy code:

<?php
// ✂ ...snip... wall of code that creates array $ids
$ids = array_flip($ids);
$ids = array_flip($ids);
// ✂ ...snip... wall of code using $ids

It looks like a cut-and-paste error, and my gut reaction was to delete the second line. But a sneaking suspicion stalled me: the developer-that-no-longer-works-here who wrote that code had a talent for writing clever, uncommented code. Perhaps this was another instance of that pattern, and I'd better check myself.

PHP arrays map keys to values. array_flip reverses the mapping, so that values become keys and keys become values. If the code intended to flip the array, I'd expect the later logic to treat the original values as keys. What I observed instead was iteration over the original keys and values, as if the flip never happened. So what was this code doing?
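To make the effect concrete, here is a small sketch of what one flip and then a second flip do to an array containing a duplicate value. The sample array is made up for illustration; it is not from the legacy code.

```php
<?php
// Hypothetical sample data: the value 7 appears twice.
$ids = [10 => 7, 11 => 8, 12 => 7];

// One flip: values become keys. Keys must be unique, so the later
// occurrence of 7 overwrites the earlier one.
$flipped = array_flip($ids);    // [7 => 12, 8 => 11]

// Second flip: back to key => value, with the duplicate value gone.
$unique = array_flip($flipped); // [12 => 7, 11 => 8]
```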

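It's a de-duplication trick, first seen in a 2002 comment about array_unique. Array keys are unique, so the first flip collapses duplicate values into a single key, and the second flip restores the original orientation. Purportedly, double flip is significantly faster than the equivalent array_unique call for large arrays. Before you rush off and change all your code to use this trick, keep these things in mind: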

  • array_unique() and array_flip(array_flip()) produce different results where keys are concerned: array_unique keeps the first unique (key, value) pair while array_flip keeps the last.
  • array_unique([0, false, 0]) produces the expected result. Double flip does not: array_flip only accepts integer and string values, so it drops the false entry (and raises a warning to boot).
  • For small arrays, the performance difference is invisible.
  • For very large arrays of numbers, the difference between array_unique($a, SORT_NUMERIC) and array_flip(array_flip()) is negligible. On a medium Amazon EC2 instance, 0.27s vs 0.4s for 10M integers.
  • For very large arrays of strings, the difference is significant. On a medium Amazon EC2 instance, 1.2s vs 10.9s for 10M strings of random length between 3 and 5 ASCII characters.
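Timings like these depend on hardware, PHP version, and the shape of the data, so it is worth measuring on your own. The following is a rough sketch only: the array size, the 'id' prefix, and the value range are arbitrary assumptions, not the setup used for the numbers above.

```php
<?php
// Rough timing sketch. Size and value range are arbitrary assumptions;
// adjust them to resemble your own data.
$n = 200000;
$data = [];
for ($i = 0; $i < $n; $i++) {
    // Non-numeric strings, so array_flip keeps them as string keys.
    $data[] = 'id' . mt_rand(0, 50000);
}

$start = microtime(true);
$viaUnique = array_unique($data);
$uniqueSeconds = microtime(true) - $start;

$start = microtime(true);
$viaFlip = array_flip(array_flip($data)); // want unique values, don't care about keys
$flipSeconds = microtime(true) - $start;

printf("array_unique: %.4fs, double flip: %.4fs\n", $uniqueSeconds, $flipSeconds);
```

Both calls should yield the same set of values; only the keys differ.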

A roughly ninefold speedup on large string arrays is a real savings, and so this trick definitely has a place in the developer's tool box. But please, please for the love of all that is holy, comment the trick so that it's clear what's going on. My preferred way of seeing this trick deployed is:

$ids = array_flip(array_flip($ids)); // want unique values, don't care about keys
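The "don't care about keys" part of that comment matters. A short sketch of the two caveats listed earlier, using made-up sample arrays (the @ operator is there only to silence the array_flip warning in this demo):

```php
<?php
// Hypothetical sample data with a duplicate value under different keys.
$a = ['x' => 'apple', 'y' => 'pear', 'z' => 'apple'];

// array_unique keeps the first key seen for each value.
$viaUnique = array_unique($a);         // ['x' => 'apple', 'y' => 'pear']

// Double flip keeps the last key seen for each value.
$viaFlip = array_flip(array_flip($a)); // ['z' => 'apple', 'y' => 'pear']

// Values that are neither integers nor strings are skipped by array_flip.
$mixed = [0, false, 0];
$mixedUnique = array_unique($mixed);            // [0 => 0, 1 => false]
$mixedFlip = @array_flip(@array_flip($mixed));  // [2 => 0], false is lost
```

If later code relies on the original keys, or the array can hold booleans, nulls, or floats, array_unique is the safer choice.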
