Optimizing strings in PHP?

Submitted by Frederic Marand on

Every so often, I get asked about whether it is really worth it to chase double quotes and constructs like print "foo $bar baz", and replace them with something like echo 'foo', $bar, 'baz', or even to remove all those big heredoc strings so convenient for large texts.
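For reference, here is a minimal sketch of the string styles in question, producing the same text four ways. The function name string_variants() and the sample value are assumptions made up for illustration, and the single-quoted literals carry explicit spaces so that every variant yields an identical result.

```php
<?php
// string_variants() is a hypothetical helper, for illustration only.
function string_variants($bar) {
    // Double quotes: $bar is interpolated into the literal.
    $interpolated = "foo $bar baz";

    // Single quotes plus the concatenation operator.
    $concatenated = 'foo ' . $bar . ' baz';

    // Heredoc: interpolates like a double-quoted string.
    // The closing EOD sits at column 0, as older PHP versions require.
    $heredoc = <<<EOD
foo $bar baz
EOD;

    // echo with comma-separated arguments, captured through output
    // buffering so the result can be compared with the other variants.
    ob_start();
    echo 'foo ', $bar, ' baz';
    $echoed = ob_get_clean();

    return array(
        'interpolated' => $interpolated,
        'concatenated' => $concatenated,
        'heredoc'      => $heredoc,
        'echoed'       => $echoed,
    );
}
```

Calling var_dump(count(array_unique(string_variants('BAR'))) === 1); should print bool(true): the styles differ only in how PHP compiles them, not in what they produce.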

Of course, most of the time, spending hours combing through code in search of this will yield less of a speedup than rethinking just one SQL query, but the answer is still that, yes, at an infinitesimal scale, there is something to be gained. Even with string reformatting? Yes, even that. But only if you are not using an optimizer.

But don't just take my word for it: Sara Golemon explained it years ago, in 2006, in her "How long is a piece of string" post.

Note that her suggestion about apc.optimization is no longer relevant with APC 3.x, though, as that setting was removed in APC 3.0.13, with optimizations now always on.

2011-02 UPDATE: as explained by TerryE in the comments, while this applied with older PHP versions (remember, this was written in 2006) which one still found live when reviewing older sites for upgrades, it does not apply to current versions of PHP 5.2.x and 5.3.x.

2010-04 UPDATE: the initial version of this post incorrectly mentioned phpdocs instead of heredocs. Thx dalin for pointing out the error. phpdoc-type comments DO come at a cost but that's when using the Reflection API, which actually interprets them to add information to ReflectionParameter instances, and this has nothing to do with string processing.


I think you are confusing PHPdoc and heredoc. PHPdoc:
<?php
/**
 * Converts baz into bar.
 *
 * @param $baz
 *  The baz widget.
 * @return
 *  A newly minted bar.
 */
function foo($baz) {

}
?>
And heredoc:
<?php
$foo = <<<EOD
This is a long
multi-line string
that might contain a $variable
EOD;
?>
There is no cost to PHP comments. And as she mentions at the end of the article, if you are running an opcode cache that does optimizations (which you should be), like modern versions of APC, then there's no real difference between the different string styles. This is why there is no coder rule for string style. Choosing one style over another for the sake of readability in a particular situation is far more important than the picosecond you would be saving.

"Just don't take my word for it, Sara Golemon explained it years ago with her..."

Sara wrote this in 2006. Have you checked this claim with a reasonably current PHP version? I have, with both 5.2 and 5.3. These observations simply no longer apply, so repeating these claims misleads new PHP programmers trying to improve their style. For example, she gives a case which she quotes as generating 76 opcodes, but if you run it through VLD on a current version (in my case 5.3.3) this generates:

$ vld /tmp/zzz.php     # (I've wrapped the I/O)
filename:       /tmp/zzz.php
function name:  (null)
number of ops:  2
compiled vars:  none
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
  12     0  >   ECHO   '%0A%2B-----------------------------------------------------------%2B%0A%7C++++
+++++++++++++++++++%21+ERROR+%21+++++++++++++++++++++++++++%7C%0A%7C+The+test-suite+requires+that+you+
have+pcre+extension++++++%7C%0A%7C+enabled.+To+enable+this+extension+either+compile+your+PHP+%7C%0A%7C
+with+--with-pcre-regex+or+if+you%27ve+compiled+pcre+as+a++++%7C%0A%7C+shared+module+load+it+via+php.i
ni.++++++++++++++++++++++++%7C%0A%2B-----------------------------------------------------------%2B%0A'
  14     1    > RETURN

That is a single ECHO opcode, plus the implicit RETURN. The extra opcodes in the original example were artefacts of PHP code generation foibles that no longer occur. Double quotes with embedded variables and heredoc syntax are now handled sensibly by current PHP code generators and carry no material performance penalty.