Console encoding in PHP-GTK apps

Submitted by Frederic Marand

The problem: when coding PHP-GTK apps, the most elementary debugging method is to use echo or print statements. These are fine and dandy for English-speaking or US-based coders, but may be a problem for coders elsewhere around the globe: PHP scripts are typically stored in UTF-8 to limit i18n headaches, while the console that displays their output is normally configured for some regional encoding, such as IBM850 on a French Windows XP.

So we need a workaround...

Builtin tools

PHP is usually built with some form of the iconv extension, which knows how to convert from UTF-8 (or whatever is used in your script) to IBM850 (or whatever is used in your console).

So a first step goes like this:

<?php
/**
 * This example supposes script is saved as UTF-8
 * and console operates in IBM850
 * Your local parameters may vary
 */

echo "Signal connecté\n" ; // This won't work

echo iconv('UTF-8', 'IBM850', "Signal connecté\n"); // This will
?>
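To see why the first echo fails, it helps to look at the bytes involved. The sketch below is only an illustration: it spells out the UTF-8 form of "é" as explicit bytes, so that it does not depend on how the file itself is saved, and it assumes your iconv build knows the IBM850 name.

```php
<?php
// "\xC3\xA9" is the UTF-8 encoding of "é": two bytes.
$utf8 = "\xC3\xA9";

// The same character converted for an IBM850 console: one byte.
$ibm850 = iconv('UTF-8', 'IBM850', $utf8);

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($ibm850), "\n"; // 82
```

An IBM850 console that receives the two UTF-8 bytes unconverted treats each of them as a separate character, which is why the accented letter comes out garbled.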

OK, so we have a solution, but it is a PITA to use for debugging, much more so than just echo. Let's improve on it.

Buffering

What we need is a way to run any console output through an iconv routine with the parameters we defined above. It so happens that PHP has something exactly tailored for that purpose: the ob_* Output Buffering functions.

OB can be started from the php.ini file, or in code, which is probably safer, using the ob_start([callback [, chunk_size [, erase]]]) function.

Its first parameter is the name of a callback function that will be invoked to perform the encoding, and PHP comes with a default callback using iconv: ob_iconv_handler(), which uses predefined encodings as follows:

<?php
/**
 * Same remark as previously: change values according
 * to your local environment
 */
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "IBM850");
ob_start('ob_iconv_handler');

// And now you can just go:
echo "Signal connecté\n" ;
// ... and plow ahead ...
?>

Flushing

At this point output works, in the example above. However, since this is output buffering, all output is held until the end of the script, which is not so useful for debugging purposes: you'll usually want the echo to appear when some signal is passed to your callback, not when the whole program eventually ends. So we still need to improve on the solution.

There is a function devoted to solving this problem, and it is ob_flush(), which forces the converter callback to receive and process the data already output. When used in the CLI-based PHP-GTK environment, this will also cause a flush() to be performed, and the debug message to be output. Nice. So we can just use something like:

<?php
/**
 * Same remark as previously: change values according
 * to your local environment
 */
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "IBM850");
ob_start('ob_iconv_handler');

// And now you can just go:
echo "Signal connecté\n" ;
ob_flush();
/**
 * As soon as you use the two lines above (as in a callback)
 * a message will be output to the console, instead of
 * waiting until the program ends.
 */
?>

However, it is still a nuisance to have to call TWO functions just to output a bit of debug information. A simple workaround would be to create some iecho() function that would go like:

<?php
function iecho($s)
  {
  echo $s;
  ob_flush();
  }
?>

... but this definitely lacks elegance, and there is another, more serious problem: although the function will work the first time it is called, the second time you will see your application looping on a PHP warning: PHP Warning:  Cannot modify header information - headers already sent by (output started at <some file>:<some line>). What's happening?

PHP-GTK is not PHP for the Web

The problem we just encountered stems from an undocumented feature of ob_iconv_handler(): since PHP was initially dedicated to web applications, this output buffering converter automagically generates an HTTP header defining the encoding applied to the output.

While this is definitely useful in a web context, it fails in our PHP-GTK environment because the header is regenerated on each ob_iconv_handler invocation, meaning on any echo or print, even though content has already been output thanks to the ob_flush() call. This causes the familiar warning. And since the warning itself is trapped by the same handler, it loops on itself.

Since it seems we cannot rely on ob_iconv_handler in this style of programming, we can just define our own handler:

<?php
/**
 * Same remark as previously: change values according
 * to your local environment. You'll probably want
 * to have these read at run time from some config file.
 */
function output_encoder($s)
  {
  return iconv('UTF-8', 'IBM850', $s);
  }

function iecho($s)
  {
  echo $s;
  ob_flush();
  }

// Activate the OB handler:
ob_start("output_encoder");

// Now you can just go:
iecho("Signal connecté\n");
// ... and plow ahead ...
?>

Now the header is no longer recreated each time, and you can indeed use iecho as much as you like. Now if only there existed a way to just call the builtins instead of a special function like iecho...

Auto-flushing

You may have noticed a second parameter in the ob_start description, the chunk_size. Setting it will cause the OB extension to automatically perform a flush when the contents of the output buffer reach that size and a new line has been output. To quote the spec: the callback function is called on every first newline after chunk_size bytes of output.

As suggested by a note on the ob_start page, setting it to 2 (0 or 1 won't work) will cause PHP to automatically flush its output buffer and invoke your function when meeting a new line after at least two characters, which covers most (though probably not all) error messages.
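To check when the handler is actually invoked on your own setup, you can record its calls. This is a small sketch, assuming PHP 5.3 or later for the closure; $calls and $recorder are names made up for the illustration. The exact number of calls may differ between PHP versions, so it is printed rather than assumed.

```php
<?php
// Every buffer handed to the output handler is collected here.
$calls = array();

// Illustrative handler: records its input and passes it through unchanged.
$recorder = function ($buffer, $phase = 0) use (&$calls) {
    $calls[] = $buffer;
    return $buffer;
};

ob_start($recorder, 2);
echo "first line\n";
echo "second line\n";
ob_end_flush();

// Buffering is off again, so this line goes straight to the console.
echo count($calls), " handler call(s)\n";
```

If each line shows up as its own entry in $calls, the handler ran as the lines were output and not only once at the end of the script.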

This gives us our final solution:

<?php
/**
 * Same remark as previously: change values according
 * to your local environment. You'll probably want
 * to have these read at run time from some config file.
 */
function output_encoder($s)
  {
  return iconv('UTF-8', 'IBM850', $s);
  }

// Activate the OB handler:
ob_start("output_encoder", 2);

// Now you can just go:
echo("Signal connecté\n");
// ... and plow ahead ...
?>

That's it! Any debug message longer than 1 character and including a new line will be displayed in the proper encoding as soon as it is requested by an echo or print builtin, and we need neither explicit buffer flushing nor a specific function to call for debug printing.
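The comments in the listings suggest reading the encodings at run time instead of hard-coding them. Here is a minimal sketch of that idea, assuming PHP 5.3 or later for closures. make_console_encoder() is a hypothetical helper name, not part of PHP or PHP-GTK, and falling back to the unconverted text when iconv() fails is an assumption about what is most useful while debugging.

```php
<?php
/**
 * Hypothetical helper: builds an output handler for the given encodings.
 * In practice the two values would come from a config file.
 */
function make_console_encoder($from, $to)
{
    return function ($buffer, $phase = 0) use ($from, $to) {
        // iconv() returns false when it cannot convert the input;
        // @ silences the notice it raises in that case.
        $converted = @iconv($from, $to, $buffer);

        // Assumption: unconverted text beats no text while debugging.
        return $converted === false ? $buffer : $converted;
    };
}

// Same chunk_size of 2 as in the final solution.
ob_start(make_console_encoder('UTF-8', 'IBM850'), 2);

echo "Signal connecté\n";
```

Since the handler is now built from two parameters, moving to another console only requires changing the values passed to make_console_encoder().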