Last modified: 2012-04-12 13:54:35 UTC
Created attachment 8474
Parser function extension Bug.php causes article text to be truncated

If a parser function returns any multibyte characters, even one, all article content gets truncated. The article has no content when rendered. This is simple to reproduce reliably with PHP 5.3.3 and MediaWiki 1.16.4.

1. Make sure your PHP is 5.3. We have 5.3.3 on our Linux host (CentOS 5.6).
2. Install the wiki extension Bug.php, attached to this report.
3. Create an article with three lines:

   This is the first line.
   {{#bug:}}
   This is the last line.

4. Save the article.

The MediaWiki page is rendered but the article content is empty. View the HTML source and see there is no article text below the "start content" comment. I have reproduced this on 3 different wiki servers.
"\226" is not a multibyte character; it's an invalid UTF-8 byte sequence. That causes pcre to discard the input at some stage, and this has been the case for several years since PHP 5.1 or so. You must only return valid UTF-8 strings.
FYI, problem does not occur in PHP 5.1.6.
Also, no PHP error is logged.
There's no PHP error, it's internal to PCRE.
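For anyone who wants to see the silent failure outside MediaWiki, here is a minimal sketch. The sample text and the `/line/u` pattern are made up for illustration and are not taken from the MediaWiki parser; the only assumption is that the failing call uses the `/u` (UTF-8) modifier.

```php
<?php
// "\226" is a PHP octal escape for the single byte 0x96,
// which is not a valid UTF-8 sequence on its own.
$text = "This is the first line.\n\226\nThis is the last line.";

// An empty pattern with /u matches any valid UTF-8 string;
// on invalid input preg_match() returns false instead of 1.
$isValidUtf8 = (preg_match('//u', $text) === 1);

// Illustrative replacement only (not an actual MediaWiki pattern).
// With /u and an invalid subject, preg_replace() returns null
// and raises no PHP warning or error.
$result = preg_replace('/line/u', 'LINE', $text);

var_dump($isValidUtf8);                              // bool(false)
var_dump($result);                                   // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

The whole subject string is lost, not just the bad byte, which matches the "entire article is empty" symptom.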
Does PCRE return any sort of status/result to MediaWiki that could be emitted as an error? It would be so helpful if *some* component in the whole system could identify the problem when it happens.
The function returns null, which could be detected. However, *every single use* of preg_replace or preg_replace_callback, and large chunks of the preg_match and preg_match_all calls, might need to have checks added, and I'm not convinced that it solves much, since the 'entire page goes blank when a bad multibyte char shows up' symptom has been very stable for several years.
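As a rough sketch of what such a check could look like at a single call site: `checkedPregReplace()` is a hypothetical helper name, not something that exists in MediaWiki or PHP, and this only covers preg_replace.

```php
<?php
// Hypothetical wrapper (not part of MediaWiki): turns the silent
// null return of preg_replace() into an exception with a reason.
function checkedPregReplace($pattern, $replacement, $subject) {
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        $code = preg_last_error();
        $reason = ($code === PREG_BAD_UTF8_ERROR)
            ? 'invalid UTF-8 in subject'
            : "PCRE error code $code";
        throw new RuntimeException("preg_replace failed: $reason");
    }
    return $result;
}

// Usage: valid input passes through, invalid input is reported.
try {
    echo checkedPregReplace('/a/u', 'b', "cat"), "\n";
    echo checkedPregReplace('/a/u', 'b', "\226"), "\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

This shows the cost being discussed: every call would need to go through something like this, and preg_match and preg_match_all signal failure with false rather than null, so they would need their own variant.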
Thanks Brian. We encountered this problem in an extension that reads from a SQL Server database via an ODBC driver that claims to return UTF-8. Unfortunately, whenever I construct a test case with that ODBC driver talking to PHP directly, I cannot get preg_match() to fail. Only when it talks to MediaWiki. Hence the bug report. Thanks for your reasoned commentary about this.
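For an extension that receives strings from an external source, one possible guard is to validate the value before returning it from the parser function. This is only a sketch: `ensureUtf8()` is a hypothetical helper, the Windows-1252 fallback is a guess about what the driver might really be sending, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not part of MediaWiki or PHP).
// Returns $value unchanged if it is valid UTF-8; otherwise treats it
// as $fallbackEncoding (an assumption) and converts it to UTF-8.
function ensureUtf8($value, $fallbackEncoding = 'Windows-1252') {
    // An empty /u pattern matches only when the subject is valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallbackEncoding);
}

// Usage: dump the bytes before and after to see what the driver sent.
$fromDriver = "\226";                       // the single byte 0x96
echo bin2hex($fromDriver), "\n";
echo bin2hex(ensureUtf8($fromDriver)), "\n";
```

In Windows-1252 the byte 0x96 is an en dash, so a lone 0x96 in the output would be consistent with the driver returning Windows-1252 text rather than UTF-8. Dumping the raw bytes with bin2hex() is also a quick way to compare what the driver returns in the standalone test against what it returns under MediaWiki.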