Last modified: 2012-04-12 13:54:35 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T30748, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 28748 - One multibyte character in parser function output kills all article content
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.16.x
Hardware: All Linux
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2011-04-29 18:02 UTC by Dan Barrett
Modified: 2012-04-12 13:54 UTC


Attachments
Parser function extension Bug.php causes article text to be truncated (408 bytes, text/x-wiki)
2011-04-29 18:02 UTC, Dan Barrett

Description Dan Barrett 2011-04-29 18:02:40 UTC
Created attachment 8474
Parser function extension Bug.php causes article text to be truncated

If a parser function returns any multibyte characters, even one, all article content gets truncated. The article has no content when rendered. This is simple to reproduce reliably with PHP 5.3.3 and MediaWiki 1.16.4.

1. Make sure your PHP is 5.3.  We have 5.3.3 on our Linux host (CentOS 5.6).

2. Install the wiki extension Bug.php, attached to this report.

3. Create an article with three lines:

This is the first line.
{{#bug:}}
This is the last line.

4. Save the article. The MediaWiki page is rendered but the article content is empty. View the HTML source and see there is no article text below the "start content" comment.

I have reproduced this on 3 different wiki servers.
Comment 1 Brion Vibber 2011-04-29 18:05:03 UTC
"\226" is not a multibyte character; it's an invalid UTF-8 byte sequence. That causes pcre to discard the input at some stage, and this has been the case for several years since PHP 5.1 or so.

You must only return valid UTF-8 strings.
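The point in Comment 1 can be checked outside MediaWiki. The sketch below assumes the attached extension returns the PHP string literal "\226" (the attachment is not reproduced on this page, so that is inferred from the comment). In a double-quoted PHP string, "\226" is an octal escape for the single byte 0x96, which is a UTF-8 continuation byte with no lead byte in front of it. The variable names and sample text are illustrative only.

```php
<?php
// "\226" is an octal escape: one byte, 0x96, not a multibyte character.
$byte = "\226";
$length = strlen($byte);   // number of bytes in the string
$hex = bin2hex($byte);     // hex dump of that byte

// An empty pattern with the /u modifier makes PCRE validate the subject as
// UTF-8 before matching, so it doubles as a cheap validity check.
$isValidUtf8 = (preg_match('//u', $byte) === 1);

// With /u, preg_replace() gives up on the whole subject and returns null,
// which is how surrounding text can vanish along with the bad byte.
$result = preg_replace('/line/u', 'LINE', "first line{$byte}last line");
$error = preg_last_error();

var_dump($length, $hex, $isValidUtf8, $result, $error === PREG_BAD_UTF8_ERROR);
```

No PHP warning is raised here; the only signals are the null return value and preg_last_error().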
Comment 2 Dan Barrett 2011-04-29 18:06:54 UTC
FYI, problem does not occur in PHP 5.1.6.
Comment 3 Dan Barrett 2011-04-29 18:07:09 UTC
Also, no PHP error is logged.
Comment 4 Brion Vibber 2011-04-29 18:07:35 UTC
There's no PHP error, it's internal to PCRE.
Comment 5 Dan Barrett 2011-04-29 18:44:57 UTC
Does PCRE return any sort of status/result to MediaWiki that could be emitted as an error? It would be so helpful if *some* component in the whole system could identify the problem when it happens.
Comment 6 Brion Vibber 2011-04-29 18:47:03 UTC
The function returns null, which could be detected. However *every single use* of preg_replace or preg_replace_callback, and large chunks of preg_match and preg_match_all calls might need to have checks added, and I'm not convinced that it solves much since the 'entire page goes blank when a bad multibyte char shows up' symptom has been very stable for several years.
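The kind of check described in Comment 6 might look like the following sketch. The helper name checkedPregReplace is hypothetical and not part of MediaWiki; it only illustrates detecting the null return from preg_replace() and reporting preg_last_error() instead of passing the null along silently.

```php
<?php
// Hypothetical helper, not a MediaWiki function: wraps preg_replace() and
// turns a silent PCRE failure into an exception.
function checkedPregReplace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BAD_UTF8_ERROR for an invalid UTF-8 subject under /u.
        throw new RuntimeException(
            'preg_replace failed, preg_last_error() = ' . preg_last_error()
        );
    }
    return $result;
}

// Usage: a valid subject passes through, an invalid one is reported.
$ok = checkedPregReplace('/line/u', 'LINE', 'first line');
$message = null;
try {
    checkedPregReplace('/line/u', 'LINE', "bad \226 byte");
} catch (RuntimeException $e) {
    $message = $e->getMessage();
}
```

As the comment notes, the cost is that every call site would need to go through such a wrapper, which is the reason given for not doing it.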
Comment 7 Dan Barrett 2011-04-29 18:53:28 UTC
Thanks, Brion. We encountered this problem in an extension that reads from a SQL Server database via an ODBC driver that claims to return UTF-8.

Unfortunately, whenever I construct a test case with that ODBC driver talking to PHP directly, I cannot get preg_match() to fail. Only when it talks to MediaWiki. Hence the bug report.

Thanks for your reasoned commentary about this.
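One possible explanation for the difference described in Comment 7, offered as an assumption rather than a diagnosis: PCRE only validates the subject as UTF-8 when the pattern uses the /u modifier, so a standalone preg_match() test without /u will succeed on the same bytes that fail in a /u code path. The thread implies, but does not show, that the MediaWiki code path uses /u. The string below is a stand-in for driver output, and isValidUtf8 is a hypothetical guard an extension could apply before returning text to the parser.

```php
<?php
// Stand-in for a driver string that contains a stray 0x96 byte.
$fromDriver = "caf\226";

// Byte mode: PCRE does not check the encoding, so the match succeeds.
$byteMode = preg_match('/caf/', $fromDriver);

// UTF-8 mode (/u): PCRE validates the subject first and returns false.
$utf8Mode = preg_match('/caf/u', $fromDriver);

// Hypothetical guard for the extension boundary: true only for valid UTF-8.
function isValidUtf8($text)
{
    return preg_match('//u', $text) === 1;
}

var_dump($byteMode, $utf8Mode, isValidUtf8($fromDriver), isValidUtf8('café'));
```

Checking driver output with a guard like this inside the extension would surface the bad bytes at their source, before they reach the parser.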
