Last modified: 2005-11-15 11:19:16 UTC

Bug 1938 - Unicode Byte Order Mark appearing in wikitext causes page to be cutoff
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Hardware: PC Windows 2000
Importance: Normal critical with 1 vote
Assigned To: Nobody
Keywords: parser
Blocks: unicode
Reported: 2005-04-21 10:56 UTC by Alphax
Modified: 2005-11-15 11:19 UTC


Description Alphax 2005-04-21 10:56:24 UTC
As noted on
occurrences of the Unicode Byte Order Mark (0xFEFF) cause pages to be cut off.
This has occurred on [[User_talk:Alphax]]
and [[User_talk:Duncharris]]

What is the cause of these, how can they be found, and when will it be fixed?
Comment 1 Brion Vibber 2005-04-21 11:01:00 UTC
Seems to be a problem with tidy. Investigating...
Comment 2 Brion Vibber 2005-04-21 11:12:36 UTC
The problem seems to be triggered by illegal entity references such as &#0xfeff;

This is not valid HTML/XML; the allowed numeric character references are decimal &#[0-9]+; and hexadecimal 
&#[Xx][0-9A-Fa-f]+;. Putting a 0 _before_ the x is nicely invalid. Tidy looks at this and assumes you had meant 
to type &#0;xfeff;... and turns the &#0; reference into a _literal_ null character in output.
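
As a rough illustration of the two allowed patterns above, here is a minimal Python sketch that escapes the ampersand of any "&#" that does not start a well-formed numeric character reference, so it is rendered literally instead of being "corrected". The function name escape_invalid_numeric_refs is hypothetical and this is not the MediaWiki code; it also only checks syntax, so a syntactically valid but illegal reference such as &#0; would still pass through.

```python
import re

# "&#" that is NOT followed by a well-formed numeric reference body:
# decimal [0-9]+; or hexadecimal [Xx][0-9A-Fa-f]+;
INVALID_NUMERIC_REF = re.compile(r"&#(?!(?:[0-9]+|[Xx][0-9A-Fa-f]+);)")


def escape_invalid_numeric_refs(text: str) -> str:
    """Escape the '&' of malformed numeric references (hypothetical helper)."""
    return INVALID_NUMERIC_REF.sub("&amp;#", text)


if __name__ == "__main__":
    # The malformed form from this report is escaped ...
    print(escape_invalid_numeric_refs("bad: &#0xfeff;"))
    # ... while well-formed hexadecimal and decimal forms are left alone.
    print(escape_invalid_numeric_refs("ok: &#xfeff; and &#65279;"))
```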

A null character is actually ok in a PHP string, but the internal library interface to tidy seems to be treating tidy's 
output as a null-terminated string when copying it back to PHP, so the output ends at that point.
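
The truncation described above can be reproduced in miniature. The following Python sketch uses ctypes to contrast a copy that stops at the first null byte with a copy that uses the known length. It only illustrates the general failure mode; the helper names and the sample bytes are assumptions, and it says nothing about how the PHP tidy extension is actually written.

```python
import ctypes


def copy_as_c_string(data: bytes) -> bytes:
    """Read the buffer back as a null-terminated C string (stops at first NUL)."""
    buf = ctypes.create_string_buffer(data)
    return buf.value


def copy_with_length(data: bytes) -> bytes:
    """Read the buffer back using the known length, so embedded NULs survive."""
    buf = ctypes.create_string_buffer(data)
    return buf.raw[:len(data)]


if __name__ == "__main__":
    # Hypothetical tidy output containing a literal null character.
    tidy_output = b"<p>before \x00xfeff; after</p>"
    print(copy_as_c_string(tidy_output))   # everything after the NUL is lost
    print(copy_with_length(tidy_output))   # full output preserved
```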

Sigh... Ideally, I can tell tidy not to do this kind of 'correction', which is one that makes more trouble than help.

Comment 3 Brion Vibber 2005-04-21 11:52:51 UTC
Our preexisting escaping would have fixed this in body text but was applied too early, so it was not correcting the 
link text. I've moved the escaping down to after link replacement and it's working now.
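
Why the order matters can be shown with a toy pipeline. The Python sketch below assumes, purely for illustration, that link text is held aside behind placeholders of the form <!--LINK n--> and substituted back in later; the placeholder format and all function names are hypothetical and are not taken from the MediaWiki parser. Escaping before substitution never sees the link text, while escaping afterwards covers it.

```python
import re

# "&#" not followed by a well-formed decimal or hexadecimal reference body.
INVALID_NUMERIC_REF = re.compile(r"&#(?!(?:[0-9]+|[Xx][0-9A-Fa-f]+);)")
# Hypothetical placeholder standing in for a link that is rendered later.
LINK_PLACEHOLDER = re.compile(r"<!--LINK (\d+)-->")


def escape_invalid_refs(text):
    return INVALID_NUMERIC_REF.sub("&amp;#", text)


def replace_links(text, link_html):
    # Substitute each placeholder with its stored link HTML.
    return LINK_PLACEHOLDER.sub(lambda m: link_html[int(m.group(1))], text)


def render_escape_early(text, link_html):
    # Escaping runs first, so the link text is never inspected.
    return replace_links(escape_invalid_refs(text), link_html)


def render_escape_late(text, link_html):
    # Escaping runs after link replacement, so link text is covered too.
    return escape_invalid_refs(replace_links(text, link_html))


if __name__ == "__main__":
    body = "See <!--LINK 0--> and &#0xfeff;"
    links = ["<a href='/wiki/X'>&#0xfeff;</a>"]
    print(render_escape_early(body, links))
    print(render_escape_late(body, links))
```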

Fixed in CVS HEAD and REL1_4 and live on site. Will be included in 1.4.3 release.

Use action=purge if necessary on affected pages with cached broken rendering.
