Last modified: 2011-08-23 20:44:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links may be broken. See T18554, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 16554 - Import strips angle brackets on some installations (libxml2 entity bug)
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Export/Import
Version: unspecified
Hardware: All, OS: All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: upstream
Duplicates: 18022 18355 18877 24238 30526
Depends on:
Blocks:
Reported: 2008-12-03 23:49 UTC by Chad H.
Modified: 2011-08-23 20:44 UTC (History)
CC List: 11 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Export of enwiki:Test (3.87 KB, text/xml)
2009-01-09 17:41 UTC, Chad H.

Description Chad H. 2008-12-03 23:49:37 UTC
When exporting, the greater-than and less-than signs are turned into HTML entities. The importer doesn't seem to account for this: importing via both interwiki import and XML upload gives "ref" and "/ref" all over the place, missing their angle brackets.

Marking as CRITICAL as it's a blocker to export/import.
Comment 1 Brion Vibber 2008-12-08 22:58:53 UTC
Works just fine for me. They are turned into entities on export -- which is correct -- and reinterpreted back to their original values on import -- which is correct.
Comment 2 Chad H. 2009-01-09 02:32:09 UTC
Not so sure... I ran an interwiki import last night of "Test" from enwiki to my localhost and ended up with all < and > stripped, exposing an HTML comment. This is vanilla trunk with a pretty standard config.
Comment 3 Tim Starling 2009-01-09 05:33:38 UTC
Works for me with both transwiki and file import. Downgrading severity. Need more information about the installations where this occurs.
Comment 4 Chad H. 2009-01-09 06:46:41 UTC
Doesn't seem to happen on my win32 box (no extensions, no tidy), but continues to happen on my CentOS machine (identical LocalSettings). Anything in particular you want me to check?
Comment 5 Tim Starling 2009-01-09 10:07:55 UTC
It's likely that the entity &lt; is not being sent (decoded) to the character data handler. Maybe it's being sent to some other handler (such as the default handler), maybe it's just discarded. The reason for this probably has something to do with the version or configuration of the libxml2 library. What would be nice is if you could help debug it. I think the first thing to try would be something like:

Index: includes/Import.php
===================================================================
--- includes/Import.php	(revision 45593)
+++ includes/Import.php	(working copy)
@@ -864,6 +864,7 @@
 			$this->appendfield = $name;
 			xml_set_element_handler( $parser, "in_nothing", "out_append" );
 			xml_set_character_data_handler( $parser, "char_append" );
+			xml_set_default_handler( $parser, "char_append" );
 			break;
 		case "contributor":
 			$this->push( "contributor" );

and then see what comes out on import. Please also report the following information about your system:

* From phpinfo(), whether there's a --with-libxml-dir or --with-libexpat-dir under "Configure Command" and what it's set to
* If expat is used, what version it is
* What the libxml2 version is
* The distribution and package version used to install PHP
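Tim's debugging idea can also be tried outside MediaWiki as a small standalone script. This is a minimal sketch, assuming PHP 7 or later (closures as handlers, scalar type hints); traceXmlHandlers() is a hypothetical name, not part of Import.php. It records what reaches the character data handler and, optionally, a default handler, for a fragment shaped like the escaped wikitext inside an export's text element:

```php
<?php
// Hypothetical diagnostic modelled on the patch above: collect every chunk the
// xml extension passes to the character data handler and, when requested, to a
// default handler, so you can see where an escaped "&lt;" ends up.
function traceXmlHandlers(string $xml, bool $withDefaultHandler): array
{
    $trace = ['chardata' => '', 'default' => ''];
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$trace) {
        $trace['chardata'] .= $data;
    });
    if ($withDefaultHandler) {
        // Same idea as adding xml_set_default_handler() in Import.php.
        xml_set_default_handler($parser, function ($p, string $data) use (&$trace) {
            $trace['default'] .= $data;
        });
    }
    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $trace;
}
```

Calling it as `var_dump(traceXmlHandlers('<text>&lt;ref&gt;x&lt;/ref&gt;</text>', true));` shows both channels. On a working install the chardata entry should read `<ref>x</ref>`; per the symptoms in this bug, an affected install loses the brackets.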
Comment 6 Chad H. 2009-01-09 17:21:53 UTC
(In reply to comment #5)
> and then see what comes out on import. 

Didn't fix it, no change in behavior.

> Please also report the following information about your system:
> 
> * From phpinfo(), whether there's a --with-libxml-dir or --with-libexpat-dir
> under "Configure Command" and what it's set to
> * If expat is used, what version it is
> * What the libxml2 version is
> * The distribution and package version used to install PHP
> 

Configure Command includes --with-libxml-dir=/opt/xml2/, and the libxml2 version is 2.7.2.
Not using --with-expat. We're on PHP 5.2.6, which is (was? I know 5.2.8 is out) the default php-mysql build for CentOS 4.7, as far as I know. I haven't changed it.
Comment 7 Chad H. 2009-01-09 17:41:08 UTC
Created attachment 5655 [details]
Export of enwiki:Test

Here's the exact XML I've been testing with; I get the same error on both upload and interwiki import, 100% of the time. It's the Special:Export of "Test" from enwiki (r45489), imported into r45536.
Comment 8 Tim Starling 2009-01-10 03:48:55 UTC
CentOS 4.7 base is still on PHP 4, and CentOS 4.7 plus has 5.1.6. I'm assuming PHP and libxml2 are both source installs. 
Comment 9 Tim Starling 2009-01-10 08:30:56 UTC
Confirmed in QEMU as reported.
Comment 10 Tim Starling 2009-01-11 09:39:10 UTC
Submitted upstream at http://bugs.php.net/bug.php?id=47066 . The workaround is to recompile with an ancient libxml2.
Comment 11 Chad H. 2009-01-11 15:30:04 UTC
That was marked as a duplicate of http://bugs.php.net/bug.php?id=45996, which was reported as fixed within the last 24 hours. Note, however, that the fix requires libxml 2.7.3, which is not yet released.
Comment 12 Tim Starling 2009-01-12 00:56:03 UTC
I suggest we leave this open until it's confirmed to be fixed on all commonly-used versions of libxml2. It'll help people search for a workaround.
Comment 13 Tim Starling 2009-01-12 01:20:56 UTC
rrichards (via IRC) advises us to migrate to xmlreader. The old xml extension suffers from inelegant and easily-broken expat-compatibility code.
Comment 14 Brion Vibber 2009-01-15 17:37:18 UTC
There's (fairly minimal) XMLReader-based code in backupPrefetch.inc which might be helpful as a base to work from; I do agree it's a much nicer interface to work with, and a redo of the import code would be a lot cleaner using it.

Note though that XMLReader is not bundled with PHP 5.0 (available only via PECL), and in 5.1 and later it's on by default in a *fresh* compile but many distro packages may not install it by default.

If we rely on XMLReader for core import functionality, we'll want to officially drop PHP 5.0 compatibility and do a check for the extension at install time (and at run time so we can fail gracefully).
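For comparison, reading revision text through XMLReader could look roughly like this. It is a hypothetical sketch (readExportTexts() is neither MediaWiki code nor the backupPrefetch.inc implementation), assuming PHP 7+ with the XMLReader extension, and it includes the kind of run-time extension check described above so a missing extension fails with a clear message:

```php
<?php
// Hypothetical XMLReader-based reader for the <text> elements of an export.
// It checks for the extension at run time instead of failing obscurely later.
function readExportTexts(string $xml): array
{
    if (!class_exists('XMLReader')) {
        throw new RuntimeException('The XMLReader extension is required for import.');
    }
    $reader = new XMLReader();
    if (!$reader->XML($xml)) {
        throw new RuntimeException('Could not load the export XML.');
    }
    $texts = [];
    while ($reader->read()) {
        // localName ignores any namespace prefix on the export's elements.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'text') {
            // readString() returns the element's text content with entities decoded.
            $texts[] = $reader->readString();
        }
    }
    $reader->close();
    return $texts;
}
```

Fed an export like the attachment, each decoded text body should come back with its angle brackets intact; the pull-style loop is what makes this simpler than juggling SAX handler state.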
Comment 15 Brion Vibber 2009-03-18 22:01:35 UTC
*** Bug 18022 has been marked as a duplicate of this bug. ***
Comment 16 Brion Vibber 2009-03-18 22:19:44 UTC
One option might be to do a runtime test, like we do for the PHP 5.0 64-bit array index bug; if the XML parser is buggy, we can throw a nice visible error explaining that you have to fix your installation instead of silently corrupting input.
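A runtime self-test of that kind could be as small as the following. It is a minimal sketch assuming PHP 7+; xmlParserHandlesEntities() and requireWorkingXmlParser() are hypothetical names, and this is not necessarily how the check mentioned later in this thread was written. It escapes a known string the way an export does, parses it, and refuses to continue if the character data handler does not get the original back:

```php
<?php
// Hypothetical self-test: does the xml extension deliver decoded entities
// such as &lt; to the character data handler?
function xmlParserHandlesEntities(): bool
{
    $expected = '<b>c</b>';
    // htmlspecialchars() yields &lt;b&gt;c&lt;/b&gt;, mimicking escaped export text.
    $xml = '<a>' . htmlspecialchars($expected) . '</a>';

    $received = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$received) {
        $received .= $data;
    });
    $parsedOk = xml_parse($parser, $xml, true) === 1;
    return $parsedOk && $received === $expected;
}

// Fail loudly instead of silently corrupting imported text.
function requireWorkingXmlParser(): void
{
    if (!xmlParserHandlesEntities()) {
        throw new RuntimeException(
            'Your PHP/libxml2 combination drops entities such as &lt; while parsing XML; '
            . 'fix the installation before importing.'
        );
    }
}
```

Testing behaviour rather than version numbers means the check also catches unusual builds where the reported versions look fine.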
Comment 17 Dan Jacobson 2009-03-18 22:26:04 UTC
Exactly.

This will prevent no end of pain months later when they try to untangle their garbled edits.

Top priority if I were in charge.
Comment 18 Dan Jacobson 2009-03-19 15:46:32 UTC
I would issue an announcement:

"If you have used Special:Import, ...., ....,
since approximately .....
please check your imported pages for subtle corruptions, e.g.,
< Please see my [http://example.com/index.php?title=Resume&uselang=en resume]
> Please see my [http://example.com/index.php?title=Resumeuselang=en resume]
Unnoticed, these corruptions may become entangled in later edits,
making repair even more frustrating.
Users are advised to upgrade to MediaWiki 1.14.xx, 1.13.yy,..
The new versions of Special:Import,...
contain a test that will terminate with an error message:
The following faulty libraries are outside MediaWiki's control and must be
updated first to avoid data corruption: ..."

I hope I'm not overdoing it, but subtle data corruption is one of the
most insidious bugs.
Comment 19 Chad H. 2009-03-19 16:08:07 UTC
This is an edge case, affecting a small subset of installs. Resetting priority and severity.
Comment 20 Dan Jacobson 2009-03-19 17:29:35 UTC
Told Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=520423
Comment 21 Alex Z. 2009-04-05 23:34:49 UTC
*** Bug 18355 has been marked as a duplicate of this bug. ***
Comment 22 Seth Rees 2009-04-06 21:31:13 UTC
Can anyone give me a direct solution, if there is one? I recently moved off Wikia and really need to import this database dump as soon as possible, and this bracket problem is holding me back.
Comment 23 Brion Vibber 2009-04-08 21:41:17 UTC
Upgrade PHP and libxml2 to the latest versions, or downgrade them to versions from before the problem. There are several links above which should provide more information.
Comment 24 Roan Kattouw 2009-05-22 22:08:21 UTC
*** Bug 18877 has been marked as a duplicate of this bug. ***
Comment 25 moissinac 2009-05-25 16:16:30 UTC
"Upgrade PHP and libxml2 to the latest versions, or downgrade them to versions
from before the problem. There are several links above which should provide
more information."

The latest version available at the time of this comment is 2.7.2,
and it doesn't work with that version. So comment #23 is erroneous.

And the links above refer to libxml2 2.7.3, which is not clearly available.
I just spent nearly an hour trying to find it, with no luck so far.
Even the code from the W3C SVN libxml2 module (libxml2-cvs-snapshot.tar.gz, updated hourly)
is 2.7.2.

I will try to install a previous version on my XAMPP install.
If anyone has tried it, please let me know the result. Thanks.

Comment 26 Chad H. 2009-05-25 16:29:09 UTC
Did you not check their website? 

http://www.xmlsoft.org/news.html - latest version (as of January) is 2.7.3.
Comment 27 moissinac 2009-05-25 20:22:17 UTC
Sure.
I don't know when or where the mistake happened, but I really did see v2.7.2 as the latest.
Now I can find 2.7.3.
I will try it tomorrow.
Thanks for the comment.
Comment 28 Brion Vibber 2009-08-11 23:54:46 UTC
I added a test in r54828 which'll run at install or update time, but I don't have a broken system to test atm so haven't confirmed it...
Comment 29 Chad H. 2009-08-12 00:24:00 UTC
Checked against a known broken system. Works.

Marking this as WORKSFORME, as everything has been fixed upstream. Installs affected by this bug are inherently broken, both for MediaWiki and other PHP web apps (same as 64bit bug we check for). There's really nothing more we can do here.
Comment 30 Brion Vibber 2009-08-13 21:33:12 UTC
Also confirmed that it detects the bug on CentOS 5 liveCD with a libxml 2.7.2 RPM smashed on top.

I'm re-marking this FIXED. :)
Comment 31 Brion Vibber 2009-08-13 22:06:36 UTC
Also went ahead and merged this to REL1_15, so if we push out a 1.15.2 release it'll include the check, as will 1.16.x releases when they come.
Comment 32 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-10-09 16:30:25 UTC
Does this really warrant an uncircumventable fatal error on install?  If this is only a problem for import/export, maybe we could just raise a warning and disable those?  The user *might* have other apps they care about that are broken by the bug, but it's perfectly possible they don't, and there's no reason to flat-out prohibit installation in that case.  Rgoodermote on IRC was running into this bug and was fairly frustrated that installation of 1.16 just failed unpreventably.

(Also, fixed the version number in the error message in r57568.)
Comment 33 Chad H. 2009-10-09 17:09:43 UTC
We know it affects export/import badly. To be honest, I haven't looked elsewhere in MediaWiki to see what else we might be corrupting, or whether everything else is clear.

If it's only an import/export issue, then we can probably get away with just disabling those features rather than flat-out prohibiting install.
Comment 34 Max Semenik 2010-07-02 18:24:13 UTC
*** Bug 24238 has been marked as a duplicate of this bug. ***
Comment 35 Bill V 2010-07-02 18:45:23 UTC
This bug is affecting later versions as well. Once the install of all the components for MediaWiki 1.15.4 on Solaris 9 is completed and I begin the configuration from the web page, I receive the following message:

"Your system has a combination of PHP and libxml which is buggy and can cause
hidden data corruption in MediaWiki and other web apps. Upgrade to PHP 5.2.9 or
later and libxml2 2.7.3 or later! ABORTING (http://bugs.php.net?id=45996 for
details). 

However, my installation is PHP 5.2.13 and libxml 2.7.7, both of which are later
than the above two versions.
Comment 36 Christian Neubauer 2010-10-13 15:55:38 UTC
Was this a problem in previous versions of MediaWiki?  I ask because we've been running 1.13.5 for months (years?) with the bad combination of PHP and libxml2 and never noticed any issues.  Granted we don't do much importing.  Now we are trying to upgrade to 1.16.0 and can't because of this error.  Since we can't do anything about the versions of PHP and libxml on our server, I'm tempted to comment out the check and move on.  Any advice?
Comment 37 Brion Vibber 2010-10-13 18:55:31 UTC
(In reply to comment #35)
> This bug is affecting later versions as well. Once the install of all the
> components for Media Wiki v 1.15.4 on Solaris 9 is completed and I begin the
> configuration from the web page, you receive the following message:
> 
> "Your system has a combination of PHP and libxml which is buggy and can cause
> hidden data corruption in MediaWiki and other web apps. Upgrade to PHP 5.2.9 or
> later and libxml2 2.7.3 or later! ABORTING (http://bugs.php.net?id=45996 for
> details). 
> 
> However, my installation is PHP 5.2.13 and libxml 2.7.7, both of which is later
> than the above two versions.

The installer tests for the bug itself, not for version numbers. It's possible that the documented versions we know work on most systems don't work in all circumstances, or that your PHP is actually linked with a different version of libxml2 than the one you're seeing on your system. (It's even possible that it's a slightly different, but related bug!)
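To compare what PHP itself reports with the packages installed elsewhere on the system, a short report can help. This is a hypothetical helper assuming PHP 7+; LIBXML_DOTTED_VERSION is the libxml2 version PHP was compiled against, which need not match a separately installed libxml2, and, as noted above, meeting the documented version thresholds does not prove the bug is absent:

```php
<?php
// Hypothetical environment report: the versions PHP says it was built with,
// next to the documented minimums from the installer message.
function describeXmlEnvironment(): array
{
    $libxml = defined('LIBXML_DOTTED_VERSION') ? LIBXML_DOTTED_VERSION : 'unavailable';
    return [
        'php' => PHP_VERSION,
        // Compile-time libxml2 version, not necessarily the system package.
        'libxml_compiled' => $libxml,
        'xml_extension' => extension_loaded('xml') ? 'loaded' : 'missing',
        // A version check alone cannot rule the bug out; pair it with a
        // behavioural test that feeds "&lt;" through xml_parse().
        'meets_documented_minimums' => $libxml !== 'unavailable'
            && version_compare(PHP_VERSION, '5.2.9', '>=')
            && version_compare($libxml, '2.7.3', '>='),
    ];
}

print_r(describeXmlEnvironment());
```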


(In reply to comment #36)
> Was this a problem in previous versions of MediaWiki?  I ask because we've been
> running 1.13.5 for months (years?) with the bad combination of PHP and libxml2
> and never noticed any issues.  Granted we don't do much importing.  Now we are
> trying to upgrade to 1.16.0 and can't because of this error.  Since we can't do
> anything about the versions of PHP and libxml on our server, I'm tempted to
> comment out the check and move on.  Any advice?

If you have the bug it would cause breakage on any version of MediaWiki, in at least the particular areas using XML parsing.

We added the big flashy warning on the installer because people would often not realize their setup was broken until *after* they ended up corrupting a bunch of data and getting very confused...


You might be able to get away with disabling the check as long as you don't use any of the following:
* Special:Import or its various command-line friends
* Blahtex, ExternalData, FCKEditor, MediaVid, SyntaxHighlight_GeSHi, WiktionaryInflection extensions

There may also be problems with SVG handling, as well as in other areas that didn't show up on a search for xml_parser_create().
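A crude way to repeat that xml_parser_create() search over your own MediaWiki tree and extensions directory is a plain substring scan. This is a hypothetical helper assuming PHP 7+; it only finds literal mentions in .php files (which also matches xml_parser_create_ns), so dynamically built calls or code using other XML APIs will not show up:

```php
<?php
// Hypothetical scanner: list .php files under $root that mention xml_parser_create.
function findXmlParserUsers(string $root): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if ($file->isFile() && $file->getExtension() === 'php') {
            $source = file_get_contents($file->getPathname());
            if ($source !== false && strpos($source, 'xml_parser_create') !== false) {
                $hits[] = $file->getPathname();
            }
        }
    }
    sort($hits);
    return $hits;
}
```

Running it over the installation root and reviewing the hits gives a rough list of code paths to avoid if the check is disabled.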

You may also be more evilly affected with other apps running on your server; similar bugs were very disruptive to StatusNet's identi.ca site back when it was running on a flaky Solaris setup that we couldn't upgrade ourselves... we fixed that problem by changing hosts! :P

Be aware that disabling these checks is at your own risk -- you're acknowledging that you know that the software is telling you it will not work properly on your system.


I'm re-resolving this bug; if there's a better resource to help people diagnose and upgrade their broken PHP setups we can change the link, but that's about all we can do at this stage.
Comment 38 Brion Vibber 2011-08-23 20:44:17 UTC
*** Bug 30526 has been marked as a duplicate of this bug. ***
