Last modified: 2011-11-29 03:21:02 UTC
Created attachment 7778 [details] Patch to fix issues on Mediawiki DTD

Hi,

I'm trying to validate a MediaWiki XML file against its schema file, but the validation is failing. I'm using PHP DOMDocument and haven't tried the validation with other tools, so I can't be sure whether the problem is in PHP or in the MediaWiki XML file, but I guess it is more likely to be in MediaWiki. I'm testing with the attached script (testMediawikiXml.php).

When I try to validate the XML from http://en.wikipedia.org/wiki/Special:Export/Train I get the following error:

Element '{http://www.w3.org/2001/XMLSchema}element': The attribute 'name' is required but missing

This error can be worked around by commenting out line 119 of http://www.mediawiki.org/xml/export-0.4.xsd. The content of this line is:

<element minOccurs="0" maxOccurs="1" type="mw:DiscussionThreadingInfo" />

I guess the best solution is to add the "name" attribute, but I haven't investigated and I don't know enough about XML schemas to say what its value should be.

If I run the script again, another error occurs:

Element '{http://www.mediawiki.org/xml/export-0.4/}namespace', attribute 'case': The attribute 'case' is not allowed.

To fix this one I added the following line below line 92:

<attribute name="case" type="string" />

After those two changes to the schema file I'm able to validate the XML file. I'm attaching the script I'm using to test and a patch with the changes I made to the schema file. I guess the second change is OK, but the first issue needs to be properly fixed (instead of just commenting out the line).

Thanks,
Rodrigo.
Created attachment 7779 [details] Script used to test the validation
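For readers without access to the attachment, here is a minimal sketch of the kind of check described above. It is not the attached testMediawikiXml.php. The helper names validateAgainstSchema() and exampleSchema() are hypothetical, and the inline schema is a heavily cut-down stand-in for export-0.4.xsd that assumes the namespace element is a string with attributes. It only illustrates the second error: an undeclared 'case' attribute is rejected until the schema declares it.

```php
<?php
// Hypothetical helper: validate an XML string against an XSD string and
// return the libxml error messages (an empty array means the document is valid).
function validateAgainstSchema(string $xml, string $xsd): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $messages = [];
    if (!$doc->loadXML($xml) || !$doc->schemaValidateSource($xsd)) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Hypothetical, simplified stand-in for the real schema. Only the namespace
// element is modelled; $allowCase toggles the attribute added by the patch.
function exampleSchema(bool $allowCase): string
{
    $caseAttribute = $allowCase ? '<attribute name="case" type="string" />' : '';
    return <<<XSD
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:mw="http://www.mediawiki.org/xml/export-0.4/"
        targetNamespace="http://www.mediawiki.org/xml/export-0.4/"
        elementFormDefault="qualified">
  <element name="namespace" type="mw:NamespaceType" />
  <complexType name="NamespaceType">
    <simpleContent>
      <extension base="string">
        <attribute name="key" type="integer" />
        $caseAttribute
      </extension>
    </simpleContent>
  </complexType>
</schema>
XSD;
}

// A made-up fragment shaped like an exported namespace entry.
$xml = '<namespace xmlns="http://www.mediawiki.org/xml/export-0.4/" '
     . 'key="1" case="first-letter">Talk</namespace>';

$before = validateAgainstSchema($xml, exampleSchema(false)); // 'case' not declared
$after  = validateAgainstSchema($xml, exampleSchema(true));  // 'case' declared

print_r($before); // should list an "attribute 'case' is not allowed" style error
print_r($after);  // should be empty
```

To check a real export, the same helper could be given the downloaded XML and the contents of the XSD file; DOMDocument::schemaValidate() does the same validation from a schema file path instead of a string.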
Tomasz, weren't you the one that last messed around with this?
These are both fine in trunk and 1.16wmf4. However, the XSD file in /usr/local/apache/common/docroot/mediawiki/xml needs updating and I'm not sure how to sync files from there to the cluster. Tweaking to be a shell request bug.
File has been sync'd to the servers and purged from squid. Should validate now.