Last modified: 2012-05-30 17:12:56 UTC
The XML Schema for the XML dump format used by MediaWiki has no constraints on the page and revision identifiers. This can be easily fixed with the attached patch. Enforcing uniqueness in the XSD makes sense, since I think that some parsers capable of Schema validation can work more efficiently when the constraints are present. Another reason is that (however unlikely) other software that outputs files in this format is not obliged to keep the IDs unique, according to the XSD in its current form.
Created attachment 1154: Adds unique identity constraints for page/id and page/revision/id
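For readers without access to the attachment, here is a minimal sketch of what unique identity constraints on page/id and page/revision/id could look like in XSD. It is not the attached patch. The `mw` prefix, the export-0.6 namespace, the constraint names, and the stripped-down type definitions are all assumptions made for illustration, and the real schema declares many more elements. The sketch also assumes revision IDs should be unique across the whole dump rather than per page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: a reduced schema, not the real MediaWiki export XSD. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:mw="http://www.mediawiki.org/xml/export-0.6/"
           targetNamespace="http://www.mediawiki.org/xml/export-0.6/"
           elementFormDefault="qualified">

  <xs:element name="mediawiki" type="mw:MediaWikiType">
    <!-- Hypothetical name: no two <page> elements may share an <id>. -->
    <xs:unique name="UniquePageId">
      <xs:selector xpath="mw:page"/>
      <xs:field xpath="mw:id"/>
    </xs:unique>
    <!-- Hypothetical name: no two <revision> elements in the dump may share an <id>. -->
    <xs:unique name="UniqueRevisionId">
      <xs:selector xpath="mw:page/mw:revision"/>
      <xs:field xpath="mw:id"/>
    </xs:unique>
  </xs:element>

  <!-- Reduced types: only the elements the constraints refer to. -->
  <xs:complexType name="MediaWikiType">
    <xs:sequence>
      <xs:element name="page" type="mw:PageType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="PageType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="id" type="xs:positiveInteger"/>
      <xs:element name="revision" type="mw:RevisionType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="RevisionType">
    <xs:sequence>
      <xs:element name="id" type="xs:positiveInteger"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Because the schema uses a target namespace with qualified elements, the selector and field XPaths need the namespace prefix. An unprefixed `page` or `id` would match nothing and the constraint would silently never fire.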
Wow, activity on my over 3.5 year old bug. I even changed my real name in the meantime ;) Is the bug still applicable?
Heh. I just checked. Yes, the patch still applies. Not that I care too much about this bug anymore, but could someone apply it?
Elvis, I'm sorry for the very, very late response. I'm asking developers to look at your patch soon.
I've looked at it and it looks good to me. Should this apply only to version 0.6 of the XSD, or to all versions of the XSD?
Heh, better late than never. Diederik: I'm not sure and I'm on the train atm, but I guess it would make sense to enforce it in all versions. But 5 years is a long time, and I can't remember which version I made the patch against. Will check when I get home. Cheers.
This patch looks good to me also. Might as well apply it as far back as we can; if someone is producing old schema dumps that violate these constraints, they have bigger problems on their hands than this enforcement change. Hmm, with one exception I guess: if someone produces XML files with multiple entries for a given pageid (but each entry contains different revision ids), that could be a problem.
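To make the exception concrete, here is a hypothetical dump fragment of the kind described: the same page id appears in two separate page entries, each carrying a different revision id. The namespace, titles, and id values are made up, and elements that a real dump would include are omitted. Under a uniqueness constraint on page/id, a file like this would fail validation even though every revision id is distinct.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical data: one page split across two <page> entries. -->
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Example</title>
    <id>42</id>
    <revision>
      <id>100</id>
    </revision>
  </page>
  <page>
    <title>Example</title>
    <!-- Same page id as above: rejected by a unique constraint on page/id. -->
    <id>42</id>
    <revision>
      <id>101</id>
    </revision>
  </page>
</mediawiki>
```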
Elvis, can you respond to Ariel's suggestion? And did you have a chance to check what version(s) it should apply to?
Elvis: Thanks again for the patch. Are you interested in getting developer access so you can submit it directly to our Git source control system? https://www.mediawiki.org/wiki/Developer_access
I have submitted the change in Gerrit for review (see https://gerrit.wikimedia.org/r/8889 ). Patch credited to Elvis Stansvik. Once it is reviewed and merged into master, we will have to update the public-facing URL at http://www.mediawiki.org/xml/export-0.7/ . This is covered by bug 37111.
Change merged. Will deploy export file.
Thanks! If I need a change in MW by Christmas 2018 I'll let you know. (Just kidding!) :)