Last modified: 2014-06-16 13:06:40 UTC
0) Summary

I tried to build a mirror of `enwikinews' using `mwxml2sql'. This failed whenever `mwxml2sql' encountered a page from namespace 90 (Thread). I tried again using `maintenance/importDump.php'. This worked better. However, it appears that `importDump.php' ignores namespace 90, because no such pages are later found in the `enwikinews.page' database table.

1) Dataset

`enwikinews-20140605-pages-meta-current.xml.bz2'

2) Error messages

WHINE: (155323) no end page tag

When I divide the XML data dump into smaller files of say 1000 pages, I can find many more such errors.

3) Pages that cause errors

  <page>
    <title>Thread:Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher/Those in positions of power shirking responsibility and lying?</title>
    <ns>90</ns>
    <id>155323</id>
    <DiscussionThreading>
      <ThreadSubject>Those in positions of power shirking responsibility and lying?</ThreadSubject>
      <ThreadPage>Comments:Chip and PIN 'not fit for purpose', says Cambridge researcher</ThreadPage>
      <ThreadID>92</ThreadID>
      <ThreadAuthor>70.31.58.181</ThreadAuthor>
      <ThreadEditStatus>has-reply</ThreadEditStatus>
      <ThreadType>normal</ThreadType>
      <ThreadSignature>[[Special:Contributions/70.31.58.181|70.31.58.181]] ([[User talk:70.31.58.181|talk]])</ThreadSignature>
    </DiscussionThreading>
    <revision>
      <id>958267</id>
      <timestamp>2010-02-15T04:04:56Z</timestamp>
      <contributor>
        <ip>70.31.58.181</ip>
      </contributor>
      <comment>New thread: Those in positions of power shirking responsibility and lying?</comment>
      <text xml:space="preserve">"All the banks are lying. They are maliciously and wilfully deceiving the customer [...] The system is not fit for purpose."
I'm so surprised that I've apparently transcended a serious remark and instead am being sarcastic. Incidentally, only part of that sentence was sarcastic.</text>
      <sha1>rjidk12i4hv2mxia3a8qq620rlc7lok</sha1>
      <model>wikitext</model>
      <format>text/x-wiki</format>
    </revision>
  </page>

4) Namespace of pages that cause errors

  <namespace key="90" case="first-letter">Thread</namespace>

5) Use of `importDump.php'

Apparently `importDump.php' ignores namespace 90.

  mysql> select page_id,page_namespace,page_title from enwikinews.page where page_id=155323;
  Empty set (0.00 sec)

  mysql> select page_id,page_namespace,page_title from enwikinews.page where page_namespace=90;
  Empty set (0.00 sec)

Sincerely Yours,
Kent
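
For anyone reproducing this: a minimal sketch, not part of the runs described above, for counting how many pages each namespace has in the dump itself. Comparing its count for namespace 90 against the empty result from `enwikinews.page' would confirm that the pages exist in the dump and are dropped at import time. The function names are hypothetical, and the sketch assumes only that each <page> carries a direct <ns> child, as in the excerpt in section 3.

```python
import bz2
import collections
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_pages_by_namespace(stream):
    """Return a Counter mapping each <ns> value to its number of <page> elements.

    `stream` is any binary file-like object holding a MediaWiki XML dump.
    """
    counts = collections.Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Look at direct children only, so nested elements such as
        # <DiscussionThreading> cannot be mistaken for the page's <ns>.
        for child in elem:
            if local_name(child.tag) == "ns":
                counts[int(child.text)] += 1
                break
        elem.clear()  # drop the page's children to limit memory use
    return counts


def count_in_dump(path):
    """Convenience wrapper (hypothetical helper) for a .xml.bz2 dump file."""
    with bz2.open(path, "rb") as dump:
        return count_pages_by_namespace(dump)
```

Calling count_in_dump("enwikinews-20140605-pages-meta-current.xml.bz2")[90] would then give the number of Thread pages present in the dump.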