Last modified: 2013-06-18 15:17:40 UTC
In at least the 20060915 frwiki dump, the is_redirect column (in the page table) is set to 0 for a wide variety of articles where it should be 1 (the first of them is page_id=204, then 758, 917, ...). What these articles have in common is that they existed as regular articles before being turned into redirects; when they were, the is_redirect field was apparently not updated to reflect the new state.
1) The three given examples are all missing spaces:
#redirect[[Écrivains de langue française, par ordre chronologique]]
#redirect[[calcul parasitaire]]
#REDIRECT[[Période Chosŏn]]

2) You don't specify whether you're looking at the 'page' SQL table dump or the result of some kind of import from an XML dump. All three pages have page_is_redirect set to 1 in the live page table, so should also be set to 1 in the SQL dump of the page table.

If you are looking at the results of an XML import, please specify:
a) exactly which file you're importing
b) exactly how you're importing it
c) exact version of MediaWiki
Hi Brion

1) Yes, I've seen the space problem in many other examples; it might be the root cause.
2) Here are more details: I'm using MediaWiki 1.8-svn. I have imported the 060915-pages-articles XML dump (after an mwdumper -> sql 1.5 translation with no unusual options) and all .sql dumps *except* page.sql, of course. Mwdumper was from svn too.

Update: you're right, mwdumper is the culprit. I've just translated frwiki-20060929-pages-articles.xml (this time using the precompiled mwdumper.jar at http://download.wikimedia.org/tools/) with the command line

java -server -jar mwdumper.jar --progress=50000 --output=file:frwiki-20060929-pages-articles.sql --format=sql:1.5 frwiki-20060929-pages-articles.xml

and I can read in the generated SQL:

INSERT INTO page (...) (204,0,'Auteurs_par_ordre_chronologique','',0,0,0,RAND(),DATE_ADD('1970-01-01', INTERVAL UNIX_TIMESTAMP() SECOND),2334654,69) (...)
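To illustrate the space problem, here is a minimal sketch of a redirect check that treats the whitespace between the keyword and the link as optional. It is written in Python for brevity and is not mwdumper's actual code; the name `looks_like_redirect` and the exact pattern are assumptions made for this example.

```python
import re

# Hypothetical helper, not taken from mwdumper: the keyword is matched
# case-insensitively and whitespace before "[[" is optional, so both
# "#redirect[[X]]" and "#REDIRECT [[X]]" are accepted.
REDIRECT_RE = re.compile(r"^\s*#redirect\s*\[\[", re.IGNORECASE)


def looks_like_redirect(wikitext: str) -> bool:
    """Return True if the page text starts with a redirect directive."""
    return REDIRECT_RE.match(wikitext) is not None


# The three frwiki examples quoted earlier in this thread.
samples = [
    "#redirect[[Écrivains de langue française, par ordre chronologique]]",
    "#redirect[[calcul parasitaire]]",
    "#REDIRECT[[Période Chosŏn]]",
]
results = [looks_like_redirect(s) for s in samples]
```

A check that insisted on a space after the keyword would miss all three samples, which would be consistent with the is_redirect value of 0 seen in the generated SQL.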
Assigning to brion. Probably created before new mwdumper issues were auto-assigned.
Created attachment 7262
use redirect-tag to set page_is_redirect field

Since r53271 the XML export has an extra redirect tag. The attached patch uses that tag to set the page_is_redirect field of the page table.
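The idea behind the patch can be sketched as follows. This is not the attached patch (which targets mwdumper's Java code) but a small Python illustration. It assumes the tag appears as an empty `<redirect />` child of `<page>`, and it omits the XML namespace that real dumps carry. The sample document, the second page (id 1, "Exemple") and the function name `page_redirect_flags` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed export: real dumps declare an XML
# namespace and contain <revision> elements, both omitted here.
SAMPLE = """<mediawiki>
  <page>
    <title>Auteurs par ordre chronologique</title>
    <id>204</id>
    <redirect />
  </page>
  <page>
    <title>Exemple</title>
    <id>1</id>
  </page>
</mediawiki>"""


def page_redirect_flags(xml_text: str) -> dict:
    """Map page id -> page_is_redirect (1 or 0), based only on the tag."""
    root = ET.fromstring(xml_text)
    flags = {}
    for page in root.iter("page"):
        page_id = int(page.findtext("id"))
        # Presence of the tag decides the flag; the page text is never parsed.
        flags[page_id] = 1 if page.find("redirect") is not None else 0
    return flags


flags = page_redirect_flags(SAMPLE)
```

Relying on the tag rather than on the wikitext sidesteps questions about spacing or the spelling of the redirect keyword, since the exporting wiki has already made that decision.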
*** Bug 31906 has been marked as a duplicate of this bug. ***
*** Bug 38919 has been marked as a duplicate of this bug. ***
From bug 38919 (not sure whether it is the same bug or not, since this bug is from 2006):

The problem is in the code at https://github.com/bcollier/mwdumper/blob/master/src/org/mediawiki/importer/Revision.java
It only looks for the English word "Redirect", while on non-English wikis the keyword might be localized ("Alih" in Indonesian, for example).
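For illustration, a text-based check would need the wiki's own list of redirect keywords to handle this. The Python sketch below is an assumption-laden example, not code from Revision.java: `make_redirect_matcher` is a hypothetical name, and the alias list contains only the two words mentioned in this report.

```python
import re


def make_redirect_matcher(aliases):
    """Build a predicate that recognises any of the given redirect keywords.

    `aliases` is a caller-supplied list; nothing here looks up the real
    per-language keyword list of a wiki.
    """
    alternation = "|".join(re.escape(alias) for alias in aliases)
    pattern = re.compile(r"^\s*#(?:" + alternation + r")\s*\[\[", re.IGNORECASE)
    return lambda wikitext: pattern.match(wikitext) is not None


# English keyword only, roughly the behaviour described in this comment.
english_only = make_redirect_matcher(["redirect"])
# English keyword plus the Indonesian one mentioned in this comment.
with_indonesian = make_redirect_matcher(["redirect", "alih"])

sample = "#ALIH [[Jakarta]]"  # hypothetical page text
missed = english_only(sample)
found = with_indonesian(sample)
```

Maintaining such a list for every wiki language is the cost of this approach.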
Created attachment 11207
Same patch, minor changes

Minor changes, in hopes of progress.
(In reply to comment #7)
Same problem; this patch resolves it too, because it sets page_is_redirect differently, based on the redirect tag.
gerrit change I27afb2a3