Last modified: 2013-06-18 15:17:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T9497, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 7497 - mwdumper doesn't set page.is_redirect for borderline #redirect syntax
Status: RESOLVED FIXED
Product: Utilities
Classification: Unclassified
Component: mwdumper
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: orenbochman
Duplicates: 31906 38919
Depends on:
Blocks:
Reported: 2006-10-05 09:01 UTC by Colin
Modified: 2013-06-18 15:17 UTC (History)
CC List: 6 users



Attachments
use redirect-tag to set page_is_redirect field (2.44 KB, patch)
2010-04-04 17:28 UTC, Umherirrender
Same patch, minor changes (3.12 KB, patch)
2012-10-19 10:33 UTC, JulesWinnfield-hu

Description Colin 2006-10-05 09:01:29 UTC
In at least the 20060915 frwiki dump, the is_redirect column (in the page table) is set to 0 for a wide variety of articles where it should be 1 (the first of them is page_id=204, then 758, 917, ...).

These articles all have in common that they had a life before being declared as redirects, and when they were, the is_redirect 
field was apparently not updated to reflect the new state.
Comment 1 Brion Vibber 2006-10-05 18:26:26 UTC
1) The three given examples are all missing spaces:

#redirect[[Écrivains de langue française, par ordre chronologique]]
#redirect[[calcul parasitaire]]
#REDIRECT[[Période Chosŏn]]

2) You don't specify whether you're looking at the 'page' SQL table dump or the result 
of some kind of import from an XML dump.

All three pages have page_is_redirect set to 1 in the live page table, so should also 
be set to 1 in the SQL dump of the page table.

If you are looking at the results of an XML import, please specify:

a) exactly which file you're importing
b) exactly how you're importing it
c) exact version of MediaWiki
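
For illustration only, here is a minimal Python sketch of redirect detection that tolerates the missing space seen in the three examples above. It is not mwdumper's actual Java code; the pattern and the helper names (REDIRECT_RE, redirect_target, is_redirect) are assumptions made for this example.

```python
import re

# Hypothetical pattern: "#redirect" in any letter case, optional whitespace,
# then the opening of a [[wiki link]]. The capture stops at "]" or "|".
REDIRECT_RE = re.compile(r"^\s*#redirect\s*\[\[([^\]|]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the link target if the text starts with a redirect, else None."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


def is_redirect(wikitext):
    """True when the text starts with a redirect, with or without a space."""
    return redirect_target(wikitext) is not None


if __name__ == "__main__":
    for text in ("#redirect[[calcul parasitaire]]", "#REDIRECT [[Foo]]", "Plain text"):
        print(repr(text), is_redirect(text))
```

The `\s*` between the keyword and `[[` is what lets the pattern accept both "#redirect[[...]]" and "#REDIRECT [[...]]".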
Comment 2 Colin 2006-10-06 10:00:31 UTC
Hi Brion

1) Yes, I've seen the space problem in many other examples; it might be the root cause.
2) Here are more details: I'm using MediaWiki 1.8-svn. I imported the 060915-pages-articles XML dump (after an mwdumper -> sql 1.5 translation with no unusual options) and all .sql dumps *except* page.sql, of course. Mwdumper was from svn too.

Update: you're right, mwdumper is the culprit.
I've just translated frwiki-20060929-pages-articles.xml (this time using the precompiled mwdumper.jar at http://download.wikimedia.org/tools/) with the command line
java -server -jar mwdumper.jar --progress=50000 --output=file:frwiki-20060929-pages-articles.sql --format=sql:1.5 frwiki-20060929-pages-articles.xml

and I can read in the generated SQL:
INSERT INTO page (...) (204,0,'Auteurs_par_ordre_chronologique','',0,0,0,RAND(),DATE_ADD('1970-01-01', INTERVAL UNIX_TIMESTAMP() SECOND),2334654,69) (...)
Comment 3 Siebrand Mazeland 2008-08-13 12:26:57 UTC
Assigning to brion. Probably created before new mwdumper issues were auto-assigned.
Comment 4 Umherirrender 2010-04-04 17:28:32 UTC
Created attachment 7262 [details]
use redirect-tag to set page_is_redirect field

Since r53271 the XML export has an extra redirect tag.

The attached patch uses that tag to set the page_is_redirect field of the page table.
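
As a rough sketch of the idea (in Python, not the attached Java patch), the flag can be derived from the presence of a redirect element inside a page element instead of from the revision text. The XML fragments below are simplified, hypothetical samples without the export namespace, and the function name page_is_redirect is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical <page> fragments; real export files wrap pages in
# a namespaced <mediawiki> root, which this sketch leaves out.
REDIRECT_PAGE = """<page>
  <title>Example redirect</title>
  <id>1</id>
  <redirect />
  <revision><text>#redirect[[Example target]]</text></revision>
</page>"""

PLAIN_PAGE = """<page>
  <title>Example article</title>
  <id>2</id>
  <revision><text>Some article text.</text></revision>
</page>"""


def page_is_redirect(page_xml):
    """Return 1 if the page element has a <redirect> child, else 0."""
    page = ET.fromstring(page_xml)
    # Compare with None: an empty element is falsy in ElementTree.
    return 1 if page.find("redirect") is not None else 0


if __name__ == "__main__":
    print(page_is_redirect(REDIRECT_PAGE), page_is_redirect(PLAIN_PAGE))
```

Relying on the tag means the importer does not have to parse the wikitext at all, so spacing and keyword spelling in the revision text no longer matter.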
Comment 5 db [inactive,noenotif] 2011-11-06 12:55:10 UTC
*** Bug 31906 has been marked as a duplicate of this bug. ***
Comment 6 JulesWinnfield-hu 2012-10-19 00:58:12 UTC
*** Bug 38919 has been marked as a duplicate of this bug. ***
Comment 7 bennylin 2012-10-19 10:17:15 UTC
From bug [[bug:38919]] (not sure whether the same bug or not, since this bug is from 2006)

The problem is in the code at

https://github.com/bcollier/mwdumper/blob/master/src/org/mediawiki/importer/Revision.java

It only looks for the English word "Redirect", while on non-English wikis the keyword might be localized ("Alih" in Indonesian, for example).
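
To make the localization issue concrete, here is a small Python sketch (not the code in Revision.java) of a text-based check that only knows the English keyword, next to the same check given an extra alias. The function name starts_with_redirect, the aliases parameter, and the exact "#ALIH" spelling are assumptions for this example.

```python
def starts_with_redirect(wikitext, aliases=("#REDIRECT",)):
    """True if the text begins with one of the given redirect keywords.

    The comparison ignores letter case and leading whitespace. The default
    alias tuple holds only the English keyword, mirroring the reported
    behaviour.
    """
    head = wikitext.lstrip().casefold()
    return any(head.startswith(alias.casefold()) for alias in aliases)


if __name__ == "__main__":
    text = "#ALIH [[Halaman Utama]]"
    # English-only check misses the localized keyword.
    print(starts_with_redirect(text))
    # The same text is recognised once the alias is supplied.
    print(starts_with_redirect(text, aliases=("#REDIRECT", "#ALIH")))
```

Any keyword list of this kind has to be maintained per wiki language, which is one reason a language-independent signal in the dump itself is attractive.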
Comment 8 JulesWinnfield-hu 2012-10-19 10:33:25 UTC
Created attachment 11207 [details]
Same patch, minor changes

Minor changes, in hopes of progress.
Comment 9 JulesWinnfield-hu 2012-10-19 10:40:32 UTC
(In reply to comment #7)
Same problem. This patch resolves it too, because it sets page_is_redirect differently, based on the redirect tag.
Comment 10 JulesWinnfield-hu 2012-10-27 02:23:18 UTC
gerrit change I27afb2a3


