Last modified: 2013-06-27 20:34:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 39011 - redirect, pagelinks and page.page_is_redirect table information out of sync
redirect, pagelinks and page.page_is_redirect table information out of sync
Status: NEW
Product: Wikimedia
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
: shell
Depends on:
Blocks: 16660
  Show dependency treegraph
 
Reported: 2012-08-03 14:45 UTC by Jörn Hees
Modified: 2013-06-27 20:34 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Jörn Hees 2012-08-03 14:45:17 UTC
I imported the page, pagelinks and redirect tables from the 20120702 dump in order to analyze the pagelink structure. For this i need to resolve redirects and found some weirdness in the process:

There are some pages in the redirect table which don't have the page.page_is_redirect flag set:
  select * from redirect join page on rd_from=page_id where page_is_redirect != 1 limit 100;
Results in 23 rows: http://p.defau.lt/?smGJYbXXlqwLPhreqbe8VA


Another weirdness: there are pages marked as page.page_is_redirect but don't appear in the redirect table, even though that might be explained here: http://www.mediawiki.org/wiki/Manual:Redirect_table (see the NOTE as of Aug 2007)
  select * from page left join redirect on page_id=rd_from where page_is_redirect != 0 and rd_from is null limit 100;
Results in 5 rows: http://p.defau.lt/?pwJZMIVtTGGZYb69o8YWxg


Matthew_ from #wikimedia-tech ran these queries on the toolserver and seems the same problems exist in production.


Yet another weirdness can be found in the pagelinks table: there are ~25000 pagelinks for redirect pages which don't just link to the redirected page but to other pages alone in the article namespace:
  select count(1) from page inner join redirect on page_id = rd_from inner join pagelinks on rd_from=pl_from and rd_namespace=pl_namespace where pl_namespace = 0 and rd_title != pl_title;
+----------+
| count(1) |
+----------+
|    25168 |
+----------+
1 row in set (1 hour 44 min 35.16 sec)



Would be cool if someone could have a look.
Comment 1 jeremyb 2012-08-03 14:48:31 UTC
(In reply to comment #0)
> Matthew_ from #wikimedia-tech ran these queries on the toolserver and seems the
> same problems exist in production.

s/production/the TS replica/
Comment 2 Rob Lanphier 2012-10-16 20:56:44 UTC
It's going to be a bit longer before Sam can get to this.  Unassigning, though Sam is primarily responsible for "shell" bugs and will still have this implicitly on his list.
Comment 3 Giovanni Luca Ciampaglia 2013-03-27 18:21:31 UTC
Hi, 

looks like the second point has been fixed in the prod db:

mysql:giovanni@db1047.eqiad.wmnet [enwiki]> select count(*) from page left join redirect on page_id = rd_from where page_is_redirect != 0 and rd_from is null;


+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (39.33 sec)

mysql:giovanni@db1047.eqiad.wmnet [enwiki]>
Comment 4 Sam Reed (reedy) 2013-06-27 20:22:38 UTC
(In reply to comment #3)
> Hi, 
> 
> looks like the second point has been fixed in the prod db:
> 
> mysql:giovanni@db1047.eqiad.wmnet [enwiki]> select count(*) from page left
> join
> redirect on page_id = rd_from where page_is_redirect != 0 and rd_from is
> null;
> 
> 
> +----------+
> | count(*) |
> +----------+
> |        0 |
> +----------+
> 1 row in set (39.33 sec)
> 
> mysql:giovanni@db1047.eqiad.wmnet [enwiki]>


Agreed.

First one done too now. It only had 5 rows now. 1 was an erroneous redirect, presumably just never deleted. 1 was a redirect in user js files which I don't think ever worked. 2 were non existent pages.

mysql:wikiadmin@db60 [enwiki]> select * from redirect join page on rd_from=page_id where page_is_redirect != 1 limit 100;
Empty set (19.58 sec)
Comment 5 Sam Reed (reedy) 2013-06-27 20:34:11 UTC
What actually wants doing for the 3rd query?

I'm currently running it in a screen session against a quiet slave (selecting all fields, not just counting), so will have  a resultset to actually look at. I will see about attaching it to this bug when done, though I guess it's going to take upto a couple of hours to run

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links