Last modified: 2012-04-26 03:02:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T11900, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 9900 - Duplicate rows in externallinks table
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: 1.20.x
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://it.wikipedia.org/w/index.php?t...
Depends on:
Blocks: 16660
Reported: 2007-05-13 18:21 UTC by Broken Arrow
Modified: 2012-04-26 03:02 UTC (History)
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Broken Arrow 2007-05-13 18:21:38 UTC
The externallinks table may contain duplicate rows, even if the link is present
only once in the page text. Editing the page does not remove the stale entries
on the live site; running refreshLinks.php on a local copy does.

The page above is only one of several examples. Some of the affected pages on
it.wp include Acanthocalycium, Fegato, Elezione_incondizionata, etc.
Comment 1 MZMcBride 2012-04-26 02:27:34 UTC
Is this still a problem?
Comment 2 Liangent 2012-04-26 02:39:25 UTC
(In reply to comment #1)
> Is this still a problem?

It seems there are still a bunch of them.

$ echo 'select el_from, el_to, count(*) c from externallinks group by el_from, el_to having c > 1;' | sql itwiki_p > bug9900

http://toolserver.org/~liangent/-/dbq/bug9900
Comment 3 Liangent 2012-04-26 02:40:22 UTC
545907 rows in set (2 min 45.47 sec)
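
The duplicate check from comment 2 can be tried on a small scale without access to the replica. The following is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, a made-up two-column stand-in for the externallinks table, and invented example.org rows. It also writes `HAVING COUNT(*) > 1` instead of referring to the alias `c`, which should be equivalent here.

```python
import sqlite3

# In-memory stand-in for externallinks; only the two columns the query groups on.
# Table contents are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE externallinks (el_from INTEGER NOT NULL, el_to TEXT NOT NULL)"
)
rows = [
    (1, "http://example.org/a"),
    (1, "http://example.org/a"),  # same page, same URL: a duplicate row
    (1, "http://example.org/b"),
    (2, "http://example.org/a"),  # same URL on a different page: not a duplicate
]
conn.executemany("INSERT INTO externallinks (el_from, el_to) VALUES (?, ?)", rows)

# Group by (page, URL) and keep only the groups that occur more than once.
duplicates = conn.execute(
    "SELECT el_from, el_to, COUNT(*) AS c "
    "FROM externallinks "
    "GROUP BY el_from, el_to "
    "HAVING COUNT(*) > 1"
).fetchall()

print(duplicates)  # [(1, 'http://example.org/a', 2)]
```

Each returned tuple is one (el_from, el_to) pair together with the number of rows that carry it, so the row count reported above is the number of duplicated pairs, not the number of surplus rows.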
Comment 4 MZMcBride 2012-04-26 02:54:03 UTC
Looking at tables.sql on Gerrit (<https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=blob;f=maintenance/tables.sql;h=a848bf5eb469ce63b2693b4a392241c5eab76dd1;hb=HEAD>), we can see that the pagelinks, templatelinks, categorylinks, imagelinks, langlinks, and iwlinks tables all have a unique index on them. externallinks, however, has only the following non-unique indices:

---
CREATE INDEX /*i*/el_from ON /*_*/externallinks (el_from, el_to(40));
CREATE INDEX /*i*/el_to ON /*_*/externallinks (el_to(60), el_from);
CREATE INDEX /*i*/el_index ON /*_*/externallinks (el_index(60));
---
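
To illustrate why the missing unique index matters, here is a minimal sketch using Python's sqlite3 module. It is not MediaWiki's schema: the table layouts, the index names `el_from_idx` and `el_from_to_uniq`, and the `externallinks_unique` table are hypothetical, and SQLite has no column-prefix syntax such as `el_to(40)`, so the sketch does not address how a unique index would be defined on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = (1, "http://example.org/a")  # invented sample row

# Case 1: a plain (non-unique) index, comparable in kind to the ones listed above.
conn.execute(
    "CREATE TABLE externallinks (el_from INTEGER NOT NULL, el_to TEXT NOT NULL)"
)
conn.execute("CREATE INDEX el_from_idx ON externallinks (el_from, el_to)")
conn.execute("INSERT INTO externallinks VALUES (?, ?)", row)
conn.execute("INSERT INTO externallinks VALUES (?, ?)", row)  # accepted silently
count_nonunique = conn.execute(
    "SELECT COUNT(*) FROM externallinks"
).fetchone()[0]

# Case 2: a hypothetical variant with a unique index on the same columns.
conn.execute(
    "CREATE TABLE externallinks_unique "
    "(el_from INTEGER NOT NULL, el_to TEXT NOT NULL)"
)
conn.execute(
    "CREATE UNIQUE INDEX el_from_to_uniq ON externallinks_unique (el_from, el_to)"
)
conn.execute("INSERT INTO externallinks_unique VALUES (?, ?)", row)
try:
    conn.execute("INSERT INTO externallinks_unique VALUES (?, ?)", row)
    rejected = False
except sqlite3.IntegrityError:
    # The database itself refuses the second identical row.
    rejected = True

print(count_nonunique, rejected)  # 2 True
```

With only non-unique indices, nothing at the database level stops the same (el_from, el_to) pair from being stored twice, so preventing duplicates is left entirely to the code that writes the rows.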
