Last modified: 2012-01-27 10:30:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T17152, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 15152 - Ghost categories in wanted categories
Status: RESOLVED DUPLICATE of bug 16112
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Hardware: All All
Importance: Normal trivial
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 16660
Reported: 2008-08-13 13:51 UTC by Francisco
Modified: 2012-01-27 10:30 UTC (History)

Description Francisco 2008-08-13 13:51:46 UTC
There are several categories that are listed
as wanted categories but do not contain any real pages. Only previously deleted categories seem to be affected (but obviously not all of them).
They are:

# Aree gaeltacht ‎(4 elementi)
# Film con trama ‎(4 elementi)
# Comuni Svizzeri ‎(3 elementi)
# Da finire Spagnolo ‎(2 elementi)
# Stub Biografie ‎(2 elementi)
# Storia dell'antico Egitto ‎(2 elementi)
# Rugbysti italiani ‎(1 elemento)
# Autori latini ‎(1 elemento)
# Artisti manga ‎(1 elemento)
# Rugbysti gallesi ‎(1 elemento)
# Arte di Siena ‎(1 elemento)
# Regine dei Belgi ‎(1 elemento)
# Laghi dell'Italia ‎(1 elemento)
# Internet e società ‎(1 elemento)
# Da finire Inglese ‎(1 elemento)
# Categorie orfane ‎(1 elemento)
# Trama ‎(1 elemento)
# Telefilm ‎(1 elemento)
# Stub Informatica ‎(1 elemento)
# Stub Cinema ‎(1 elemento)
# Strumenti ‎(1 elemento)
# Storia dell’Egitto ‎(1 elemento)
Comment 1 Antoine "hashar" Musso (WMF) 2008-08-27 17:01:15 UTC
This special page displays cached content. It is updated from time to time
and might not reflect the actual state.
Comment 2 Jalo 2008-08-28 16:33:57 UTC

It is updated once a day, but these categories have been visible since June or earlier.
Comment 3 Antoine "hashar" Musso (WMF) 2008-09-03 21:11:17 UTC
After talking with people in #wikipedia-it I am reopening this bug.

Some categories have been listed as wanted for a long time although
they have no articles.

 SELECT cl_to, count(*) FROM categorylinks
 LEFT JOIN page ON cl_to=page_title AND page_namespace='14'
 WHERE page_title IS NULL AND cl_to='Stub_Biografie'
 GROUP BY cl_to;
 | cl_to          | count(*) |
 | Stub_Biografie |        2 |
 1 row in set (0.00 sec)

Looking at the categorylinks table :

> SELECT * FROM categorylinks WHERE cl_to='Stub_Biografie' \G
*************************** 1. row ***************************
     cl_from: 86979
       cl_to: Stub_Biografie
  cl_sortkey: Quinto Aurelio Simmaco
cl_timestamp: 20050430032231
*************************** 2. row ***************************
     cl_from: 85110
       cl_to: Stub_Biografie
  cl_sortkey: Rabindranath Tagore
cl_timestamp: 20050430032208
2 rows in set (0.01 sec)

Most probably, refreshlinks needs to clean out the categorylinks when cl_from
does not exist :)
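
To illustrate the difference between the two conditions discussed above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout is a simplified assumption (only the columns used in the queries), and 'Live_article' and 'Real_wanted' are made-up names for illustration. It shows that the wanted-categories query counts every link to a missing category page, including links whose source page (cl_from) is gone, while a join on cl_from isolates only the ghost rows.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with simplified page/categorylinks tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,
            cl_to TEXT NOT NULL
        );
        -- One live article (id 1); ids 86979 and 85110 have no page row.
        INSERT INTO page VALUES (1, 0, 'Live_article');
        INSERT INTO categorylinks VALUES
            (1, 'Real_wanted'),
            (86979, 'Stub_Biografie'),
            (85110, 'Stub_Biografie');
    """)
    return db


def wanted_categories(db):
    """Categories that are linked to but have no category page (namespace 14)."""
    return db.execute("""
        SELECT cl_to, COUNT(*) FROM categorylinks
        LEFT JOIN page ON cl_to = page_title AND page_namespace = 14
        WHERE page_title IS NULL
        GROUP BY cl_to ORDER BY cl_to
    """).fetchall()


def ghost_links(db):
    """categorylinks rows whose source page (cl_from) no longer exists."""
    return db.execute("""
        SELECT cl_from, cl_to FROM categorylinks
        LEFT JOIN page ON cl_from = page_id
        WHERE page_id IS NULL
        ORDER BY cl_from
    """).fetchall()


db = build_toy_db()
print(wanted_categories(db))  # both categories appear as wanted
print(ghost_links(db))        # only the rows with a missing source page
```

In this toy data, 'Real_wanted' is a legitimately wanted category (a live page links to it), whereas 'Stub_Biografie' is wanted only because of ghost rows.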

Comment 4 Antoine "hashar" Musso (WMF) 2008-09-03 21:15:52 UTC
maintenance/refreshLinks.php :

// this bit's bad for replication: disabling temporarily
// --brion 2005-07-16

Comment 5 Brion Vibber 2008-09-03 21:49:23 UTC
The problem with that cleanup is that it can potentially take a *very* long time. Due to the way replication is serialized in MySQL, long-running write queries disrupt the replication stream -- while it runs, slaves will be lagging significantly behind the master, which causes either user-visible disruption (old data served out) or too much load being diverted to the master, or some combination.

Additionally, a single giant query of that sort is likely to get rolled back as other processes make updates to the table while it's working.

To be replication-friendly, it may need to be broken down into smaller batches, updating up to a few hundred rows at a time.
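
The batching idea can be sketched roughly as follows. This is not the code in refreshLinks.php; it is a small illustration using Python's built-in sqlite3 module as a stand-in for MySQL, with a simplified table layout. The function name delete_ghost_links and the batch_size parameter are assumptions made for the example. Each loop iteration selects a bounded number of orphaned rows, deletes them in a short transaction, and commits, so no single write statement runs for long.

```python
import sqlite3


def delete_ghost_links(db, batch_size=200):
    """Delete categorylinks rows with no matching page, a small batch at a time.

    Returns a list with the number of rows deleted in each batch.
    """
    deleted_per_batch = []
    while True:
        # Pick at most batch_size orphaned rows (identified by SQLite rowid).
        rowids = [row[0] for row in db.execute(
            """
            SELECT categorylinks.rowid FROM categorylinks
            LEFT JOIN page ON cl_from = page_id
            WHERE page_id IS NULL
            LIMIT ?
            """, (batch_size,))]
        if not rowids:
            break
        placeholders = ",".join("?" * len(rowids))
        cur = db.execute(
            f"DELETE FROM categorylinks WHERE rowid IN ({placeholders})",
            rowids)
        db.commit()  # keep each write transaction short
        deleted_per_batch.append(cur.rowcount)
        # On a replicated setup one would pause here until the slaves
        # have caught up (not modelled in this sketch).
    return deleted_per_batch


# Toy data: one live page with one valid link, plus five ghost links.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 'Live_article');
""")
db.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Kept")] + [(100 + i, "Ghost") for i in range(5)])
db.commit()

batches = delete_ghost_links(db, batch_size=2)
remaining = db.execute("SELECT cl_from, cl_to FROM categorylinks").fetchall()
print(batches, remaining)
```

Selecting row identifiers first and deleting by identifier keeps each DELETE bounded in size; a real implementation would likely batch by a range of cl_from values instead, since MySQL tables have no rowid.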
Comment 6 Antoine "hashar" Musso (WMF) 2008-09-04 09:53:23 UTC
Thanks Brion for the explanation. I will code something safer.
Comment 7 Brion Vibber 2009-05-28 20:56:26 UTC
Note that the deletion of ghost entries in refreshLinks.php is now batched and replication-friendly. Possibly merge this issue with bug 12168?
Comment 8 Francisco 2009-09-06 17:14:20 UTC
There are other ghost categories:

# Biografia ‎(2 elementi)
# Componenti elettronici ‎(1 elemento)
# Sovrani greci ‎(1 elemento)
# Campionato di calcio italiano ‎(1 elemento)
# Template condizionali ‎(1 elemento)
# Album dei Doors ‎(1 elemento)
# Specie (uccelli) ‎(1 elemento)
Comment 9 MZMcBride 2009-12-12 16:10:53 UTC
Looks like Roan resolved bug 12168. This is still an issue on a lot of wikis. It may make sense to run the refresh script on all of them. If not, I'd like to add to this request (or maybe I should file a new bug?).
Comment 10 Roan Kattouw 2009-12-12 16:15:06 UTC
Looks like we want refreshLinks to be run on itwiki and enwiki. I will consult with the ops folks on Monday and run the script then if there are no objections from them. Even though refreshLinks is supposed to be safe now, I'd rather not run it against such large wikis on a Saturday.
Comment 11 MZMcBride 2009-12-12 16:39:42 UTC
It's an issue on other wikis as well. Using the Toolserver's copy of the databases:

mysql> SELECT c.* FROM categorylinks c
    -> LEFT JOIN page ON cl_from = page_id
    -> WHERE page_id IS NULL AND cl_from > 0;

-- dewiki_p
379 rows

-- frwiki_p
188 rows

-- enwiki_p
3842 rows

-- ruwiki_p
154 rows
Comment 12 Roan Kattouw 2009-12-12 20:48:05 UTC
(In reply to comment #11)
> It's an issue on other wikis as well. Using the Toolserver's copy of the
> databases:
> mysql> SELECT c.* FROM categorylinks c
>     -> LEFT JOIN page ON cl_from = page_id
>     -> WHERE page_id IS NULL AND cl_from > 0;
> -- dewiki_p
> 379 rows
> -- frwiki_p
> 188 rows
> -- enwiki_p
> 3842 rows
> -- ruwiki_p
> 154 rows

I've used the same query (omitting cl_from > 0) to track down and delete ghost entries on dewiki, frwiki, enwiki, ruwiki and itwiki. This is not a substitute for running refreshLinks of course, but that takes a long time on such large wikis.
Comment 13 Dan Collins 2011-07-09 01:59:47 UTC
Since the solution to this bug appears to be running a maintenance script, I'm changing this from MediaWiki->Special Pages to Wikimedia->Site Requests.
Comment 14 Diederik van Liere 2011-12-07 00:10:54 UTC
Is this still an issue?
Comment 15 Beta16 2012-01-18 12:03:58 UTC
Yes. For example, a ghost element is still present in "Category:Biografie". In the categorylinks table from the 2012-01-09 dump I found this entry:
 (2447501,'Biografie','Stoermer ,Mark August','2010-01-27 19:15:42','','','page')
but that page has been deleted.
Please run refreshLinks.php on all wikis periodically.
Comment 16 Nemo 2012-01-27 10:30:12 UTC

*** This bug has been marked as a duplicate of bug 16112 ***
