Last modified: 2012-01-09 17:15:50 UTC
Currently (14-Nov-2011 dump), at pt.wiktionary, we have entries in the pagelinks table such as:

pl_from  pl_namespace  pl_title
45839    6             Crystal_Clear_app_aim2.png
258396   0             Imagem:Flag_of_Esperanto.svg

Although "Imagem" is an alias for namespace 6, the second record above appears as a namespace 0 page whose title carries the namespace 6 prefix. RoanKattouw and apergos realized that running namespaceDupes.php does not correct the situation because it doesn't touch the pagelinks table. A maintenance operation is needed to update namespace dupes in tables other than the page table. Possibly, it could be included in the main namespaceDupes.php script.
It looks like pagelinks, imagelinks and externallinks (why does that have links to local images anyway?) all need to be cleaned up across the various wikis.
See also bug 32170
I'm not sure that bug 32170 is related: it mentions the bad links specifically being in the imagelinks table, while here they're listed as being in the pagelinks table. As for general cleanup, I believe the operating assumption in namespaceDupes etc. is that you're expected to run rebuildLinks after doing this sort of title cleanup. But since you may be doing multiple such cleanup runs, the script isn't going to make any assumptions and do it for you.
Links get left over in imagelinks and pagelinks. I don't think we want to run rebuildlinks on de.wikipedia (for example). So maybe we need to change those operating assumptions. At worst the script could take an option.
Would touching the affected pages correct the problem in the wiki?
Answering my own question, with some online help from RoanKattouw: yes, it does correct the problem.
With yesterday's ptwiktionary dump, a lot of them came back. A *new symptom* is the presence of pages in the User and User_talk namespaces being considered in namespace 0.
pl_from  pl_namespace  pl_title
88184    0             Usuário:Alkamid
55566    0             Usuário:Antoniolac
55566    0             Usuário:Cadum

Usuário is an alias for the User namespace, but not the canonical form. Actually, the canonical forms never seem to be in problematic records, just other aliases.
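For anyone who wants to count these records in a dump, here is a minimal sketch of the check in Python. It is only an illustration, not part of namespaceDupes.php or any MediaWiki script: the function name `find_misfiled_links` and the `NAMESPACE_ALIASES` map are hypothetical, and the map lists only the two aliases mentioned in this report (a real wiki has more). It flags rows stored in namespace 0 whose title begins with a known alias prefix.

```python
# Hypothetical alias map: only the two aliases named in this report.
# The real pt.wiktionary configuration contains more entries.
NAMESPACE_ALIASES = {
    "Imagem": 6,   # alias for the File namespace
    "Usuário": 2,  # alias for the User namespace
}


def find_misfiled_links(rows, aliases=NAMESPACE_ALIASES):
    """Yield (pl_from, pl_title, expected_namespace, bare_title) for suspect rows.

    A row is suspect when it is stored in namespace 0 but its title
    starts with a known namespace alias followed by a colon.
    """
    for pl_from, pl_namespace, pl_title in rows:
        if pl_namespace != 0 or ":" not in pl_title:
            continue
        prefix, bare_title = pl_title.split(":", 1)
        expected = aliases.get(prefix)
        if expected is not None and bare_title:
            yield (pl_from, pl_title, expected, bare_title)


# Sample rows copied from the records quoted in this report.
rows = [
    (45839, 6, "Crystal_Clear_app_aim2.png"),
    (258396, 0, "Imagem:Flag_of_Esperanto.svg"),
    (88184, 0, "Usuário:Alkamid"),
]

suspects = list(find_misfiled_links(rows))
for suspect in suspects:
    print(suspect)
```

Running the same check against two consecutive dumps and comparing the results would show whether new bad rows keep appearing or only old ones remain.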
You guys can tell for sure, but this list of erroneous links is very volatile and makes me wonder whether the solution resides in a manually run script. In every database dump, there are a few more different links with this problem. It's not just a problem of old links that need to be updated.
Please, check comment 23 on bug 31576. This may be a similar situation, of older copies of MW still rendering pages.
I'm going to dupe both this and bug 32170 to bug 33409, it's pretty obvious that's what's happening. This bug is identical to bug 32170, they're both caused by namespaceAliases being empty. In fact in bug 32170 I even mentioned that it causes local namespace aliases for File: to appear as pseudo-namespaces in pagelinks.

We can't really say anything about the effectiveness of CdbReader_PHP change on December 14 at this stage, since even today we had job runners running on an old code base. But it's likely that when we finally manage to completely update our code, the frequency at which bad entries are added will be dramatically reduced.

*** This bug has been marked as a duplicate of bug 33409 ***
New occurrences have been found today, compared to the situation on 5th January at pt.wiktionary. This is *after* the decommissioned workers found before the 5th were killed manually.