Last modified: 2014-11-19 10:19:57 UTC

Bug 3311 - Automatic category redirects
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: unspecified
Hardware: All All
Importance: Low enhancement with 55 votes
Target Milestone: Future release
Assigned To: Tyler Romeo
Keywords: performance
Duplicates: 4879 5893 6750 8685 10236 15742 32262
Depends on: 167 710
Blocks: 17571 8685
Reported: 2005-08-30 18:05 UTC by p_simoons
Modified: 2014-11-19 10:19 UTC (History)
39 users (show)

Attachments
Patch (841 bytes, patch), 2006-12-19 18:33 UTC, Rotem Liss
Patch (1.91 KB, patch), 2007-01-06 18:13 UTC, Rotem Liss
Patch (6.38 KB, patch), 2007-01-06 20:06 UTC, Rotem Liss
Include redirected members in category view (1.10 KB, patch), 2007-06-25 15:33 UTC, Roan Kattouw
Include redirected members in API list=categorymembers (1.42 KB, patch), 2007-06-25 15:45 UTC, Roan Kattouw

Description p_simoons 2005-08-30 18:05:02 UTC
Suppose category:A redirects to category:B.
Would it be feasible to automatically move all articles placed in cat:A into
cat:B instead?
Alternatively, would it be possible to create a special page that lists all
categories that are redirects, so that a bot can do the moving?
Comment 1 Rowan Collins [IMSoP] 2005-08-31 16:08:15 UTC
I've been pondering for a while how to deal with this. The cleanest solution
(automatically reassigning the pages) is impossible unless/until we move
category information out of the page content and into a separately editable
"metadata" display.

The best alternative I can think of is to follow redirects from category to
category when displaying the category page. For instance, if Category:Foo is
moved to Category:Bar, and you then view Category:Bar, pages containing
"[[Category:Foo]]" should show up in the "articles in this category" list,
perhaps with a footnote-marker explained below as "Via alternate name
''Category:Foo''" so they can be distinguished for maintenance etc. On the
technical front, this would mainly consist of recursively following redirects
backwards, like Special:Whatlinkshere does, only using the categorylinks table.
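
As a rough illustration of the idea above, the following sketch follows category redirects backwards at display time and tags each member with the alternate name it came in through. It runs against an in-memory SQLite database; the `cat_redirect` table, its columns, and the function name are hypothetical simplifications, not MediaWiki's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (cl_from TEXT, cl_to TEXT);
-- Hypothetical helper table: one row per category redirect.
CREATE TABLE cat_redirect (rd_from TEXT, rd_to TEXT);

INSERT INTO categorylinks VALUES
    ('Page1', 'Bar'), ('Page2', 'Foo'), ('Page3', 'Old_foo');
INSERT INTO cat_redirect VALUES ('Foo', 'Bar'), ('Old_foo', 'Foo');
""")


def members_with_aliases(conn, category):
    """Return (page, via) pairs; via is None for direct members."""
    rows = conn.execute("""
        WITH RECURSIVE alias(name) AS (
            SELECT ?
            UNION
            -- Walk redirects backwards: anything pointing at a known alias.
            SELECT rd_from FROM cat_redirect JOIN alias ON rd_to = alias.name
        )
        SELECT cl_from, cl_to FROM categorylinks
        WHERE cl_to IN (SELECT name FROM alias)
        ORDER BY cl_from
    """, (category,))
    return [(page, None if cat == category else cat) for page, cat in rows]


listing = members_with_aliases(conn, "Bar")
for page, via in listing:
    note = f" (via alternate name Category:{via})" if via else ""
    print(page + note)
```

The `UNION` (rather than `UNION ALL`) keeps the recursion from looping if two categories redirect to each other.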

(In reply to comment #0)
> Alternatively, would it be possible to create a special page that lists all
> categories that are redirects, so that a bot can do the moving?

That would certainly be pretty easy, I'd have thought.
Comment 2 p_simoons 2005-09-01 08:03:52 UTC
Out of curiosity - is moving category info into metadata a planned change?

Would this work... if you save a page, some lookups are already made for e.g.
substing in templates. Would it be feasible to check, when a page is saved,
whether it is in any categories, and whether those categories are redirects, and
if so, to change them accordingly?
Comment 3 Rowan Collins [IMSoP] 2005-09-01 11:31:40 UTC
(In reply to comment #2)
> Out of curiosity - is moving category info into metadata a planned change?

Well, category membership is already stored in a special 'categorylinks' table
in the database, so it's not unreasonable to consider presenting the current
content of this to the user, rather than also storing it as part of the article
text. But there are no specific plans to do this any time soon, afaik; it's just a
thought that gets floated occasionally.

> Would this work... if you save a page, some lookups are already made for e.g.
> substing in templates. Would it be feasible to check, when a page is saved,
> whether it is in any categories, and whether those categories are redirects, and
> if so, to change them accordingly?

I guess that would be kind of possible, but I'm not keen on the idea of actual
text in an article changing without the user's consent, as it were. Hence the
need to present this metadata as separate from the main content - if there were
a box labelled "Current category memberships", it could be changed either by
hand, or by the software, as appropriate. OTOH, having the categories in the
article text has the advantage that changing them shows up in the article
history, watchlists, etc...
Comment 4 p_simoons 2005-09-01 12:01:01 UTC
> I guess that would be kind of possible, but I'm not keen on the idea of actual
> text in an article changing without the user's consent, as it were.

It's not dissimilar from doing {{subst:sometemplate}} when sometemplate in fact
redirects to another template. This is mainly intended for such things as
"American actors" and "United States actors" - both categories are the same,
but many people don't know that so they put articles in either or both (which
has the undesirable side-effect that neither category is in fact complete).
Having a true cat redirect would prevent that.

(by the way wouldn't having categories as metadata allow for cross-secting
categories? e.g. list all articles that are in both cat:A and cat:B? That would
be terrific)
Comment 5 Rowan Collins [IMSoP] 2005-09-01 12:27:27 UTC
(In reply to comment #4)
> > I guess that would be kind of possible, but I'm not keen on the idea of actual
> > text in an article changing without the user's consent, as it were.
> 
> It's not dissimilar from doing {{subst:sometemplate}} when sometemplate in fact
> redirects to another template. 

It's not really the same at all - in that case, the user has specifically
requested that their text be converted on save into something else; the only
relevance of the redirect is that the content that goes in isn't from exactly
the same title they typed. Changing a category in a page on edit, however, might
happen without the user even *touching* the list of categories - the category
might have become a redirect since the page was last edited - and yet the user
will be credited with having made that change to the text. [And what if someone
vandalises a category to be a redirect somewhere inappropriate? Suddenly,
innocent editors appear to be vandalising the pages in that category...]

Generally, this is a kind of voodoo that should be minimised, because users who
don't know why it's happening will become confused and frustrated - "I typed X,
but when I saved the page it said Y instead; what's going on?"

Don't get me wrong, I absolutely agree that some solution to this would be very
useful, as your example demonstrates; I'm just saying that automatically
changing the content of an article because something's changed elsewhere is a
departure from the current data model. If we're to go down that route, it needs
to be in a more bot-like form, which makes an explicit edit to all affected
articles with an appropriate summary. Unless, as I say, the list of categories
is removed from the article's content and placed in a separate box pulled
straight from the database, which can be dynamically updated by both users and
system operations.

> (by the way wouldn't having categories as metadata allow for cross-secting
> categories? e.g. list all articles that are in both cat:A and cat:B? That would
> be terrific)

Perhaps I haven't put this very clearly: categories are already *stored* as
metadata (there's a separate table in the database that basically stores {page,
category} pairs); but currently you can't edit that metadata directly, only by
changing the content of the article. Indeed, the <DynamicPageList> extension
used on Wikinews (see [[meta:DynamicPageList]]) can already do the kind of lists
you're talking about; it's just that it's potentially very db-intensive to allow
unbounded lists like this, I think.

What would become possible is an interface for editing categories from the
other side, as it were - edit the list of "pages in this category", and those
pages would change automagically.
Comment 6 Jakob Voss 2006-01-25 20:05:24 UTC
I'd also like redirects for categories but I doubt a smart solution will be
possible while categories are part of the article text. Until then a simple
special page would help: 

[[Special:CategoriesThatAreRedirects]] - lists all categories that are redirects: 

SELECT page_title from page where page_is_redirect=1 AND page_namespace=14

So the user can find categories that are redirects, get their articles, change
the category tags and delete the category afterwards. But what if the next user
creates the deleted category again? So you had better not delete the category that
is a redirect, but use another new special page:

[[Special:CategoriesThatAreRedirectsButAreNotEmpty]] - lists all categories that
are redirects but have pages in them:

SELECT page_title from page where page_is_redirect=1 AND page_namespace=14 AND
EXISTS (SELECT * FROM categorylinks WHERE cl_to=page_title);

The second special page is also fast for most Wikipedias I tested (it took 22
seconds at the Toolserver for the first call at enwiki_p, but following calls are
faster because of some kind of caching I don't know about). At the moment there are
580 category redirects and 171 of them have pages in them - I'd call them "lost
categories" because if a user clicks on one of them in an article, he won't find the
article in the category he is directed to. There are 1348 article pages in the
English Wikipedia that are partly hidden this way:

SELECT cl_from FROM categorylinks WHERE EXISTS (select page_title from page
WHERE page_title=cl_to AND page_is_redirect=1 AND page_namespace=14 AND EXISTS
(SELECT * FROM categorylinks AS C WHERE C.cl_to=page_title));
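
The three queries above can be tried on toy data with the sketch below, which uses Python's `sqlite3` module and a heavily reduced `page` / `categorylinks` schema (only the columns the queries touch). The sample rows are invented for illustration, and the third query is simplified: the inner `EXISTS` is dropped, since the outer `categorylinks` row already proves the redirect category has a member.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT,
    page_is_redirect INTEGER
);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);

-- Namespace 14 is the Category namespace; namespace 0 holds articles.
INSERT INTO page VALUES
    (1, 14, 'Mouse', 0),
    (2, 14, 'Maus', 1),
    (3, 14, 'Mice', 1),
    (10, 0, 'House_mouse', 0),
    (11, 0, 'Wood_mouse', 0);
INSERT INTO categorylinks VALUES (10, 'Mouse'), (11, 'Maus');
""")

# All categories that are redirects.
redirects = [r[0] for r in conn.execute(
    "SELECT page_title FROM page "
    "WHERE page_is_redirect = 1 AND page_namespace = 14 "
    "ORDER BY page_title")]

# Redirect categories that still have members ("lost categories").
non_empty = [r[0] for r in conn.execute(
    "SELECT page_title FROM page "
    "WHERE page_is_redirect = 1 AND page_namespace = 14 "
    "AND EXISTS (SELECT * FROM categorylinks WHERE cl_to = page_title) "
    "ORDER BY page_title")]

# Pages that are partly hidden because they sit in a redirect category.
hidden_pages = [r[0] for r in conn.execute(
    "SELECT DISTINCT cl_from FROM categorylinks WHERE EXISTS ("
    "  SELECT 1 FROM page WHERE page_title = cl_to "
    "  AND page_is_redirect = 1 AND page_namespace = 14) "
    "ORDER BY cl_from")]

print(redirects, non_empty, hidden_pages)
```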




Comment 7 lɛʁi לערי ריינהארט 2006-02-03 12:52:09 UTC
Marking as dependent on
Bug 710: Redirect to category page doesn't work

Comment 1 is about "category:A redirects to category:B"
The general case (bug 710) is "namespace:pagename redirects to category:B"

best regards reinhardt [[user:gangleri]]
Comment 8 Brion Vibber 2006-02-05 18:48:24 UTC
*** Bug 4879 has been marked as a duplicate of this bug. ***
Comment 9 brianna.laugher 2006-02-24 00:29:10 UTC
Having this problem solved for the Commons would be fantastic. Because of this
problem we currently have a restriction that all category names should be in
English. As you can imagine that hardly does anything to promote the
multilingual policy of the Commons and I'm sure it's one of the things that
turns off would-be contributors whose native tongue is not English.

If this was implemented (as I understand it), we'd be able to have
[[:Category:Maus]], [[:Category:Mouse]] and [[:Category:Mysz]], and the effect
of putting an image in any of them would be the same in the end.
Comment 10 David Benbennick 2006-02-24 09:12:33 UTC
It seems to me that we don't want to "automatically move all articles".  When
[[A]] redirects to [[B]] and you link to A in an article, MediaWiki doesn't
replace that with [[B|A]].  I think all that's necessary is that if Category:A
redirects to Category:B, then every article in Category:A appears in the
category listing for Category:B.  Implementing this wouldn't require having
articles store category links as metadata, since it has nothing to do with
''how'' a particular article got into a category.
Comment 11 Jakob Voss 2006-04-09 15:44:43 UTC
It would also help to show a message in the category view, by the list of pages in this
category, that lists the categories that redirect to the viewed category.
Example:

1. There is Category:Mouse
2. People regularly use Category:Maus instead of Category:Mouse
3. So you create Category:Maus as a redirect to Category:Mouse
4. If you view Category:Mouse you miss all the articles in Category:Maus
5. So a message is shown: "Category:Mouse is also known as Category:Maus. Please
move the articles to the redirected category."

The current status is unsatisfying. On Commons, 857 categories are redirects and
1263 pages use these categories (452 pages in 677 categories for the English
Wikipedia):

SELECT COUNT(*) from page where page_is_redirect=1 AND page_namespace=14;

SELECT COUNT(DISTINCT cl_from) from categorylinks, page WHERE 
page_is_redirect=1 AND page_namespace=14 AND cl_to=page_title;
Comment 12 Rotem Liss 2006-04-09 15:50:19 UTC
(In reply to comment #11)
> It would also help get a message in category view at the list of pages in this
> category that shows a list of categories that redirect to the viewed category.
> Example:
> 
> 1. There is Category:Mouse
> 2. People regularly use Category:Maus instead of Category:Mouse
> 3. So you create Category:Maus as a redirect to Category:Mouse
> 4. If you view Category:Mouse you miss all the articles in Category:Maus
> 5. So a message is shown: "Category:Mouse is also known as Category:Maus. Please
> move the articles to the redirected category."
> 
> The current status is unsatisfying. In commons 857 categories are redirects and
> 1263 pages use these categories (452 pages in 677 categories for the English
> Wikipedia):
> 
> SELECT COUNT(*) from page where page_is_redirect=1 AND page_namespace=14;
> 
> SELECT COUNT(DISTINCT cl_from) from categorylinks, page WHERE 
> page_is_redirect=1 AND page_namespace=14 AND cl_to=page_title;
> 

There is no need to do that - if we fix this bug, the pages in Category:Maus are
automatically shown in Category:Mouse. Your suggestion is just a workaround.
Comment 13 Eugene Zelenko 2006-05-11 13:35:30 UTC
*** Bug 5893 has been marked as a duplicate of this bug. ***
Comment 14 Moses 2006-05-11 16:17:45 UTC
Category redirects are especially useful on the Chinese Wikipedia and Wikinews.

As you may know, in Chinese one thing can be written in both traditional Chinese
and simplified Chinese. These characters represent the same thing; both forms
are correct and both should exist. For example, [[Category:災難]] and
[[Category:灾难]] are the same (BTW: "災難" and "灾难" mean "disaster").

However, redirecting [[Category:災難]] to [[Category:灾难]] is currently useless.
Articles categorized under 災難 still cannot be seen in [[Category:灾难]].
A category redirect only redirects the category *itself*, not the articles
categorized in the category.
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-07-20 00:55:47 UTC
*** Bug 6750 has been marked as a duplicate of this bug. ***
Comment 16 Rotem Liss 2006-12-19 18:33:44 UTC
Created attachment 2905 [details]
Patch

This patch changes Parser::replaceInternalLinks: when parsing a link to a
category (e.g. [[Category:A]], *not* [[:Category:A]]), it checks whether the
category exists in the "redirect" table (using a slave - I think it's OK), and
if so and the redirect is to another category, it overrides the current title
(variable $nt) with the redirected title.

This fixes both the display in the page and the DB (table "categorylinks"), as
the additions to this table are done using the same parser method.
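
A minimal sketch of that parse-time approach, written in Python rather than MediaWiki's PHP: category tags are pulled out of the wikitext, each title is looked up in a redirect map, and the resolved title is what would be written to "categorylinks". The regular expression, the function names, and the dictionary standing in for the "redirect" table are all hypothetical; following only a single redirect hop is an assumption based on the description above.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sortkey]], but not the
# link form [[:Category:Name]], which starts with a colon.
CATEGORY_TAG = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def resolve_category(title, redirects):
    """Follow at most one redirect hop (assumed behaviour)."""
    return redirects.get(title, title)


def category_rows(page_id, wikitext, redirects):
    """Return the (cl_from, cl_to) rows a save would write."""
    targets = []
    for match in CATEGORY_TAG.finditer(wikitext):
        target = resolve_category(match.group(1).strip(), redirects)
        if target not in targets:  # a page is listed once per category
            targets.append(target)
    return [(page_id, target) for target in targets]


redirects = {"Maus": "Mouse"}  # hypothetical stand-in for the redirect table
text = "A rodent. [[Category:Maus]] [[Category:Mouse|m]] See [[:Category:Maus]]."
rows = category_rows(7, text, redirects)
print(rows)
```

Because the resolved title is stored, the page's row no longer records that the text actually said Category:Maus, which is the information the later comments about an extra column are concerned with.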

The patch works for me, however it should still be checked for regressions.
Comment 17 Rotem Liss 2007-01-06 18:13:17 UTC
Created attachment 3015 [details]
Patch

Two additions:
1. Allowing categories to be moved.
2. Updating categorylinks when moving categories.

TODO: Fix new redirect (currently links to [[Category:B]] instead of
[[:Category:B]]), update categorylinks when editing a category.
Comment 18 Rotem Liss 2007-01-06 20:06:40 UTC
Created attachment 3016 [details]
Patch

Fixing redirect page, move page message and links to the page when it becomes a
redirect.

Summary of the patch changes:
1. Redirect the categories: if a category is a redirection, make the links to
it and the items in "categorylinks" table refer to the redirected category when
parsing the page.
2. Update categorylinks when a category is edited and becomes a redirect.
3. Make category moves possible - remove it from the forbidden namespaces.
4. Update categorylinks when a category is moved.
5. Fix the redirect page left when a category is moved: used a colon to prevent
inclusion in the category.
6. Fix the links in pagemovedtext.

Things which still have to be done:
1. Update categorylinks when a redirect category becomes a redirection to
another category.
2. Update categorylinks when a redirect category becomes a regular category
which is not a redirection.
I think that these require another field in categorylinks, "cl_original_to"
(may be null, or maybe same to "cl_to" if not redirected?), which specifies the
original target (which is now a redirection). If it's not added, it's not
possible to update categorylinks because it's not *known* which pages are in
this category. I don't know if this field should be added, these things should
not be fixed, or there is another way to do it. Any ideas?
Comment 19 Rick Block 2007-02-25 18:39:00 UTC
As an expediency, somewhat short of full category redirect support, can a change be made so that when an article is
saved it is added to the target of a redirected category rather than the redirected category (i.e. if category:A is redirected
to category:B, when changing an article to add it to category:A the article is actually added to category:B)?  Doing this
one change, in combination with recat bots like RobotG, would enable category redirects to work nearly perfectly.
Comment 20 Frank Bartus 2007-03-08 22:27:15 UTC
Endorse expediency request above with fervency! Moving/renaming would be nice
too, but per
[http://en.wikipedia.org/wiki/Wikipedia_talk:Categories_for_discussion#Propose_tagging_with_both_and_expanding_use_of_Cat_redirects_overall
this] the method of combined soft and hard redirects, put together with the
proper linking at the time of saving a page, would pretty well cover normal
editing objections to redirecting categories.

CONSIDER: There are multiple ways to phrase the equivalent page classification
in English, but note the three legged stool... most every language's wiki, no
matter what type project, one way or another connects with interwiki's to the
English Wikipedia articles pages (if only through their own article on the topic
in their Wikipedia), the commons category, and/or the Wikipedia category (Which
I labor mightily to synchronize, as much as possible, so I know them well). 

How a language alliterates into English translation we native speakers would
find to be an awkward phrasing more often than not, AND VICE VERSA, so on the
commons, redirects of categories are tolerated specifically to cover such
'ambiguities', including redirects of foreign category names to the "Official"
English name (Unlike most namings on the commons, Principal Category names are
English by fiat... articles, images, etc. can all be other languages. So the
importance should be obvious, I hope!). 

So too do we native speakers have our choice of how to state a category name...
(e.g. Countries in Europe, vs. Countries of Europe) the halls of the
en.Wikipedia (and I suspect all others!) CFD discussions are ankle deep in blood
from some of those debates! That in part exists because of different schemes in
related categorisation (Geography, Maps, and History all intersect Countries...
so comes complications, or a need to alias however belatedly! <g>) Which frankly
is a needless waste of time, were it easy to alias a name, and that name be the
one 'online' per this proposal. 

In one sense these naming issues are trivialities, but they are important
trivialities, as recollection and modes of phrase formulation are inherently
personal things involving the way each of us thinks. So there is a natural
factionalism as others think like me, and some like him, and even when
compromises occur, they involve a lot of work for someone... which is
hopefully the guys not thinking like me! <BSEG> Bottom line, aliasing would
prevent and eliminate a lot of wasted man-hours spent renaming
things. The computers should be doing that drudgery, not we humans, save for you
developers... if you do it once, your effort pays back over and over for the many.

In sum, this matter has an inherently higher priority than the
convenience of one editor; it affects a great many editors
across all nationalities! I guess I'm saying this has had far too low a priority
heretofore. It seems simple, so unimportant, but categories are fundamental to
organising the projects, hence the effects are vastly magnified. In one swell
foop, all the nit-picky (local language) name choices can be trivialized and one
uniform name can emerge in each locale, yet still not only retain but actually
enhance interconnectivity between sisters and within a given project. So please
do expedite both a determination on Rick's interim proposal, and a full
implementation allowing name moves and the like.

Just cutting down the debate will free many man-hours each day on CFD, so
delay is, frankly, costly, and worldwide costly at that. Best regards // ~~~~
Comment 21 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-04-22 14:12:59 UTC
> 2. Update categorylinks when a category is edited and becomes a redirect.
> ...
> 4. Update categorylinks when a category is moved.

This might be an issue.  I don't know if an UPDATE of, for the current worst case,
a couple hundred thousand rows is acceptable.  An alternative would be to check the
redirect status at display time rather than on update, as we do for pages: retrieve
all pages that are in category X or anything that redirects to it.  Of course that
means faster UPDATE and slower SELECT, which is generally the reverse of what's
wanted.  We should ask Domas or Tim or someone what's best, I guess.

> Things which still have to be done:
> 1. Update categorylinks when a redirect category becomes a redirection to
> another category.
> 2. Update categorylinks when a redirect category becomes a regular category
> which is not a redirection.
> I think that these require another field in categorylinks, "cl_original_to"
> (may be null, or maybe same to "cl_to" if not redirected?), which specifies the
> original target (which is now a redirection). If it's not added, it's not
> possible to update categorylinks because it's not *known* which pages are in
> this category. I don't know if this field should be added, these things should
> not be fixed, or there is another way to do it. Any ideas?

Short of reparsing every page in the category, this probably does require an
extra field, yes.  Other schema updates might be necessary to make the updates
for large categories efficient.  I think that whatever happens will be more
efficient than bots loading tons of pages and forcing them to be reparsed, though.
:)
Comment 22 Rob Church 2007-06-13 00:09:32 UTC
*** Bug 10236 has been marked as a duplicate of this bug. ***
Comment 23 Roan Kattouw 2007-06-25 15:33:00 UTC
Created attachment 3823 [details]
Include redirected members in category view

The patch submitted earlier kind of scares me. Consider the following scenario:

1. Page is categorized in Category:A
2. Category:A becomes a redirect to Category:B
3. Page is updated accordingly
4. Category:A becomes a redirect to Category:C
5. Page is NOT updated accordingly, since it is treated as a member of Category:B.

Someone suggested a new DB field to counter this, but that isn't necessary.

The attached patch fixes this bug in a simpler way, without the problem described above. When you view Category:B, the code will check if any other categories redirect to B. If Category:A redirects to Category:B, both A and B's members will show up when you view Category:B. As usual, double redirects won't work, i.e. if A redirects to B and B redirects to C, Category:C will show B and C's members, but not A's.

The attached patch makes moving categories easy, just remove NS_CATEGORY from the forbidden namespaces. Since everything is handled transparently through redirects (just like we do with normal pages), no problems should ensue.
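
The behaviour described in this comment can be sketched as two queries: one collecting the categories that redirect straight to the viewed category, and one ordinary member lookup over all collected titles. The Python/SQLite code below is only an illustration; the `rd_from_title` column and the function name are hypothetical, and the attached patch itself is PHP and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (cl_from TEXT, cl_to TEXT);
-- Simplified redirect table: source category title -> target category title.
CREATE TABLE redirect (rd_from_title TEXT, rd_title TEXT);

INSERT INTO categorylinks VALUES ('PageA', 'A'), ('PageB', 'B'), ('PageC', 'C');
INSERT INTO redirect VALUES ('A', 'B'), ('B', 'C');
""")


def category_members(conn, category):
    # First query: categories that redirect straight to this one.
    titles = [category] + [r[0] for r in conn.execute(
        "SELECT rd_from_title FROM redirect WHERE rd_title = ?", (category,))]
    # Second query: ordinary lookup over all collected titles.
    marks = ",".join("?" * len(titles))
    rows = conn.execute(
        f"SELECT cl_from FROM categorylinks WHERE cl_to IN ({marks}) "
        "ORDER BY cl_from", titles)
    return [r[0] for r in rows]


# Double redirects are not followed: C shows B's members but not A's.
members_of_c = category_members(conn, "C")
members_of_b = category_members(conn, "B")

# Retargeting A from B to C needs no change to categorylinks at all.
conn.execute("UPDATE redirect SET rd_title = 'C' WHERE rd_from_title = 'A'")
after_retarget = category_members(conn, "C")
print(members_of_c, members_of_b, after_retarget)
```

Since "categorylinks" keeps the title the page text actually uses, step 4 of the scenario above cannot leave stale rows behind; the cost moves to the read side instead.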
Comment 24 Zhen Lin 2007-06-25 15:40:46 UTC
A solution was proposed using the redirects table in bug 8685 ...
Comment 25 Roan Kattouw 2007-06-25 15:45:32 UTC
Created attachment 3824 [details]
Include redirected members in API list=categorymembers

This patch does the same thing as my previous patch, with the difference that this one fixes listing category members in the MediaWiki API as opposed to the category page itself.
Comment 26 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-25 17:10:57 UTC
*** Bug 8685 has been marked as a duplicate of this bug. ***
Comment 27 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-25 17:16:38 UTC
I'm hardly an SQL expert, I'm afraid, but any particular reason you added an extra query rather than joining?  I doubt it makes much difference, though, performance-wise.

I'm a bit alarmed that the change has to be made separately for the API, rather than both calling a general-purpose public method of something, but I guess that's a separate issue.

I'll take a look at this and hopefully commit something today or tomorrow.  Although I notice a few bugs assigned to me that I've totally forgotten about, so let's hope this doesn't become one.  :D
Comment 28 Roan Kattouw 2007-06-25 17:48:35 UTC
(In reply to comment #24)
> A solution was proposed using the redirects table in bug 8685 ...

(In reply to comment #27)
> I'm hardly an SQL expert, I'm afraid, but any particular reason you added an
> extra query rather than joining?  I doubt it makes much difference, though,
> performance-wise.
The JOIN suggested in bug 8685 didn't work (it selected the wrong data from the page table), and since I'm not particularly good at writing complex SQL queries either, I decided to do it this way. I think my way may actually be faster, since the latter query is still a regular category lookup which is indexed. A complex JOIN statement wouldn't have the indexing benefit (correct me if I'm wrong).


> I'm a bit alarmed that the change has to be made separately for the API, rather
> than both calling a general-purpose public method of something, but I guess
> that's a separate issue.
This is partly due to the fact the API provides much more information and filtering options than you'll ever need in a regular page. Also, the current code mixes DB code with UI code, which makes a lot of functions unusable for the API. Article.php and EditPage.php are good examples.

> I'll take a look at this and hopefully commit something today or tomorrow. 
> Although I notice a few bugs assigned to me that I've totally forgotten about,
> so let's hope this doesn't become one.  :D
We're all humans, we all need breaks ;) take your time.

Comment 29 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-26 00:35:29 UTC
Checking EXPLAIN shows that the query will use a filesort due to replacement of simple equality with a check for IN.  I got the same trying the one-query join technique, adjusted to give correct results.  Domas will probably kill me if I add a gratuitous filesort to every category, so I (we) will have to ask him why it's filesorting and how to stop it.

(By the way, more easily fixed but also significant, your check for rd_title='title' alone can't use the redirect table's indexes, because the index is on (rd_namespace, rd_title).  You need to add 'rd_namespace' => NS_CATEGORY to the conditions for that query to be efficient.)
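
The point about the composite index can be observed with a small experiment. The sketch below uses SQLite through Python, not MySQL, so the plan output differs from MySQL's EXPLAIN, but the leftmost-prefix rule for a two-column index is the same idea. The table is reduced to three columns and the index name `rd_ns_title` is made up for the example.

```python
import sqlite3

NS_CATEGORY = 14

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE redirect (
    rd_from INTEGER PRIMARY KEY,
    rd_namespace INTEGER NOT NULL,
    rd_title TEXT NOT NULL
);
CREATE INDEX rd_ns_title ON redirect (rd_namespace, rd_title);
""")


def plan(sql, params):
    """Return SQLite's query-plan detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Condition on rd_title only: the index's leading column is unconstrained.
title_only = plan(
    "SELECT rd_from FROM redirect WHERE rd_title = ?", ("Mouse",))

# Condition on both columns: the index can be used for a direct lookup.
with_namespace = plan(
    "SELECT rd_from FROM redirect WHERE rd_namespace = ? AND rd_title = ?",
    (NS_CATEGORY, "Mouse"))

print(title_only)
print(with_namespace)
```

With both columns constrained, SQLite reports an index search; with the title alone it has to walk the whole table or index instead.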
Comment 30 Roan Kattouw 2007-06-26 13:14:02 UTC
(In reply to comment #29)
> (By the way, more easily fixed but also significant, your check for
> rd_title='title' alone can't use the redirect table's indexes, because the
> index is on (rd_namespace, rd_title).  You need to add 'rd_namespace' =>
> NS_CATEGORY to the conditions for that query to be efficient.)
By all means do so. I know just enough about MySQL to get by, and have no idea how all those optimizations work. 

Comment 31 Roan Kattouw 2007-06-30 07:54:55 UTC
(In reply to comment #29)
> Checking EXPLAIN shows that the query will use a filesort due to replacement of
> simple equality with a check for IN.  I got the same trying the one-query join
> technique, adjusted to give correct results.  Domas will probably kill me if I
> add a gratuitous filesort to every category, so I (we) will have to ask him why
> it's filesorting and how to stop it.
I don't really understand any of that (for instance, my query doesn't use IN AFAIK), but I understand it's a performance problem. How can that be solved? 

Comment 32 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-07-01 02:30:36 UTC
Your code contains an IN because you have 'cl_to' => $titles as a condition, with $titles an array, and that translates to (cl_to IN ($titles)).  I don't know how it can be solved, try asking Domas or someone.
Comment 33 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-07-03 03:08:34 UTC
After discussion with Domas, it seems that any attempt to check for redirects in the current schema will *probably* cause a filesort, or at least all the ones suggested did.  We probably need a new field, cl_real_to or something, that has the redirect pre-resolved.  When adding a category to a page, the actual target would be put in cl_to as now; then if it's a redirect, the redirect target would be put in cl_real_to, otherwise that would be a copy of cl_to (or it would be NULL, depending on which works better).  Then cl_real_to would be used for displaying category pages in place of cl_to.  Whenever a category is changed to a redirect, or  the target of a category redirect is changed, categorylinks would be updated appropriately.

River pointed out that if cl_real_to is an id instead of a title, it will persist across renames of the category.  But Rob pointed out that that only works if the category has an associated page.  River then suggested a cat_id, which may or may not be going too far for this exercise.  We can always stick updates for cl_real_to in the job queue, basically mimicking the current bot-update situation.
Comment 34 Roan Kattouw 2007-07-03 09:42:58 UTC
(In reply to comment #33)
I think cl_real_to is the way to go. Queries would be indexed, you'd have something like WHERE cl_to='title' OR cl_real_to='title';. Updating categorylinks would be the simple (but potentially massive) query UPDATE categorylinks SET cl_real_to='redirtarget' WHERE cl_to='redirname';.
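
A small sketch of the cl_real_to idea, again in Python with SQLite. It follows the variant from comment 33 in which cl_real_to holds a copy of cl_to when no redirect applies, so the category page reads cl_real_to alone. The helper names and sample rows are hypothetical, and real updates would presumably be batched or pushed to the job queue rather than run as one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categorylinks (cl_from TEXT, cl_to TEXT, cl_real_to TEXT);
CREATE INDEX cl_real ON categorylinks (cl_real_to, cl_from);

-- cl_to is what the page text says; cl_real_to starts as a copy of it.
INSERT INTO categorylinks VALUES
    ('House_mouse', 'Mouse', 'Mouse'),
    ('Wood_mouse', 'Maus', 'Maus');
""")


def set_redirect(conn, source, target):
    """Category `source` became (or changed into) a redirect to `target`."""
    conn.execute(
        "UPDATE categorylinks SET cl_real_to = ? WHERE cl_to = ?",
        (target, source))


def clear_redirect(conn, source):
    """Category `source` is a regular category again."""
    conn.execute(
        "UPDATE categorylinks SET cl_real_to = cl_to WHERE cl_to = ?",
        (source,))


def members(conn, category):
    rows = conn.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_real_to = ? "
        "ORDER BY cl_from", (category,))
    return [r[0] for r in rows]


set_redirect(conn, "Maus", "Mouse")
after_redirect = members(conn, "Mouse")
set_redirect(conn, "Maus", "Rodents")  # retargeting needs no reparse
after_retarget = members(conn, "Rodents")
clear_redirect(conn, "Maus")
after_clear = members(conn, "Maus")
print(after_redirect, after_retarget, after_clear)
```

Keeping the original title in cl_to is what makes the retarget and un-redirect cases possible without reparsing every page in the category.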
Comment 35 Eugene Zelenko 2008-09-29 17:46:35 UTC
*** Bug 15742 has been marked as a duplicate of this bug. ***
Comment 36 Le Chat 2008-11-05 13:15:16 UTC
Sorry, the technical stuff is over my head, but can someone explain what the chances are of this being fixed? What effect do the patches referred to have? Can we expect members of redirected categories to show up on the target category page?
Comment 37 Sebastian Helm 2008-12-13 21:42:28 UTC
Just came here from http://en.wikipedia.org/wiki/Wikipedia_talk:Categories_for_discussion#Category_redirects and wanted to add a vote for this bug.

Disclaimer: It's been a long time since I was last here, and I didn't find a "vote" feature, as the Mozilla db has. Also, I didn't spend much time trying to understand the discussion of this and related bugs, and I don't understand the difference from bug 710, which is supposedly fixed.
Comment 38 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-12-13 23:10:36 UTC
(In reply to comment #37)
> I didn't find a "vote"
> feature, as the Mozilla db has.

Bottom right, "Vote for this bug" (Ctrl-F for "vote" would have found it).

> I don't understand the difference
> to bug 710, which is supposedly fixed. 

Bug 710 is about redirects to a category working when you navigate directly to the page, the same way they work for other pages.  Prior to that bug's resolution, I guess "#REDIRECT [[:Category:XYZ]]" would do nothing, or be buggy or something (it was before my time).  This is about redirects actually including one category's contents in another.
Comment 39 Philip Tzou 2009-02-03 02:27:34 UTC
May be fixed in r46706.
Comment 40 Le Chat 2009-02-07 17:02:04 UTC
This seems not to be working (at least, I just tried it on English Wikipedia and it didn't work - I don't know if it's supposed to be live there yet).
Comment 41 Platonides 2009-02-07 17:05:40 UTC
Wikipedia is still on r46424, see Special:Version.
Comment 42 Le Chat 2009-02-12 12:47:45 UTC
I'm glad this is going to be solved soon, but there is the problem of potential exploitation by vandals. I've filed a new bug (bug 17461) to address this.
Comment 43 Le Chat 2009-02-18 15:01:37 UTC
Don't like to sound impatient, but when can we expect this fix to actually come live?
Comment 44 Le Chat 2009-02-19 17:01:25 UTC
OK, it is live, thanks. There still seems to be a slight problem, though, in that you can't get a list of members of the redirected category specifically. I've raised this in a new bug (bug 17571).
Comment 45 ipatrol 2009-04-03 22:10:44 UTC
It has been decided that the change will be made in the next full MediaWiki release (http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/RELEASE-NOTES?view=markup), so just be patient. ☺
Comment 46 Tim Starling 2009-05-19 08:06:31 UTC
Reverted, see CodeReview r46706.
Comment 47 Le Chat 2009-05-19 09:50:26 UTC
Why has this potentially helpful change been reverted? It seemed to be working well; the only problem was bug 17571, which surely can't be difficult to fix. We know that the tables don't get updated straight away when a category is changed to/from a redirect, and I presume we wouldn't want them to. Bots would handle emptying existing categories when they get redirected, exactly as they do now.
Comment 48 Tim Starling 2009-05-19 11:47:17 UTC
(In reply to comment #47)
> Why has this potentially helpful change been reverted? It seemed to be working
> well; the only problem was bug 17571, which surely can't be difficult to fix.
> We know that the tables don't get updated straight away when a category is
> changed to/from a redirect, and I presume we wouldn't want them to. Bots would
> handle emptying existing categories when they get redirected, exactly as they
> do now.

The tables indeed were not updated straight away, in fact they were not updated at all, ever. You'd have to have a bot go through and edit every page in the category, every time the redirect status or redirect target changed.

It's possible to do these updates immediately, with negligible performance loss, and to retire the bots. But it would be much more difficult to implement that feature if the categorylinks table was significantly polluted with spurious links from r46706.
Comment 49 Le Chat 2009-05-19 12:34:36 UTC
I don't think it was ever envisaged that the tables would be updated automatically (I didn't think that was desirable anyway, since inappropriate redirects of large categories, and subsequent reversions, would cause lots of extra processing, of the sort that doesn't seem to happen when e.g. templates with categories get updated). But if you say it can be done, then we'll wait in eager anticipation...
Comment 50 Roan Kattouw 2009-05-19 13:49:18 UTC
(In reply to comment #49)
> I don't think it was ever envisaged that the tables would be updated
> automatically (I didn't think that was desirable anyway, since inappropriate
> redirects of large categories, and subsequent reversions, would cause lots of
> extra processing, of the sort that doesn't seem to happen when e.g. templates
> with categories get updated). But if you say it can be done, then we'll wait in
> eager anticipation...
> 

Templates with categories don't cause immediate updates because those updates are put in the job queue and executed later. Presumably, updating for category redirect changes would also use the job queue.
Comment 51 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-19 14:09:09 UTC
Templates with categories don't cause immediate updates because those updates require reparsing of large numbers of pages.  Category redirects don't, so I don't see any reason why they should need the job queue, except maybe for really giant categories, where you'd want to batch the updates so as not to lag the slaves.
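
A rough sketch of what batching such an update could look like, again with Python's sqlite3 standing in for MySQL. The toy table, the helper names, the batch size, and the rowid subquery are all assumptions for illustration; a real implementation would also pause between batches until the replicas have caught up.

```python
import sqlite3


def make_links(n, category="Old"):
    """Toy categorylinks table holding n members of one category."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT NOT NULL)")
    db.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(i, category) for i in range(n)],
    )
    db.commit()
    return db


def retarget_in_batches(db, old, new, batch_size=100):
    """Move links from `old` to `new` a batch at a time; return the batch count."""
    if old == new:
        return 0  # nothing to do, and the loop below would never finish
    batches = 0
    while True:
        cur = db.execute(
            "UPDATE categorylinks SET cl_to = ? WHERE rowid IN "
            "(SELECT rowid FROM categorylinks WHERE cl_to = ? LIMIT ?)",
            (new, old, batch_size),
        )
        db.commit()  # short transactions keep each write small
        if cur.rowcount == 0:
            return batches
        batches += 1
        # A real deployment would wait here for replication lag to drop.


db = make_links(250)
print(retarget_in_batches(db, "Old", "New", batch_size=100))
```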
Comment 52 Platonides 2009-05-22 18:02:37 UTC
Turning a "normal" category into a redirect can be done straight away, but unredirecting a category requires reparsing all category members.
Comment 53 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-22 19:19:18 UTC
Or adding an extra column to categorylinks.  That seems like a better idea, unless un-redirecting is expected to be very rare.
Comment 54 Platonides 2009-05-22 22:55:58 UTC
That's probably the way to go. What would that column be?
Comment 55 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-05-24 13:56:46 UTC
cl_to_original or such, an unredirected variant of cl_to.  Then if a redirect chain changes, you could do UPDATE categorylinks SET cl_to='New_redirect_target' WHERE cl_to_original IN ('Original_category1', 'Original_category2');.  You'd want an index on cl_to_original, of course, so this is a pretty heavyweight addition to the table.
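
A minimal sketch of the cl_to_original idea, with Python's sqlite3 standing in for MySQL. The table layout, index name, and helper names are assumptions for illustration; only the UPDATE statement follows the comment above. It shows that with the original value kept, un-redirecting needs no reparse: the same UPDATE simply points cl_to back.

```python
import sqlite3


def build_db():
    """Toy categorylinks with an unredirected copy of cl_to (assumed layout)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categorylinks (
            cl_from        INTEGER NOT NULL,
            cl_to          TEXT    NOT NULL,  -- resolved target, used for listing
            cl_to_original TEXT    NOT NULL   -- category as written on the page
        );
        CREATE INDEX cl_to_original_idx ON categorylinks (cl_to_original);
    """)
    db.executemany(
        "INSERT INTO categorylinks VALUES (?, ?, ?)",
        [(1, "Foo", "Foo"), (2, "Foo", "Foo"), (3, "Bar", "Bar")],
    )
    return db


def retarget(db, new_target, originals):
    """Point every link written as one of `originals` at `new_target`."""
    if not originals:
        return
    placeholders = ",".join("?" * len(originals))
    db.execute(
        f"UPDATE categorylinks SET cl_to = ? WHERE cl_to_original IN ({placeholders})",
        [new_target, *originals],
    )


def members(db, title):
    rows = db.execute(
        "SELECT cl_from FROM categorylinks WHERE cl_to = ? ORDER BY cl_from",
        (title,),
    )
    return [row[0] for row in rows]


db = build_db()
retarget(db, "Bar", ["Foo"])   # Category:Foo becomes a redirect to Category:Bar
print(members(db, "Bar"))
retarget(db, "Foo", ["Foo"])   # the redirect is removed again
print(members(db, "Foo"), members(db, "Bar"))
```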
Comment 56 ipatrol 2009-07-11 21:37:34 UTC
I think that the best solution is you place [[A]] into [[Category:Foo]] and Foo redirects to Bar so you see [[A]] in [[Category:Bar]] and clicking on the catlink to Foo leads you to Bar. For commons they can have "co-categories" where a member of one co-category is visible in all other co-categories. This can be done by having all the categories have [[;Category:Fu]] [[;Category:Fuz]] [[;Category:Faz]] [[;Category:(...)]]  
Comment 57 Le Chat 2009-11-17 12:34:08 UTC
Hello, is anyone still working on this? Any progress lately? It all seemed to be going so well at one point...
Comment 58 Dan Jacobson 2010-04-01 04:38:15 UTC
Sorry my following observation is probably noted above, but I didn't check.

On [[Page A]] put "[[Category:C1]]".
Now on [[Category:C1]] put
"#REDIRECT [[Category:C2]]".

Note how Page A is not listed on Category:C2.
Instead the only way to hunt down Page A in the categories is to
visit Category:1&redirect=no !
Comment 59 Dan Jacobson 2010-04-01 04:40:51 UTC
(In reply to comment #58)
> visit Category:1&redirect=no !
I meant Category:C1&redirect=no. The redirect=no part is not something the average user will know to try. So the category entry is effectively lost in this sense.
Comment 60 p858snake 2011-04-30 00:10:09 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 61 Bawolff (Brian Wolff) 2011-11-08 21:13:23 UTC
*** Bug 32262 has been marked as a duplicate of this bug. ***
Comment 62 Sumana Harihareswara 2011-11-09 04:15:15 UTC
Adding the keywords that seem right -- if the patches still need reviewing, please change "reviewed" to "need-review".
Comment 63 Quim Gil 2013-03-23 18:24:50 UTC
This feature request is being proposed at

http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects

and I'm considering whether to add it or not to 

https://www.mediawiki.org/wiki/Summer_of_Code_2013#Project_ideas

Question:

Is there a potential mentor willing to help potential students interested in
this project?

Is there reasonable support from the MediaWiki core maintainers to incorporate this feature if it's developed and meets the quality criteria?

Without these qualifications in place we can't even consider the proposal for
GSOC 2013.
Comment 64 Bawolff (Brian Wolff) 2013-03-23 18:26:43 UTC
(In reply to comment #63)

> Question:
> 
> Is there a potential mentor willing to help potential students interested in
> this project?

Yes me :)

> 
> Is there a reasonable support from the MediaWiki core maintainers to
> incorporate this feature if it's developed and meets the quality criteria?
> 

I think so. Would require schema changes which is the only bit that could potentially be sticky.
Comment 65 Quim Gil 2013-03-23 18:48:19 UTC
Ok, you're in:

https://www.mediawiki.org/wiki/Summer_of_Code_2013#Automatic_category_redirects

Thank you and good luck!
Comment 66 Tyler Romeo 2013-04-23 06:38:40 UTC
The more you know...

The current query for getting category members is:
SELECT ...
FROM `page`
INNER JOIN `categorylinks`
    FORCE INDEX (cl_sortkey)
    ON ((cl_from = page_id))
LEFT JOIN `category`
    ON ((cat_title = page_title) AND page_namespace = '14')
WHERE cl_to = 'Test' AND cl_type = 'page'
ORDER BY cl_sortkey
LIMIT 201

And, true enough, if you change the cl_to check from a comparison to an IN operator, it triggers a filesort. *However*, if you instead move the contents of the WHERE clause into the INNER JOIN condition, then the filesort disappears. The resulting query is:

SELECT ...
FROM `page`
INNER JOIN `categorylinks`
    FORCE INDEX (cl_sortkey)
    ON ((cl_from = page_id) AND (cl_to IN ('Test')) AND (cl_type = 'page'))
LEFT JOIN `category`
    ON ((cat_title = page_title) AND page_namespace = '14')
ORDER BY cl_sortkey
LIMIT 201

Now I'm not too much of an expert on databases, but theoretically this should produce the exact same results (since it's an INNER JOIN) but still be efficient (because the cl_sortkey index includes the cl_from and cl_to columns).

This would eliminate the need for any new columns and whatnot.
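
A small check of the "exact same results" claim, run against Python's sqlite3 rather than MySQL. The three toy tables and their rows are made up for illustration, and FORCE INDEX is left out because it is MySQL-specific, so this says nothing about the query plan or the filesort, only about the result set.

```python
import sqlite3

SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_sortkey TEXT, cl_type TEXT);
CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT);
INSERT INTO page VALUES (1, 0, 'Alpha'), (2, 0, 'Beta'), (3, 0, 'Gamma');
INSERT INTO categorylinks VALUES
    (1, 'Test',  'ALPHA', 'page'),
    (2, 'Other', 'BETA',  'page'),
    (2, 'Test',  'BETA',  'subcat'),
    (3, 'Test',  'GAMMA', 'page');
"""

# Filter in the WHERE clause, as in the current query.
WHERE_FORM = """
SELECT page_id, cat_id FROM page
INNER JOIN categorylinks ON cl_from = page_id
LEFT JOIN category ON cat_title = page_title AND page_namespace = 14
WHERE cl_to = 'Test' AND cl_type = 'page'
ORDER BY cl_sortkey LIMIT 201
"""

# Same filter moved into the INNER JOIN condition.
JOIN_FORM = """
SELECT page_id, cat_id FROM page
INNER JOIN categorylinks
    ON cl_from = page_id AND cl_to IN ('Test') AND cl_type = 'page'
LEFT JOIN category ON cat_title = page_title AND page_namespace = 14
ORDER BY cl_sortkey LIMIT 201
"""


def run_both():
    """Return the rows produced by each form of the query."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db.execute(WHERE_FORM).fetchall(), db.execute(JOIN_FORM).fetchall()


where_rows, join_rows = run_both()
print(where_rows == join_rows)
```

The two forms agree here because the moved predicates only reference the inner-joined categorylinks table; the same move on an outer join's condition would change the result.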
Comment 67 Quim Gil 2013-04-23 21:08:34 UTC
Just a note to say that Liangent has applied to GSoC with a proposal related to this report. Good luck!

https://www.mediawiki.org/wiki/User:Liangent/cat-redir
Comment 68 Bawolff (Brian Wolff) 2013-04-23 21:40:39 UTC
Re comment 66:

If I have more than a single category in the IN condition when doing that, I get a filesort:

mysql> describe SELECT /* CategoryViewer::doCategoryQuery Bawolff */  page_id,page_title,page_namespace,page_len,page_is_redirect,cl_sortkey,cat_id,cat_title,cat_subcats,cat_pages,cat_files,cl_sortkey_prefix,cl_collation  FROM `page` INNER JOIN `categorylinks` FORCE INDEX (cl_sortkey) ON ((cl_from = page_id) AND cl_to in ('Foo', 'se') and cl_type = 'page') LEFT JOIN `category` ON ((cat_title = page_title) AND page_namespace = '14')   ORDER BY cl_sortkey LIMIT 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categorylinks
         type: range
possible_keys: cl_sortkey
          key: cl_sortkey
      key_len: 258
          ref: NULL
         rows: 559
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: wikidb.categorylinks.cl_from
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: category
         type: eq_ref
possible_keys: cat_title
          key: cat_title
      key_len: 257
          ref: wikidb.page.page_title
         rows: 1
        Extra: 
3 rows in set (0.00 sec)
Comment 69 Tyler Romeo 2013-04-24 05:13:29 UTC
Hmm, damn databases.
Comment 70 Tyler Romeo 2013-04-28 21:17:28 UTC
Success! So the issue is that the cl_sortkey index on categorylinks puts the cl_to column before the cl_sortkey column, so when you add the "cl_to IN ...", it can no longer use the index to sort by cl_sortkey (from the ORDER BY clause). 

After adding the following index:

ALTER TABLE `categorylinks`
ADD UNIQUE `cl_newsort` ( `cl_type`, `cl_sortkey`, `cl_to`, `cl_from` )

And then running the following query:

EXPLAIN EXTENDED SELECT `cl_from`
FROM `categorylinks`
INNER JOIN `page` ON
	`page_id` = `cl_from`
LEFT JOIN `category` ON
	`cat_title` = `page_title` AND
	`page_namespace` = 14
WHERE
	`cl_type` = 'page' AND
	`cl_to` IN ( 'Foo', 'Test' )
ORDER BY cl_sortkey

I finally got no more filesort. (I was even able to get rid of the FORCE INDEX usage.) If somebody could please check this and make sure I'm still sane, and that MySQL isn't just inventing things to trick my mind, that'd be great.
Comment 71 Bawolff (Brian Wolff) 2013-04-28 21:25:49 UTC
I haven't tested this, but I would guess that unless it is doing something very fancy with merging indexes, this would cause very large scans of the categorylinks table (since it wouldn't be able to skip to only the results in the relevant category). Filesort isn't the only way that a db query can be inefficient.
Comment 72 Tyler Romeo 2013-04-28 21:33:20 UTC
(In reply to comment #71)
> I haven't tested this, but I would guess that unless it is doing something
> very fancy with merging indexes, this would cause very large scans of the
> categorylinks table. (Since it wouldn't be able to skip to only results in
> the relevant category). filesort isn't the only way that a db query can be
> inefficient.

Hmm, you're right. Now that I realize it, this would require scanning the entire cl_sortkey index (I think).
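
The trade-off discussed in the last few comments can be modelled with plain sorted lists. This is a toy model in Python, not a description of MySQL internals: the two "indexes" are sorted tuples, cl_type is left out, and the data (one large unrelated category plus two small wanted ones) is invented to make the difference visible.

```python
import bisect


def category_first_index(rows):
    """Ordered like the existing index: (cl_to, cl_sortkey, cl_from)."""
    return sorted((to, key, frm) for (frm, to, key) in rows)


def sortkey_first_index(rows):
    """Ordered like the proposed index: (cl_sortkey, cl_to, cl_from)."""
    return sorted((key, to, frm) for (frm, to, key) in rows)


def seek_then_sort(index, wanted, limit):
    """Jump to each wanted category, collect its entries, then sort them."""
    candidates = []
    for cat in wanted:
        lo = bisect.bisect_left(index, (cat,))
        hi = bisect.bisect_left(index, (cat + "\x00",))
        candidates.extend((key, frm) for (_to, key, frm) in index[lo:hi])
    candidates.sort()  # stands in for the filesort
    return candidates[:limit], len(candidates)


def scan_in_sortkey_order(index, wanted, limit):
    """No sort needed, but every entry before the last hit is examined."""
    out, examined = [], 0
    for key, to, frm in index:
        examined += 1
        if to in wanted:
            out.append((key, frm))
            if len(out) == limit:
                break
    return out, examined


# (cl_from, cl_to, cl_sortkey): 1000 unrelated links sort before the wanted ones.
rows = [(i, "Big", f"A{i:04d}") for i in range(1000)]
rows += [(2000, "Foo", "Z1"), (2001, "Test", "Z2"), (2002, "Foo", "Z3")]
wanted = {"Foo", "Test"}

print(seek_then_sort(category_first_index(rows), wanted, limit=2))
print(scan_in_sortkey_order(sortkey_first_index(rows), wanted, limit=2))
```

Both functions return the same two members; the second number in each result is how many index entries had to be touched to get them.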
Comment 73 Quim Gil 2013-05-03 21:56:06 UTC
Just a note to say that Andre Saboia has submitted a GSoC proposal related to this report: https://www.mediawiki.org/wiki/User:Anboia/Automatic_category_redirects
Comment 74 Gerrit Notification Bot 2013-05-24 07:44:29 UTC
Related URL: https://gerrit.wikimedia.org/r/65176 (Gerrit Change I29a629a514f9568d0ee4d967c516dfd599dc11ba)
Comment 75 Andre Klapper 2013-09-26 14:37:38 UTC
Tyler: The patch received a -1, do you plan to rework it?
Comment 76 Bawolff (Brian Wolff) 2013-09-26 14:44:53 UTC
If I ever have free time again (read: probably not for a while), I offer to help Tyler address some of the issues with the patch.
Comment 77 Tyler Romeo 2013-09-26 14:59:17 UTC
(In reply to comment #76)
> If I ever have free time again (read: probably not for a while), I offer to
> help Tyler address some of the issues with the patch.

That would be great. I don't have much free time myself, although once I do I'll definitely work on it.
Comment 78 Bawolff (Brian Wolff) 2013-09-26 15:00:34 UTC
(In reply to comment #77)
> (In reply to comment #76)
> > If I ever have free time again (read: probably not for a while), I offer to
> > help Tyler address some of the issues with the patch.
> 
> That would be great. I don't have much free time myself, although once I do
> I'll definitely work on it.

Yeah, somebody should make a graph of the number of commits to MediaWiki by volunteers vs. when the school semester starts.
Comment 79 Quim Gil 2013-10-24 20:35:26 UTC
I'm delisting this project from https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Automatic_category_redirects since it looks like you are almost there.
Comment 80 Bawolff (Brian Wolff) 2013-10-24 20:37:48 UTC
Removing milestone 1.22: given that this has somewhat stalled due to lack of time on the part of interested parties, it seems unlikely that it could make it into 1.22.
