Last modified: 2014-11-04 22:49:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8754, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 6754 - Distinguish disambiguation pages from normal articles cheaply in database
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Extensions requests
Version: unspecified
Hardware: All All
Importance: High normal with 10 votes
Target Milestone: ---
Assigned To: Ryan Kaldari
Duplicates: 43210
Depends on: 35981
Blocks:
Reported: 2006-07-20 09:42 UTC by Julian Fleischer
Modified: 2014-11-04 22:49 UTC (History)


Description Julian Fleischer 2006-07-20 09:42:55 UTC
MediaWiki already does not count redirects as "true articles". I'm requesting a feature to distinguish
disambiguation pages similarly, not for the sake of statistics but for another reason: the special page
"Lonelypages". Most disambiguation pages are lonely pages. That is because a link to a certain topic usually
points at the disambiguated lemma and not at the disambiguation page, which is entirely correct. But if you look
for lonely pages via the special page, you will find a lot of disambiguation pages which do not need to be linked
from other articles, as they only exist for users who have typed the expression looking for an explanation. This
costs me a lot of nerves.

And it's not only about my nerves; there is another reason why I think this feature should be implemented. It also
has to do with Lonelypages. As I mentioned, you can use this tool to search for lonely pages and then link them.
But, like most special pages, these pages are cached (which is annoying but necessary, and not the point) and the
queries are mostly limited to 1000 entries. In other words: there are very many lonely pages, more than 1000, so
not all of them can be found via the Lonelypages tool. Mostly you will only get the pages beginning with A or B.
Okay, no problem, you might think: you can link all the A and B pages so that C and D come up at the next query.
But that is exactly what won't work, because of the disambiguation pages. There are also more than 1000
disambiguation pages which, as we just found out, do not need to be linked, so they won't disappear from the list
and they block C, D, etc.

A workaround would be to create a page, maybe "List of all disambiguations", which links to those pages to make
them disappear, but I guess the tool considers a page lonely if no article links to it. And "List of all
disambiguations" is not a page to put into the article namespace, as it has no encyclopaedic relevance.

I hope you got the point despite a few mistakes in my English :-)
Comment 1 Julian Fleischer 2006-07-20 23:44:20 UTC
"if no article links to them" - what I mean is no page from the article namespace. I guess most of you will have figured this out already,
but just to make sure everybody understands ; )
Comment 2 Duncan Harris 2006-10-14 15:50:08 UTC
AFAIK all disambiguation pages should start with

#DISAMBIGUATION

so that they are recognised as different. Then this can be implemented:

[[special:whatlinkshere]] can identify disambiguation pages.

I'm sure there are other advantages that I have not thought of, but the
#DISAMBIGUATION thing needs to go in first.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-10-16 00:14:11 UTC
Disambiguation pages are distinguished technically from non-disambigs.  See
[[Special:Disambiguations]].  Any particular requests about what should be done
with disambiguation pages should be in separate bugs.
Comment 4 Duncan Harris 2006-11-11 12:23:04 UTC
No.  This is clearly not fixed. Examples of where disambiguation pages need to
be distinguished include:

# In [[special:whatlinkshere]], pages that are disambiguation pages should be
identified.
# [[special:randompage]] should not take users to disambiguation pages.
# [[special:allpages]] should distinguish disambiguation pages from articles.
Comment 5 Rob Church 2006-11-11 13:01:53 UTC
These three issues should be separate feature requests.
Comment 6 Duncan Harris 2006-11-11 23:28:08 UTC
And so would it be possible to do this without any technical alterations?  Just
by using a {{template}} ??
Comment 7 Rob Church 2006-11-11 23:31:16 UTC
Administrators provide a list of disambiguation pages via a page in the
MediaWiki namespace. Therefore, MediaWiki knows what pages are supposed to be
classed as disambiguation pages. To request special treatment for disambiguation
page links, etc. in certain cases, please file requests to have that done.

Do not reopen this bug, which concerned something *else*.
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-11-17 02:23:57 UTC
Okay, after looking at the code I'm no longer sure about this being resolved. 
[[Special:Disambiguations]] uses an isExpensive() database query to pick out a
list of the disambigs, but I really don't see how that's usable for bug 7935,
bug 7936, bug 7937, et al.  What we need for those is a cheap and easy method
like Article::isDisambig(), à la Article::isRedirect().
Comment 9 Titoxd 2006-11-17 02:45:44 UTC
Perhaps adding a boolean marker on the database (or a different disambiguation
table), which is updated via a hook after a page save or a purge would work.
That way, Article::isDisambig() would just make a quick query to the field or
table, and return a simple yes/no, which then can be used accordingly.
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-11-09 17:16:59 UTC
I've looked at the query and actually we store it efficiently for single-page lookups.  This query is fast even on enwiki (using toolserver):

mysql> EXPLAIN SELECT 1 FROM templatelinks WHERE tl_namespace=10 AND tl_title IN ('Bio-dab', 'Dab', 'Diasmbig', 'Disamb', 'Disamb-cleanup', 'Disambig', 'Disambig-cleanup', 'Disambiguation', 'Geodis', 'Hndis', 'Hndisambig', 'Numberdis', 'Roaddis', 'Surname') AND tl_from=1234;
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
| id | select_type | table         | type | possible_keys        | key     | key_len | ref         | rows | Extra                    |
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | templatelinks | ref  | tl_from,tl_namespace | tl_from | 8       | const,const |    3 | Using where; Using index | 
+----+-------------+---------------+------+----------------------+---------+---------+-------------+------+--------------------------+
1 row in set (0.01 sec)

mysql> SELECT 1 FROM templatelinks WHERE tl_namespace=10 AND tl_title IN ('Bio-dab', 'Dab', 'Diasmbig', 'Disamb', 'Disamb-cleanup', 'Disambig', 'Disambig-cleanup', 'Disambiguation', 'Geodis', 'Hndis', 'Hndisambig', 'Numberdis', 'Roaddis', 'Surname') AND tl_from=1234;
Empty set (0.01 sec)

which verifies that article id 1234 is not a disambig.  Resolving INVALID and removing dependencies; there's no problem with doing an extra JOIN for whatever queries you want, AFAICT.  Special:Disambiguations is, I think, only slow because it needs a filesort for the alphabetization?  Actually the query without alphabetization takes a couple of seconds to run on the toolserver, with an appropriate LIMIT, but I suspect that might be because the toolserver is overloaded.
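
The single-page check above can be tried on a toy database. What follows is a minimal sketch, assuming an in-memory SQLite table standing in for MediaWiki's templatelinks table; the column names are taken from the query above, while the sample rows, the shortened template list, the index name, and the helper name is_disambig are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical subset of the disambiguation template names used in the query above.
DISAMBIG_TEMPLATES = ("Disambig", "Dab", "Geodis", "Hndis", "Surname")


def is_disambig(conn, page_id):
    """Return True if the page transcludes any of the listed templates."""
    placeholders = ",".join("?" * len(DISAMBIG_TEMPLATES))
    sql = (
        "SELECT 1 FROM templatelinks "
        "WHERE tl_from = ? AND tl_namespace = 10 "
        f"AND tl_title IN ({placeholders}) LIMIT 1"
    )
    row = conn.execute(sql, (page_id, *DISAMBIG_TEMPLATES)).fetchone()
    return row is not None


# Toy data: page 1234 uses an unrelated template, page 5678 uses a disambiguation one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)"
)
# An index led by tl_from lets a single-page lookup touch only that page's rows.
conn.execute(
    "CREATE INDEX tl_from_idx ON templatelinks (tl_from, tl_namespace, tl_title)"
)
conn.executemany(
    "INSERT INTO templatelinks VALUES (?, ?, ?)",
    [(1234, 10, "Infobox"), (5678, 10, "Disambig")],
)
```

With this toy data, is_disambig(conn, 5678) is true and is_disambig(conn, 1234) is false. The lookup is cheap because it is restricted to one tl_from value, but the template list still has to be kept up to date by hand.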
Comment 11 Jason Spiro 2008-01-22 17:07:31 UTC
Julian, I respectfully disagree with something you wrote.  You said that disambig pages should not be shown on the Lonelypages list.  But as en:user:Revolving_Bugbear wrote: "Disambiguation pages should not be orphans, otherwise they would serve no purpose. Disambigs get wikilinked from hatnotes." --http://en.wikipedia.org/wiki/Wikipedia_talk:Special:Lonelypages#.22Except_for_disambiguation_pages_....22
Comment 12 Jason Spiro 2008-01-22 17:16:00 UTC
Julian, you said in your original comment that disambig pages should not be shown on the Lonelypages list.  That is an old bug, bug 3483 (Disambiguation pages should not be listed in Special:Lonelypages).  :-)
Comment 13 bluehairedlawyer 2012-04-04 17:50:25 UTC
I can't see how this bug is either resolved or invalid. Surely it can't be both!? It seems like it was more of a won't fix. I'm reopening it. This feature would be very useful for disambiguation.

To respond to some points made above:

We could create a magic word called __DISAMBIG__ . When included in a non-template page, the parser would set a field called 'page_class' in the 'page' table to a number indicating that it was a disambiguation page. This could then be used to colour links to disambiguation pages like we do now with redirects.

I've set out my proposal at Bug 18254.
Comment 14 Tim Landscheidt 2012-12-24 16:02:20 UTC
*** Bug 43210 has been marked as a duplicate of this bug. ***
Comment 15 Ryan Kaldari 2012-12-25 00:38:50 UTC
I implemented a 2 line fix for this at https://gerrit.wikimedia.org/r/#/c/40343/

It sets a 'disambiguation' page property for any page that includes '__DISAMBIG__'
Comment 16 Daniel Friesen 2012-12-25 00:50:40 UTC
We already have a method of identifying a page as a disambig page in core (with a category). And querying for this is already efficient. I see no reason for us to have to add extra columns or unnecessary magic words when this data is already inside the database.
Comment 17 Betacommand 2012-12-25 00:53:31 UTC
Actually you are mistaken: it uses templates listed on [[MediaWiki:Disambiguationspage]], which does not work well. I would suggest using a parser function like what Ryan added along with a new column in the page table.
Comment 18 Ryan Kaldari 2012-12-25 00:56:55 UTC
@Daniel: That solution doesn't work across wikis. It also isn't efficient since it requires hundreds of administrators to maintain hundreds of special lists in MediaWiki space. This solution is simple and lightweight; it doesn't involve any extra columns, just a simple magic word to add to the disambiguation templates.
Comment 19 Betacommand 2012-12-25 00:58:45 UTC
I would suggest adding a page_is_disambig column to the DB, as magic words do not work well for database queries: they are not stored in the DB, they just affect page rendering.
Comment 20 Ryan Kaldari 2012-12-25 01:05:37 UTC
@Daniel: Well it technically does work across wikis, but in a very hackish way. The existing solution is very painful.

@Betacommand: That's why I use a doubleunderscore magic word, not a regular magic word.
Comment 21 Platonides 2012-12-25 01:17:57 UTC
I'm not opposed to storing that it is a disambiguation page in the page properties, but it should be detected via the existing [[MediaWiki:Disambiguationspage]], not with a new magic word.
Comment 22 Ryan Kaldari 2012-12-25 01:20:48 UTC
Since doubleunderscore magic words are especially magical (and have very little documentation), I guess I should explain how my patch actually works and what it does...

Unlike regular magic words, doubleunderscore magic words don't necessarily output anything. For example, you can put one on a line by itself and it won't affect the page rendering at all (even with the newline). The only thing they do by default is set a page property for any page that includes it. For example, when a category includes '__HIDDENCAT__' that just sets a page property on the category which can then be queried using Parser::getProperty(). It does this through the existing page_props table so no new columns or tables are necessary. In fact no schema change is needed at all. This is exactly the sort of use that the page_props table was intended for and exactly the sort of use that doubleunderscore magic words were intended for. There's no need to over-engineer this with new extensions, hooks, or schema changes.
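
The behaviour described in this comment can be mimicked in a few lines. This is a toy sketch in Python, not MediaWiki's parser: the mapping DOUBLE_UNDERSCORE_PROPS, the function name parse, and the choice of an empty string as the stored value are assumptions made for illustration.

```python
import re

# Illustrative mapping from double-underscore words to page property names.
DOUBLE_UNDERSCORE_PROPS = {
    "__DISAMBIG__": "disambiguation",
    "__HIDDENCAT__": "hiddencat",
}

# One pattern that matches any of the known words literally.
_PATTERN = re.compile("|".join(re.escape(word) for word in DOUBLE_UNDERSCORE_PROPS))


def parse(wikitext):
    """Strip double-underscore words and return (remaining text, page properties)."""
    props = {}

    def record(match):
        # The word produces no output; it only records a property for the page.
        props[DOUBLE_UNDERSCORE_PROPS[match.group(0)]] = ""
        return ""

    text = _PATTERN.sub(record, wikitext)
    return text, props


text, props = parse("'''Mercury''' may refer to:\n__DISAMBIG__\n* Mercury (planet)")
```

After the call, props holds a single "disambiguation" entry and the word itself no longer appears in text. In this sketch the property is only returned to the caller; the comment above describes MediaWiki persisting such properties in the page_props table.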
Comment 23 Ryan Kaldari 2012-12-25 01:23:58 UTC
@Platonides: Why? If we have a simple efficient way to detect them why would we want to use a complicated fragile method?
Comment 24 Ryan Kaldari 2012-12-25 01:34:05 UTC
In case it's not obvious, the way this would work is that we would add the magic word into the disambiguation templates. Then we wouldn't have to keep track of all of these templates via the MediaWiki pages and we would be able to query the state with a single simple function call. I don't understand why this simple solution is so controversial (or why no one has bothered to implement it for 6 years). The existing solution is a terrible hack and I doubt if most wikis are even utilizing it.
Comment 25 Betacommand 2012-12-25 01:40:12 UTC
I still think a new column in the page table can be very useful. It can then be used not only for this issue but for many other future features (removing disambigs from Special:Random, excluding them from article counts, and several other ideas just off the top of my head). If this is done just via a magic word we lose a lot of the future features that could be built off this change.
Comment 26 Ryan Kaldari 2012-12-25 01:49:48 UTC
@Betacommand: Page properties are stored in the database. Thus, any piece of MediaWiki code has access to the information. You don't have to query the database directly though, you can just use Parser::getProperty('disambiguation'). There is no reason we need to create a new column for this. Also, any solution that requires a schema change will be about 100x less likely to get deployed.
Comment 27 Ryan Kaldari 2012-12-25 01:54:52 UTC
Sorry I meant ParserOutput::getProperty.
Comment 28 Daniel Friesen 2012-12-25 03:52:09 UTC
Strange, I could have sworn that we used a message to point to a category name. Not extract template links from.
Comment 29 Krinkle 2012-12-25 04:37:25 UTC
Does the core handling for MediaWiki:Disambiguationspage also set this property? It should, right?
Comment 30 Tim Landscheidt 2012-12-25 05:28:06 UTC
(In reply to comment #29)
> Does the core handling for MediaWiki:Disambiguationspage also set this
> property? It should, right?

[[MediaWiki:Disambiguationspage]] seems only to be queried on [[Special:Disambiguations]] where pages with the property disambiguation will probably only be duplicates.  If you mean if code should be added that if on page save a page transcludes a template contained in [[MediaWiki:Disambiguationspage]] the page property disambiguation is set, hell, no :-).  Just add the magic word to the templates, marvel at the nice and easily understandable and maintainable code (not only from the PHP, but also from the wiki side), and when everybody feels comfortable, get rid of [[MediaWiki:Disambiguationspage]].
Comment 31 Ryan Kaldari 2012-12-25 06:32:21 UTC
I killed the gerrit change since apparently no one wants disambiguation code in core. I'll see about writing an extension instead.
Comment 32 Nemo 2012-12-25 11:34:39 UTC
(In reply to comment #31)
> I killed the gerrit change since apparently no one wants disambiguation code
> in core. I'll see about writing an extension instead.

Moving to MediaWiki extension to avoid closing it WONTFIX.
Comment 33 Krinkle 2012-12-25 15:39:15 UTC
(In reply to comment #30)
> (In reply to comment #29)
> > Does the core handling for MediaWiki:Disambiguationspage also set this
> > property? It should, right?
> 
> [[MediaWiki:Disambiguationspage]] seems only to be queried on
> [[Special:Disambiguations]] where pages with the property disambiguation will
> probably only be duplicates.  If you mean if code should be added that if on
> page save a page transcludes a template contained in
> [[MediaWiki:Disambiguationspage]] the page property disambiguation is set,
> hell, no :-).  Just add the magic word to the templates, marvel at the nice
> and easily understandable and maintainable code (not only from the PHP, but
> also from the wiki side), and when everybody feels comfortable, get rid of
> [[MediaWiki:Disambiguationspage]].

Whether or not the handling of MediaWiki:Disambiguationspage uses this property isn't a problem. We can't/shouldn't migrate internals fully to this property as it would require a re-parse of every page.

It could be useful to set the property on-parse, but that's for a later point in time.

However introducing this property is pointless if disambiguation-queries don't use it. If Special:Disambiguations exclusively uses MediaWiki:Disambiguationspage, then this property is just a meaningless property that happens to be named "disambiguation".
Comment 34 Tim Landscheidt 2012-12-25 16:57:50 UTC
(In reply to comment #33)
> [...]
> Whether or not the handling of MediaWiki:Disambiguationspage uses this
> property isn't a problem. We can't/shouldn't migrate internals fully to this
> property as it would require a re-parse of every page.

No, it doesn't.  The addition of the magic word to the template triggers the setting of the property in the pages that transclude the template (at least according to my tests).  So the load isn't any different than that from other changes to the templates.

> [...]
> However introducing this property is pointless if disambiguation-queries
> don't use it. If Special:Disambiguations exclusively uses
> MediaWiki:Disambiguationspage, then this property is just a meaningless
> property that happens to be named "disambiguation".

I think its introduction is a fine example of the procedure we follow for schema changes as well: small steps that can be rolled back at any time.  Even in the initial stage, the property is not pointless, as it can be queried far more easily by gadgets.  For example, [[de:MediaWiki:Gadget-bkl-check.js]] doesn't use the "official" [[MediaWiki:Disambiguationspage]], but relies on the templates on the German Wikipedia adding a category "Begriffsklärung".  With a property "disambiguation", this gadget can be used globally with a small change and no knowledge of the local category names.
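
How a gadget or script might read such a property without knowing local category names can be sketched as follows. This is a rough illustration in Python, assuming the MediaWiki web API's prop=pageprops query with its ppprop parameter; the helper names, the example.org endpoint, and the hand-written SAMPLE response (not fetched from any wiki) are assumptions.

```python
import json
from urllib.parse import urlencode


def pageprops_url(api_base, titles):
    """Build a query URL asking only for the 'disambiguation' page property."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": "|".join(titles),
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def disambig_titles(response_text):
    """Return the titles in a JSON response that carry the property."""
    pages = json.loads(response_text)["query"]["pages"]
    return sorted(
        page["title"]
        for page in pages.values()
        if "disambiguation" in page.get("pageprops", {})
    )


# Hand-written sample response in the shape of format=json output.
SAMPLE = json.dumps({"query": {"pages": {
    "1": {"pageid": 1, "title": "Mercury", "pageprops": {"disambiguation": ""}},
    "2": {"pageid": 2, "title": "Mercury (planet)"},
}}})

url = pageprops_url("https://example.org/w/api.php", ["Mercury", "Mercury (planet)"])
```

Passing SAMPLE to disambig_titles yields only "Mercury". The same code would apply to any wiki that sets the property, which is the portability argument made in this comment.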
Comment 35 Ryan Kaldari 2012-12-25 23:10:20 UTC
I have rewritten Special:Disambiguations as Special:Disambiguator. The output is exactly the same, but it uses the page property rather than the complicated multistep queries. This new page will be checked in as soon as I can get a new gerrit project created. The old page will be retired as soon as everyone is happy with using the new page. Besides, once the information is easily available from the database, I don't know why anyone would want to scrape a special page instead.
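
The kind of single-join listing that a page property makes possible can be sketched on a toy schema. This is an illustration in Python with in-memory SQLite and is not the extension's actual query: the column names follow MediaWiki's page and page_props tables, while the sample rows and the function name list_disambiguation_pages are assumptions.

```python
import sqlite3

# Toy schema and data: two of the three pages carry the 'disambiguation' property.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
INSERT INTO page VALUES (1, 0, 'Mercury'), (2, 0, 'Mercury_(planet)'), (3, 0, 'Apollo');
INSERT INTO page_props VALUES (1, 'disambiguation', ''), (3, 'disambiguation', '');
""")


def list_disambiguation_pages(conn):
    """List article-namespace pages that carry the 'disambiguation' property."""
    rows = conn.execute(
        "SELECT page_title FROM page "
        "JOIN page_props ON pp_page = page_id "
        "WHERE pp_propname = 'disambiguation' AND page_namespace = 0 "
        "ORDER BY page_title"
    )
    return [title for (title,) in rows]
```

On this data the function returns Apollo and Mercury. Unlike the templatelinks approach, no list of template names appears in the query.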
Comment 36 Ori Livneh 2012-12-25 23:36:44 UTC
Change 40349, a dependency for this bug, has been merged.
Comment 37 Waldir 2012-12-27 16:56:11 UTC
(In reply to comment #36)
> Change 40349, a dependency for this bug, has been merged.

For convenience: https://gerrit.wikimedia.org/r/#/c/40349/ (Creating new GetDoubleUnderscoreIDs hook)
Comment 38 MZMcBride 2012-12-28 22:11:04 UTC
(In reply to comment #35)
> I have rewritten Special:Disambiguations as Special:Disambiguator. The
> output is exactly the same, but it uses the page property rather than the
> complicated multistep queries. This new page will be checked in as soon
> as I can get a new gerrit project created. The old page will be retired
> as soon as everyone is happy with using the new page. Besides, once the
> information is easily available from the database, I don't know why
> anyone would want to scrape a special page instead.

This is <https://gerrit.wikimedia.org/r/41043>.
Comment 39 Dmytro Dziuma 2013-06-25 13:31:44 UTC
Isn't it fixed with Extension:Disambiguator ?
Comment 40 Helder 2013-06-25 13:43:18 UTC
(In reply to comment #39)
> Isn't it fixed with Extension:Disambiguator ?

I think so and opened bug 50174 requesting it to be installed on WMF wikis.
Comment 41 Ryan Kaldari 2013-09-05 21:55:27 UTC
The Disambiguator extension is installed on all Wikimedia wikis now. For instructions on how to use it, see https://www.mediawiki.org/wiki/Extension:Disambiguator#Usage.
Comment 42 Bartosz Dziewoński 2013-09-06 10:47:13 UTC
FYI, specific issues and enhancements are now tracked in the Disambiguator component: https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=%3Adisambiguator
