Last modified: 2012-12-27 17:00:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20254, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18254 - inherent article classification
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2009-03-30 12:06 UTC by bluehairedlawyer
Modified: 2012-12-27 17:00 UTC (History)

Description bluehairedlawyer 2009-03-30 12:06:24 UTC
There are a lot of bugs here which are in some way related to marking out links to disambiguation and stub articles. We already have ways of telling when a link is to a non-existent page or to a redirect, so stubs and disambiguation pages shouldn't be that different. Here's how I propose to implement it:

Add a "class" column to the "page" table.

Add a new magic word to allow editors to set the class for a particular page. It would not be possible for pages to be in more than one class. (That would involve setting up a new table, would be rejected for performance reasons and IMHO would be unnecessary.)

The magic word would look like this: {{class: stub}}. When the given class corresponded to one set in LocalSettings.php, the parser would set the article's class column value (whenever the value was changed in an edit).

A related setting variable would determine whether putting a non-existent class in the {{class: }} magic word would produce an error.
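The magic-word handling described above could be sketched roughly as follows. This is only a Python illustration, not MediaWiki code: `ALLOWED_CLASSES` and `ERROR_ON_UNKNOWN_CLASS` are hypothetical stand-ins for the proposed LocalSettings.php settings, and `extract_page_class` is an invented helper name.

```python
import re

# Hypothetical stand-ins for the proposed settings (not real MediaWiki settings).
ALLOWED_CLASSES = {"stub", "disambiguation"}
ERROR_ON_UNKNOWN_CLASS = True

# Matches the proposed syntax, e.g. {{class: stub}}
CLASS_WORD = re.compile(r"\{\{\s*class\s*:\s*([^{}|]+?)\s*\}\}", re.IGNORECASE)


def extract_page_class(wikitext):
    """Return the single class named in the wikitext, or None if none is set."""
    found = [m.group(1).lower() for m in CLASS_WORD.finditer(wikitext)]
    if not found:
        return None
    if len(set(found)) > 1:
        # The proposal allows only one class per page.
        raise ValueError("a page can only be in one class")
    name = found[0]
    if name not in ALLOWED_CLASSES:
        if ERROR_ON_UNKNOWN_CLASS:
            raise ValueError("unknown class: " + name)
        return None  # unconfigured classes are silently ignored
    return name
```

The returned name is what would be written to the proposed "class" column when an edit is saved.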

The linker could set a css class for links based on the article's class.
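As a rough illustration of the linker idea, here is a minimal Python sketch. The `pageclass-` prefix, the `/wiki/` path and the `render_link` function are assumptions made for the example, not existing MediaWiki names.

```python
from html import escape
from urllib.parse import quote


def render_link(title, page_class=None):
    """Build an anchor tag, adding a CSS class derived from the target's class."""
    href = "/wiki/" + quote(title.replace(" ", "_"))
    attrs = 'href="%s"' % escape(href, quote=True)
    if page_class:
        # Hypothetical naming scheme: one CSS class per page class.
        attrs += ' class="pageclass-%s"' % escape(page_class, quote=True)
    return "<a %s>%s</a>" % (attrs, escape(title))
```

A site stylesheet could then colour, say, `a.pageclass-disambiguation` differently from ordinary links.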

Special pages (perhaps even category pages) would then easily be able to hide or show pages based on their class.
Comment 1 Roan Kattouw 2009-03-30 15:31:44 UTC
Perhaps this stuff could be administrated in the page_props table?
Comment 2 bluehairedlawyer 2009-04-01 14:50:35 UTC
I'm not sure about that. Using the page_props table would probably multiply the number of SQL requests every time a page is parsed.

The idea here is that a page's class would be available in other pages, so that links could be colour coded appropriately. In effect we already have 4 classes:

1. Non-existent articles (coloured red on enwiki)
2. Redirects (ignored by default but editors can set a colour if they want)
3. Stubs - the software guesses what's a stub based on a page's size (depends on user settings)
4. Existing articles that are neither 2 nor 3 (blue, purple if visited)

But defining stubs based on an article's size is sub-optimal behaviour: short articles and disambiguation pages all get dumped together as "stubs". The result also depends on the size threshold each user sets for stubs.

My proposal would mean the software could easily identify actual stubs, rather than having to second guess based on page size. We could also introduce new classes to highlight articles needing improvement and so on.
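To make the difference concrete, here is a small Python sketch comparing the two approaches. The page list, lengths and threshold are invented purely for illustration, and the strict less-than comparison is an assumption about how a size check might work.

```python
def stubs_by_size(pages, threshold):
    """Size heuristic: anything shorter than the user's threshold counts as a stub."""
    return [p["title"] for p in pages if p["len"] < threshold]


def stubs_by_class(pages):
    """Proposed approach: only pages explicitly marked as stubs count."""
    return [p["title"] for p in pages if p["cls"] == "stub"]


# Made-up sample data, not taken from any real wiki.
pages = [
    {"title": "Short but complete article", "len": 400, "cls": None},
    {"title": "Mercury (disambiguation)", "len": 350, "cls": "disambiguation"},
    {"title": "Unfinished article", "len": 900, "cls": "stub"},
]

by_size = stubs_by_size(pages, threshold=500)
by_class = stubs_by_class(pages)
```

With this sample, the size heuristic flags the two short pages and misses the longer unfinished article, while the explicit class picks out only the page marked as a stub.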

If disambiguation pages were specially highlighted, far fewer of them would be linked to, and the process of disambiguating links would be made much easier.

I've changed the component to "page rendering" as it mostly refers to linking. I'm currently working on a patch for this. I'd appreciate any feedback on whether it would ever be implemented.
Comment 3 Roan Kattouw 2009-04-02 10:24:55 UTC
(In reply to comment #2)
> I'm not sure about that. Using the page_props table would probably result in
> multiplying the amount of sql requests every time a page is parsed. 
I don't see how that would happen. If all these things resided in page_props, you'd just need one request to pull all properties of all linked-to pages at once, similar to how we pull the data for all linked-to pages from the page table in one query (although we do have a batch limit on that; I believe it's 500, see LinkBatch.php).
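The batching idea can be sketched in Python with SQLite standing in for the real database. This is not how LinkBatch.php is written: `fetch_props` is a hypothetical helper, the 500 limit is the figure recalled above rather than a verified value, and the query assumes a simplified page_props table with pp_page, pp_propname and pp_value columns.

```python
import sqlite3

BATCH_SIZE = 500  # the batch limit recalled above; treat as an assumption


def fetch_props(conn, page_ids, propname):
    """Look up one property for many pages, using one query per batch of IDs."""
    props = {}
    for start in range(0, len(page_ids), BATCH_SIZE):
        batch = page_ids[start:start + BATCH_SIZE]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            "SELECT pp_page, pp_value FROM page_props "
            "WHERE pp_propname = ? AND pp_page IN (%s)" % placeholders,
            [propname, *batch],
        )
        props.update(rows)  # each row is a (pp_page, pp_value) pair
    return props
```

A page with 1,200 outgoing links would then cost three queries rather than 1,200.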
Comment 4 bluehairedlawyer 2012-04-04 15:35:40 UTC
Maybe I can explain this better.

At the moment we have a 'page_is_redirect' column in the 'page' table, which lets us quickly identify redirects from the pages that link to them. Currently this column is simply:

0 = Normal Article
1 = Redirect

I propose to rename 'page_is_redirect' to 'class' and allow other values. For example:

0 = Normal Article
1 = Redirect
2 = Stub
3 = Disambiguation page

And possibly others. Obviously an article could only be in one class at a time, but I don't think any page would belong to more than one of the examples given.

This would allow for colour coding of stubs (currently possible but highly inaccurate) and disambiguation pages, hopefully without increasing server overhead.
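The widened column could be modelled along these lines. This is a Python sketch only: `PageClass` and `is_redirect` are illustrative names, and the numeric values are the ones listed in this comment.

```python
from enum import IntEnum


class PageClass(IntEnum):
    """Values for the proposed 'class' column (0 and 1 match page_is_redirect today)."""
    NORMAL = 0
    REDIRECT = 1
    STUB = 2
    DISAMBIGUATION = 3


def is_redirect(class_value):
    """Redirect check that stays correct once values other than 0 and 1 exist."""
    return PageClass(class_value) is PageClass.REDIRECT
```

One consequence of reusing the column: any code that treats it as a plain true/false flag would need to compare against 1 explicitly, because 2 and 3 are also truthy.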
Comment 5 Platonides 2012-04-04 19:17:13 UTC
How is it "highly inaccurate"? It's available right to the byte, with an offset which you can change in your preferences, too.
Comment 6 bluehairedlawyer 2012-04-05 11:50:12 UTC
It is of course highly accurate to the byte but that's not how we decide what counts as a stub. Lots of articles are small without being stubs, disambiguation pages being the obvious example.

At the moment a registered editor looking at a page can tell which links are to small articles but not what kind of articles they are. A link to a stub is good but a link to a disambiguation page is normally evil. At the moment the only way of telling the difference is by following the link or having some JavaScript fetch the page for you.
