Last modified: 2009-08-31 19:44:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T16900, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 14900 - __INDEX__ and __NOINDEX__ should not override $wgArticleRobotPolicies
__INDEX__ and __NOINDEX__ should not override $wgArticleRobotPolicies
Product: MediaWiki
Classification: Unclassified
Page editing (Other open bugs)
All All
: Normal minor (vote)
: ---
Assigned To: Aryeh Gregor (not reading bugmail, please e-mail directly)
: patch, patch-need-review
Depends on: 8068
Blocks: 17004
  Show dependency treegraph
Reported: 2008-07-23 20:06 UTC by Aryeh Gregor (not reading bugmail, please e-mail directly)
Modified: 2009-08-31 19:44 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Updated patch, against r53416 (20.45 KB, patch)
2009-07-19 15:21 UTC, Happy-melon

Description Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-07-23 20:06:02 UTC
$wgArticleRobotPolicies allows site owners to set robot policies on a per-article basis.  __INDEX__ and __NOINDEX__ allow random users to do the same.  The owner's setting should win out here -- current __INDEX__/__NOINDEX__ do, because of the order the code is executed.  The config setting is executed in Article::view() -- in the same place as the namespace setting, which the magic words *should* override, since they're more specific (page-specific).  The magic words are handled in OutputPage::addParserOutputNoText(), which is lower-level and is run later.

I guess the best way to handle this would be to make OutputPage have a hierarchy of robot policies, so that Article could register a higher-priority robots setting that would override any later low-priority settings.  But that seems awfully inelegant for such a minor detail.
Comment 1 Siebrand Mazeland 2008-08-11 10:50:49 UTC
Assign to developer of the feature, accidentally also the reporter :).
Comment 2 Happy-melon 2009-01-13 14:21:12 UTC
Created attachment 5670 [details]
Patch to resolve indexing conflicts, on r45695

This adds an optional parameter to OutputPage::setIndexPolicy, the 'precedence' of the method that is trying to change the configuration.  The hierarchy is set up as 
*  5 = unset (initalised defaults as below)
*  4 = set by $wgDefaultRobotPolicy
*  3 = set by $wgNamespaceRobotPolicies
*  2 = set by __INDEX__ or __NOINDEX__ magic words (where allowed 
       by $wgExemptFromUserRobotsControl)
*  1 = set by $wgArticleRobotPolicies
*  0 = set 'on-the-fly' to hide things like special pages, old revisions, etc
Also rewrites OutputPage::setRobotsPolicy as a wrapper to use the new functions, and redefines all three as returning bool: whether the attempt to change the settings was successful, which should make it easier to resolve bug16979 cleanly (or at least *more* cleanly).  

Patch needs review and is UNTESTED on a live MediaWiki installation.
Comment 3 Happy-melon 2009-07-19 15:21:53 UTC
Created attachment 6366 [details]
Updated patch, against r53416

Updated patch. This moves all robots handling to a new Article::setRobotPolicyForView (reborn from Article::getRobotPolicyForView ), which uses array_merge to build a single policy from the various layers of config.  This has the nice freebie that you can now say something like:

$wgDefaultRobotPolicy = 'index, nofollow';
$wgNamespaceRobotPolicies[NS_USER] = 'noindex';

And the 'nofollow' attribute will be inherited from the default policy, which would be expected behaviour.  Currently the policy would be lost and the hardcoded default of 'follow' would be used... :(

The patch also cleans up Article::view() a little, to avoid the fourfold duplication of the do-this-when-the-body-has-been-constructed section; prompted because the call to setRobotPolicyForView() is moved there, so it has access to the parser output (which is now stored as a member variable $mParserOutput, rather than discarded) to check for __NOINDEX__ tags.  

I've also encapsulated the "do we allow __NOINDEX__ tags in this namespace" logic in Title::canUseNoindex(), so it can be called from both Article::setRobotPolicyForView() and the Parser. This makes it trivial to resolve bug16979, which I've done in this patch; the modifications to Parser.php aren't strictly necessary to resolve *this* bug, but it's quite a nice (and useful) change.  

My first efforts at the change involved using the page_props table, which might still be a good idea.  This led me to improve the documentation for $wgPagePropLinkInvalidations, which I've left in because at the moment it's totally rubbish.
Comment 4 Happy-melon 2009-08-31 19:44:36 UTC
Fixed in r55700

Note You need to log in before you can comment on or make changes to this bug.