Last modified: 2009-08-31 19:44:36 UTC
$wgArticleRobotPolicies allows site owners to set robot policies on a per-article basis. __INDEX__ and __NOINDEX__ allow random users to do the same. The owner's setting should win out here -- current __INDEX__/__NOINDEX__ do, because of the order the code is executed. The config setting is executed in Article::view() -- in the same place as the namespace setting, which the magic words *should* override, since they're more specific (page-specific). The magic words are handled in OutputPage::addParserOutputNoText(), which is lower-level and is run later.
I guess the best way to handle this would be to make OutputPage have a hierarchy of robot policies, so that Article could register a higher-priority robots setting that would override any later low-priority settings. But that seems awfully inelegant for such a minor detail.
Assign to developer of the feature, accidentally also the reporter :).
Created attachment 5670 [details]
Patch to resolve indexing conflicts, on r45695
This adds an optional parameter to OutputPage::setIndexPolicy, the 'precedence' of the method that is trying to change the configuration. The hierarchy is set up as
* 5 = unset (initalised defaults as below)
* 4 = set by $wgDefaultRobotPolicy
* 3 = set by $wgNamespaceRobotPolicies
* 2 = set by __INDEX__ or __NOINDEX__ magic words (where allowed
* 1 = set by $wgArticleRobotPolicies
* 0 = set 'on-the-fly' to hide things like special pages, old revisions, etc
Also rewrites OutputPage::setRobotsPolicy as a wrapper to use the new functions, and redefines all three as returning bool: whether the attempt to change the settings was successful, which should make it easier to resolve bug16979 cleanly (or at least *more* cleanly).
Patch needs review and is UNTESTED on a live MediaWiki installation.
Created attachment 6366 [details]
Updated patch, against r53416
Updated patch. This moves all robots handling to a new Article::setRobotPolicyForView (reborn from Article::getRobotPolicyForView ), which uses array_merge to build a single policy from the various layers of config. This has the nice freebie that you can now say something like:
$wgDefaultRobotPolicy = 'index, nofollow';
$wgNamespaceRobotPolicies[NS_USER] = 'noindex';
And the 'nofollow' attribute will be inherited from the default policy, which would be expected behaviour. Currently the policy would be lost and the hardcoded default of 'follow' would be used... :(
The patch also cleans up Article::view() a little, to avoid the fourfold duplication of the do-this-when-the-body-has-been-constructed section; prompted because the call to setRobotPolicyForView() is moved there, so it has access to the parser output (which is now stored as a member variable $mParserOutput, rather than discarded) to check for __NOINDEX__ tags.
I've also encapsulated the "do we allow __NOINDEX__ tags in this namespace" logic in Title::canUseNoindex(), so it can be called from both Article::setRobotPolicyForView() and the Parser. This makes it trivial to resolve bug16979, which I've done in this patch; the modifications to Parser.php aren't strictly necessary to resolve *this* bug, but it's quite a nice (and useful) change.
My first efforts at the change involved using the page_props table, which might still be a good idea. This led me to improve the documentation for $wgPagePropLinkInvalidations, which I've left in because at the moment it's totally rubbish.
Fixed in r55700