Last modified: 2011-12-13 15:26:57 UTC
For transparency, pages using the __NOINDEX__ and __INDEX__ behavior switches should be auto-categorised into a tracking category, a la [[Category:Hidden categories]] for __HIDDENCAT__. Ideally, this should only occur when the switch is actually having an *effect*, i.e. only where the switch is allowed by $wgNamespaceRobotPolicies and $wgArticleRobotPolicies. This would serve two purposes: users could see whether the switch is having an effect, and use of the switches could be monitored.
Created attachment 5667: Proposed patch. I've taken a stab at trying to do this. __INDEX__ includes the page in [[Category:Indexed pages]] and __NOINDEX__ includes the page in [[Category:Non-indexed pages]].
Created attachment 5668: Define category names
This categorises even if the action of __INDEX__/__NOINDEX__ is disabled by $wgExemptFromUserRobotsControl, doesn't it? From the fixmes in OutputPage.php, looks like the whole thing could do with an overhaul.
Created attachment 5674: Proposed patch. New patch checks if the namespace is in $wgExemptFromUserRobotsControl. If it is, then Parser.php does not add the category or call setIndexPolicy, which (I think) makes the check in OutputPage.php redundant.
Created attachment 5675: Factoring in ArticleRobotPolicies. I could kill two birds with one stone here. It should now work like this:
1) Check if the page has a policy defined in $wgArticleRobotPolicies. If it does, no code will be executed, so the page will not be added to the category and the new Index/Noindex policy will not be set.
2) If not, check $wgExemptFromUserRobotsControl. If the namespace has a local policy, the policy will not be set.
3) If not, check if NOINDEX/INDEX tags are in use.
4) If so, add the page to the appropriate category and set the policy.
This is the first time I've really played around with MediaWiki's code, so I don't know if it will work as intended, but this should also solve the problem of NOINDEX/INDEX overriding a policy set in $wgArticleRobotPolicies.
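The order of checks described in this comment can be sketched as follows. This is a minimal, language-neutral model in Python, not the code from the attached patch: `RobotConfig`, `apply_index_switch` and the two lookup tables are hypothetical names, the config fields only stand in for $wgArticleRobotPolicies and $wgExemptFromUserRobotsControl, and the precedence when both switches appear on one page is an arbitrary assumption. The category names are the ones given for attachment 5667.

```python
from dataclasses import dataclass, field

# Tracking category names as proposed earlier in this thread.
TRACKING_CATEGORIES = {
    "__INDEX__": "Indexed pages",
    "__NOINDEX__": "Non-indexed pages",
}
POLICIES = {"__INDEX__": "index", "__NOINDEX__": "noindex"}


@dataclass
class RobotConfig:
    """Hypothetical stand-in for the two configuration variables."""
    # Models $wgArticleRobotPolicies: page title -> fixed robot policy.
    article_robot_policies: dict = field(default_factory=dict)
    # Models $wgExemptFromUserRobotsControl: namespaces where switches are ignored.
    exempt_namespaces: set = field(default_factory=set)


def apply_index_switch(title, namespace, switches, config):
    """Return (policy, tracking_category), or (None, None) if the switch has no effect."""
    # 1) A per-article policy wins: no category, no policy change.
    if title in config.article_robot_policies:
        return None, None
    # 2) Namespaces exempt from user control ignore the switches.
    if namespace in config.exempt_namespaces:
        return None, None
    # 3) and 4) Otherwise the first switch found decides both policy and category.
    # Assumption: __NOINDEX__ is checked first; the thread does not specify this.
    for switch in ("__NOINDEX__", "__INDEX__"):
        if switch in switches:
            return POLICIES[switch], TRACKING_CATEGORIES[switch]
    return None, None
```

With this shape, a page only lands in a tracking category when the switch actually changes its robot policy, which is the behaviour requested in the original report.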
Created attachment 5710: Proposed patch v4. Tidied up the code a little.
Created attachment 6108: New patch. Some improvements.
Wouldn't it be a better idea to track this stuff in the page_props table, like we do with __HIDDENCAT__ ?
__HIDDENCAT__ also adds the page to [[Category:Hidden categories]].
*YES*. This is the *perfect* solution. The situation is very similar: it's a 'property' that applies to individual pages and can be stored coherently in the page_props table, and the db query can be done in OutputPage.php rather than the parser. Is [[Category:Hidden categories]] populated 'normally', with links in the categorylinks table? Or is it generated entirely from page_props? There's probably no reason why a [[Category:Noindexed pages]] can't be dynamically generated; that would additionally allow the categorisation to be split into __NOINDEX__ tags that are functional (are suppressing indexing) and those that are not (i.e. are being overridden by other policies). This would make resolving bug 14900 very much easier, as well. Great idea, Roan!
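To make the idea concrete, here is a rough sketch of a dynamically generated listing that separates functional from overridden __NOINDEX__ tags. It is plain Python over in-memory data, not a query against the real page_props schema: the `(page_id, property_name)` pairs, the property name `"noindex"`, and the function `list_noindexed` are all assumptions made for illustration.

```python
def list_noindexed(page_props, pages, article_policies, exempt_namespaces):
    """Split pages carrying a 'noindex' property into (effective, overridden) title lists.

    page_props        -- set of (page_id, property_name) pairs; models the page_props table
    pages             -- dict of page_id -> (title, namespace)
    article_policies  -- dict of title -> policy; models $wgArticleRobotPolicies
    exempt_namespaces -- set of namespaces; models $wgExemptFromUserRobotsControl
    """
    effective, overridden = [], []
    for page_id, (title, namespace) in sorted(pages.items()):
        # Only pages whose stored properties include the switch are listed.
        if (page_id, "noindex") not in page_props:
            continue
        # The tag is overridden when another policy source takes priority.
        if title in article_policies or namespace in exempt_namespaces:
            overridden.append(title)
        else:
            effective.append(title)
    return effective, overridden
```

Because the override check runs at listing time, a change to either configuration variable would be reflected without re-parsing the pages, which is the main attraction over storing plain category links.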
[[Category:Hidden categories]] is populated using the categorylinks table. My patch resolves bug 14900 anyway (if the page is in $wgArticleRobotPolicies then NOINDEX/INDEX have no effect), but perhaps using page_props would be better.
Done in r56688.