Last modified: 2014-07-08 16:35:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla or access your list of votes, but bug reports in Bugzilla will obviously not be up-to-date.
Bug 8473 - $wgArticleRobotPolicies vs. SpecialPages hardwiring
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Hardware/OS: All / All
Importance: Lowest enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 17004
Reported: 2007-01-03 20:28 UTC by Dan Jacobson
Modified: 2014-07-08 16:35 UTC
CC: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Dan Jacobson 2007-01-03 20:28:21 UTC
Special:Allpages would be a great page to let search engines crawl,
for smaller sites.

Allow me to make the case that one should be able to make
Special:Allpages spiderable. Currently it is _hardwired_ to
noindex,nofollow, just like the other Special pages;
$wgNamespaceRobotPolicies won't help, as the policy is hardwired in
SpecialSpecialpages.php, and even if $wgNamespaceRobotPolicies could be
used, one would want to limit the granularity to just Special:Allpages
and keep the rest of Special: set to noindex,nofollow.

On the Main page the first link I make is to Special:Allpages,
expecting users and search engines alike to use it.

Sure, other wikis might have a vibrant tree of information. However,
mine is more of a flat list, with many categories that don't need pages
just to say they represent, e.g., 486.3785 MHz. I like my structure, and
users can see all the content, but search engines can't! Anyways,
Special:Allpages would have been the perfect way to get it indexed, were
it not for some assumption that all Special pages should be
noindex,nofollow. No, I do not wish to maintain my own private version
of SpecialAllpages.php; I'm just giving an observation.
Comment 1 Brion Vibber 2007-01-04 07:42:01 UTC
Dynamic special pages are in general pretty crappy for spidering and will remain
generally disabled.

Consider using sitemap generation.
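For readers following this suggestion: MediaWiki ships a sitemap generator as a maintenance script. A hedged sketch of invoking it, where the paths, URL, and server name are placeholders for your own install, not values from this report:

```shell
# Run from the wiki's installation directory.
# --fspath: filesystem directory to write the sitemap files into
# --urlpath: URL path corresponding to --fspath
# --server: the wiki's server name, used to build absolute URLs
php maintenance/generateSitemap.php \
    --fspath=sitemap \
    --urlpath=/sitemap \
    --server=https://example.org
```

The generated sitemap index can then be announced to crawlers, e.g. via a `Sitemap:` line in robots.txt, which sidesteps the question of making Special:Allpages spiderable.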
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-04 07:47:17 UTC
Why?  noindex,follow for Allpages strikes me as sensible, even if not as useful
as a site map.
Comment 3 Dan Jacobson 2007-08-22 23:33:23 UTC
is my inclusion hack workaround.
Comment 4 Dan Jacobson 2007-12-26 01:42:20 UTC
One could now set the new $wgArticleRobotPolicies, but apparently
Special pages are too hardwired for the weak $wgArticleRobotPolicies to
overpower them!
See also Bug 9145.
Comment 5 Dan Jacobson 2007-12-27 02:01:20 UTC
(I am removing the above mentioned Template_talk:Robots_temp. It contained
Comment 6 Dan Jacobson 2009-02-19 19:09:49 UTC
mentions methods perhaps useful to people seeking workarounds for this bug.
Comment 7 jeckyhl 2012-09-05 23:01:05 UTC
Quick and dirty (?) solution:

In SpecialPage.php, method setHeaders(), replace

  $out->setRobotPolicy( "noindex,nofollow" );

with

  global $wgNamespaceRobotPolicies;
  $ns = $this->getTitle()->getNamespace();
  if ( isset( $wgNamespaceRobotPolicies[$ns] ) ) {
      $policy = $wgNamespaceRobotPolicies[$ns];
  } else {
      $policy = 'noindex,nofollow';
  }
  $out->setRobotPolicy( $policy );

This keeps 'noindex,nofollow' as the default, but it can be overridden in LocalSettings.php, e.g.:

$wgNamespaceRobotPolicies[NS_SPECIAL] = 'noindex,follow';
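The original report asks for finer granularity than a whole-namespace policy: index Special:Allpages alone. A hedged sketch of one way that could look in LocalSettings.php, without patching core, using the BeforePageDisplay hook (which runs after SpecialPage::setHeaders(), so its policy takes effect); this is untested against any particular MediaWiki version:

```php
<?php
// LocalSettings.php fragment (sketch, not from this report):
// override the hardwired robot policy for one special page only.
$wgHooks['BeforePageDisplay'][] = function ( $out, $skin ) {
    // Title::isSpecial() matches the canonical special-page name,
    // including localized aliases.
    if ( $out->getTitle()->isSpecial( 'Allpages' ) ) {
        $out->setRobotPolicy( 'noindex,follow' );
    }
    return true;
};
```

All other Special: pages keep the default noindex,nofollow, which is the granularity the description asks for.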
Comment 8 Andre Klapper 2014-07-08 16:35:17 UTC
Likely a WONTFIX as per comment 1. Lowering priority to reflect reality...
