Last modified: 2014-07-08 16:35:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T10473, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 8473 - $wgArticleRobotPolicies vs. SpecialPages hardwiring
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: 1.11.x
Hardware: All All
Importance: Lowest enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 17004
Reported: 2007-01-03 20:28 UTC by Dan Jacobson
Modified: 2014-07-08 16:35 UTC (History)

Description Dan Jacobson 2007-01-03 20:28:21 UTC
Special:Allpages would be a great page to let search engines crawl,
for smaller sites.

Allow me to make the case that one should be able to make
Special:Allpages spiderable. Currently it is _hardwired_ to
noindex,nofollow, just like the other Special pages.
$wgNamespaceRobotPolicies won't help, as the policy is hardwired in
SpecialSpecialpages.php. Even if $wgNamespaceRobotPolicies could be
used, one would want finer granularity: open up just Special:Allpages
and keep the rest of Special: set to noindex,nofollow.

Consider http://radioscanningtw.jidanni.org/
On the Main page the first link I make is to
http://radioscanningtw.jidanni.org/index.php?title=Special:Allpages
expecting users and search engines alike to use it.

Sure, other wikis might have a vibrant tree of information. However,
http://radioscanningtw.jidanni.org/ is more of a flat list, with many
categories that don't need pages just to say they represent, e.g.,
486.3785 MHz. I like my structure, and users can see all the content,
but search engines can't! Anyway,
http://radioscanningtw.jidanni.org/index.php?title=Special:Allpages
would have been the perfect way to get it indexed, were it not for
the assumption that all Special pages should be noindex,nofollow. No,
I do not wish to maintain my own private version of
SpecialAllpages.php; I'm just making an observation.
Comment 1 Brion Vibber 2007-01-04 07:42:01 UTC
Dynamic special pages are in general pretty crappy for spidering and will remain
generally disabled.

Consider using sitemap generation.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-01-04 07:47:17 UTC
Why?  noindex,follow for Allpages strikes me as sensible, even if not as useful
as a site map.
Comment 3 Dan Jacobson 2007-08-22 23:33:23 UTC
http://radioscanningtw.jidanni.org/index.php?title=Template_talk:Robots_temp is my inclusion hack workaround.
Comment 4 Dan Jacobson 2007-12-26 01:42:20 UTC
One could now set the new
$wgArticleRobotPolicies=array('Special:Allpages'=>'noindex,follow');
but apparently Special pages are too hardwired for the weak $wgArticleRobotPolicies to overpower them!
See also Bug 9145.
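
For reference, a minimal LocalSettings.php sketch of how the two settings named in this report are keyed: $wgNamespaceRobotPolicies by namespace constant, $wgArticleRobotPolicies by page title. The NS_TALK and 'Main Page' entries are illustrative assumptions, not taken from this report. As described above, the Special:Allpages entry has no effect as long as special pages set their policy unconditionally.

```php
// LocalSettings.php fragment (sketch; NS_TALK and 'Main Page' are illustrative only)

// Per-namespace policy, keyed by namespace constant.
$wgNamespaceRobotPolicies = array( NS_TALK => 'noindex' );

// Per-page policy, keyed by page title.
$wgArticleRobotPolicies = array(
    'Main Page' => 'index,follow',
    // The entry this report is about; ignored while special pages are hardwired.
    'Special:Allpages' => 'noindex,follow',
);
```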
Comment 5 Dan Jacobson 2007-12-27 02:01:20 UTC
(I am removing the above mentioned Template_talk:Robots_temp. It contained
  ==[[Special:Allpages/]]==
  {{Special:Allpages/}}
  ==[[Special:Allpages/Project:]]==
  {{Special:Allpages/Project:}}
)
Comment 6 Dan Jacobson 2009-02-19 19:09:49 UTC
http://perishablepress.com/press/2008/06/03/taking-advantage-of-the-x-robots-tag/ mentions methods that may be useful to people seeking workarounds for this bug.
Comment 7 jeckyhl 2012-09-05 23:01:05 UTC
Quick and dirty (?) solution:

In SpecialPage.php, in the setHeaders() method, replace

  $out->setRobotPolicy( "noindex,nofollow" );

with

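  global $wgNamespaceRobotPolicies;
  // Special pages live in the NS_SPECIAL namespace.
  $ns = $this->getTitle()->getNamespace();
  if ( isset( $wgNamespaceRobotPolicies[$ns] ) ) {
     // Use the policy configured for this namespace, if any.
     $policy = $wgNamespaceRobotPolicies[$ns];
  } else {
     // Otherwise keep the previous hardwired default.
     $policy = 'noindex,nofollow';
  }
  $out->setRobotPolicy( $policy );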

This keeps the 'noindex,nofollow' setting as the default, but it can be overridden in LocalSettings.php, e.g.

  $wgNamespaceRobotPolicies[NS_SPECIAL] = 'noindex,follow';
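
The namespace-level override above applies to every special page at once, whereas the original request was to open up Special:Allpages alone. The following is a self-contained sketch of how the same fallback idea could be extended with a per-page lookup first. The function resolveRobotPolicy() and the constant SKETCH_NS_SPECIAL are hypothetical names made up for this illustration; they are not part of MediaWiki, and matching on the plain title string is a simplifying assumption.

```php
<?php
// Hypothetical stand-in for MediaWiki's NS_SPECIAL constant (-1),
// defined here only so the sketch runs on its own.
const SKETCH_NS_SPECIAL = -1;

/**
 * Hypothetical helper: pick a robot policy for a page.
 * A per-page entry wins, then a per-namespace entry, then the default.
 */
function resolveRobotPolicy( $title, $ns, array $articlePolicies, array $namespacePolicies, $default = 'noindex,nofollow' ) {
    if ( isset( $articlePolicies[$title] ) ) {
        return $articlePolicies[$title];
    }
    if ( isset( $namespacePolicies[$ns] ) ) {
        return $namespacePolicies[$ns];
    }
    return $default;
}

// Only Special:Allpages is opened up; no namespace-wide override is set.
$articlePolicies = array( 'Special:Allpages' => 'noindex,follow' );
$namespacePolicies = array();

// Prints "noindex,follow"
echo resolveRobotPolicy( 'Special:Allpages', SKETCH_NS_SPECIAL, $articlePolicies, $namespacePolicies ), "\n";
// Prints "noindex,nofollow"
echo resolveRobotPolicy( 'Special:Recentchanges', SKETCH_NS_SPECIAL, $articlePolicies, $namespacePolicies ), "\n";
```

With this ordering, the rest of the Special: namespace keeps the restrictive default unless a namespace-wide policy is also configured.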
Comment 8 Andre Klapper 2014-07-08 16:35:17 UTC
Likely a WONTFIX as per comment 1. Lowering priority to reflect reality...
