Last modified: 2010-05-15 15:41:15 UTC
Go ahead: in LocalSettings.php, put
$wgNamespaceRobotPolicies = array( NS_WHATEVER => 'noindex,nofollow' );
And then edit some [[Test page]] where you put a link to
[[Whatever:Something]]. And make sure Whatever:Something exists.
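(For a self-contained test, the custom namespace has to be declared
before the policy line. A minimal LocalSettings.php sketch — the
namespace index 100 and the name "Whatever" are arbitrary choices of
mine, not anything the bug depends on:)

```php
// Declare a custom "Whatever" namespace plus its talk namespace.
// 100 is the conventional first index for site-specific namespaces.
define( 'NS_WHATEVER', 100 );
define( 'NS_WHATEVER_TALK', 101 );
$wgExtraNamespaces[NS_WHATEVER]      = 'Whatever';
$wgExtraNamespaces[NS_WHATEVER_TALK] = 'Whatever_talk';

// Ask robots to neither index pages in it nor follow links from them.
$wgNamespaceRobotPolicies = array( NS_WHATEVER => 'noindex,nofollow' );
```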
Now look at the rendering of [[Test page]].
Does the link to Whatever:Something have rel="nofollow" added? No.
Should it? Yes.
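In other words, the parser currently emits a plain internal link,
while honoring the namespace policy would mean marking it the way
external links already are. Roughly (href paths and attributes will
vary with the wiki's URL setup):

```html
<!-- what MediaWiki emits today -->
<a href="/wiki/Whatever:Something" title="Whatever:Something">Whatever:Something</a>

<!-- what one would expect, given the noindex,nofollow namespace policy -->
<a href="/wiki/Whatever:Something" title="Whatever:Something"
   rel="nofollow">Whatever:Something</a>
```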
Does Whatever:Something have
<meta name="robots" content="noindex,nofollow" />
in its header? Yes.
Is that good enough? Perhaps, but why cause the search engine an extra
GET in the first place, only to laugh in its face: "Ha ha ha, fooled
ya, no food here!"?
Please don't combine the several related bugs I sent today into one;
in the current state of the software they will certainly be fixed
faster piecemeal.
What about http://meta.wikimedia.org/wiki/Robots.txt saying:
The only way to keep a URL out of Google's index is to let Google
crawl the page and see a meta tag specifying robots="noindex".
Although this meta tag is already present on the edit page HTML
template, Google does not spider the edit pages (because they are
forbidden by robots.txt) and therefore does not see the meta tag.
So, hmmm, not sure.
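For context, the situation that page describes: a stock
Wikimedia-style robots.txt blocks the script path outright, so the
crawler never even fetches the edit URLs that carry the meta tag.
Something like this (exact paths differ per install; these lines are
illustrative, not copied from any real site):

```
# robots.txt excerpt (illustrative)
User-agent: *
Disallow: /w/        # index.php, action=edit URLs, etc.
# /wiki/ article views remain crawlable
```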
P.S. I was going to use
$wgNamespaceRobotPolicies = array( NS_SPECIAL => 'noindex,nofollow' );
but as my site depends on search engines indexing Special:Allpages to
get its pages indexed, I will back off to just
$wgNamespaceRobotPolicies = array( NS_SPECIAL => 'noindex' );
But wait, SpecialPage.php already has
$wgOut->setRobotPolicy( "noindex,nofollow" );
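So any namespace-level setting for NS_SPECIAL presumably loses to that
hardcoded call. If per-title overrides are consulted for special pages
at all — I have not verified that $wgArticleRobotPolicies, which does
exist for ordinary pages, applies here — a narrower escape hatch might
look like:

```php
// Hypothetical: relax the policy only for the one special page I need
// crawled, leaving every other special page at noindex,nofollow.
// NOTE: whether this is honored for NS_SPECIAL pages is unverified.
$wgArticleRobotPolicies = array(
    'Special:Allpages' => 'noindex,follow',
);
```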
Indeed, I was expecting Special:Allpages to be the main way for all
search engines to index my site. Yes, I also have lots of categories,
but they are empty pages. I suppose I must write zero bytes into those
category pages so they stop being "edit" links and can be crawled.
OK, that's enough thinking for one day.