Last modified: 2009-05-20 13:14:35 UTC
I can't believe Google is showing Special:Random as the second highest
match for a phrase at my wiki.
Perhaps you will argue, "Jacobson, check your marbles, we send HTTP/1.1
302 Found and Location: ..."
Well, there was my page in their index, but with
...title=Special:Random as its URL. That's all I know.
The problem is in the navigation on each and every page, at
<li id="n-randompage"><a href="...Special:Random">
Never mind mentioning $wgNamespaceRobotPolicies = array(NS_SPECIAL...
as it doesn't affect links TO those pages (independent bug I also
dutifully submitted today).
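For reference, the complete setting I mean is something like the
following sketch for LocalSettings.php (the 'noindex,nofollow' policy
string is my assumption of the intended value):

```php
// Sketch only. This setting controls the robots meta tag emitted ON
// pages in the given namespace; per this bug, it does nothing about
// links pointing TO those pages from elsewhere.
$wgNamespaceRobotPolicies = array( NS_SPECIAL => 'noindex,nofollow' );
```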
So please fix this right away with a kludge in whatever produces that link.
Don't wait for the better overall solutions perhaps mentioned in my other
bugs today (the bug numbers of which I don't know as I write this
offline, to be sent by my batch posting script).
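To be concrete, the kludge I have in mind is just adding rel="nofollow"
to that sidebar item, e.g. (a sketch; the exact href and link text vary
by skin and wiki configuration):

```html
<!-- same markup as now, plus rel="nofollow" -->
<li id="n-randompage">
  <a href="/index.php?title=Special:Random" rel="nofollow">Random page</a>
</li>
```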
Yes you might say "ho ho ho, Special:Random is only a tiny fraction of
the pages indexed at real wikis, vs. your puny wiki, Jacobson."
Yeah well don't call the doctor when your classmate clicks the link
for "Depression medicine" and gets the "Aggression medicine (Herbal)"
page, eats it, and cuts you to bits.
Ho hum, another day, another life saved with my erudite bug reports.
P.S., http://meta.wikimedia.org/wiki/Robots.txt says
The only way to keep a URL out of Google's index is to let Google
crawl the page and see a meta tag specifying robots="noindex".
Although this meta tag is already present on the edit page HTML
template, Google does not spider the edit pages (because they are
forbidden by robots.txt) and therefore does not see the meta tag.
But that paragraph need not hinder fixing this bug.
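For the record, the meta tag that paragraph is talking about looks like
this in the edit-page HTML (a sketch; MediaWiki's exact content value
may differ):

```html
<meta name="robots" content="noindex,nofollow">
```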
Confirmed for mozilla.org wiki:
Why don't you just prohibit Special:Random in your robots.txt?
>Why don't you just prohibit Special:Random in your robots.txt?
Good temporary workaround, but:
* One must learn about, create, and not screw up a robots.txt, which
will probably get lost before long anyway, as it is not in the tar file.
* Even a Mr. WikiSysop doesn't necessarily control robots.txt (e.g., *.wikia.com)
* Each WikiSysop must manually fix a problem that upstream could fix with a
mere nofollow in the link.
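For anyone who does go the robots.txt route anyway, the workaround
amounts to a single Disallow line (a sketch, assuming the default
/index.php?title=... URL layout):

```
User-agent: *
Disallow: /index.php?title=Special:Random
```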
rel="nofollow" does not mean, "don't follow this link"; rather, it means, "don't assign this page weight in ranking or related algorithms based on significance".
Anyway now I am using http://taizhongbus.jidanni.org/robots.txt where I do things like
#the above is a Google, etc. extension
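(The actual lines from that file aren't quoted here; purely as an
illustration of the kind of "Google, etc. extension" I mean, a wildcard
pattern like the following is honored by Google and some other crawlers
but is not part of the original robots.txt standard:)

```
User-agent: *
Disallow: /*?title=Special:Random
```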
And then, for search engines, I externally link to a page
which just transcludes Special:Allpages,
as a hack to get around bug 8473.