Last modified: 2009-05-20 13:14:35 UTC
I can't believe Google is showing Special:Random as the second highest
match for a phrase at my wiki.
Perhaps you will argue, "Jacobson, check your marbles, we send HTTP/1.1
302 Found and Location: ..."
Well, there was my page in their index, but with
...title=Special:Random as its URL. That's all I know.
The problem is in the navigation on each and every page, at
<li id="n-randompage"><a href="...Special:Random">
Never mind mentioning $wgNamespaceRobotPolicies = array(NS_SPECIAL...
as it doesn't affect links TO those pages (independent bug I also
dutifully submitted today).
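For reference, the complete setting I mean is something like the
following sketch for LocalSettings.php (the 'noindex,nofollow' policy
string is my assumption of the intended value):

```php
// Sketch only. This setting controls the robots meta tag emitted ON
// pages in the given namespace; per this bug, it does nothing about
// links pointing TO those pages from elsewhere.
$wgNamespaceRobotPolicies = array( NS_SPECIAL => 'noindex,nofollow' );
```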
So please fix this right away with a kludge in whatever produces that link.
Don't wait for the better overall solutions perhaps mentioned in my other
bugs today (the bug numbers of which I don't know as I write this
offline, to be sent by my batch posting script).
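To be concrete, the kludge I have in mind is just adding rel="nofollow"
to that sidebar item, e.g. (a sketch; the exact href and link text vary
by skin and wiki configuration):

```html
<!-- same markup as now, plus rel="nofollow" -->
<li id="n-randompage">
  <a href="/index.php?title=Special:Random" rel="nofollow">Random page</a>
</li>
```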
Yes you might say "ho ho ho, Special:Random is only a tiny fraction of
the pages indexed at real wikis, vs. your puny wiki, Jacobson."
Yeah well don't call the doctor when your classmate clicks the link
for "Depression medicine" and gets the "Aggression medicine (Herbal)"
page, eats it, and cuts you to bits.
Ho hum, another day, another life saved with my erudite bug reports.
P.S., http://meta.wikimedia.org/wiki/Robots.txt says
The only way to keep a URL out of Google's index is to let Google
crawl the page and see a meta tag specifying robots="noindex".
Although this meta tag is already present on the edit page HTML
template, Google does not spider the edit pages (because they are
forbidden by robots.txt) and therefore does not see the meta tag.
But that paragraph need not hinder fixing this bug.
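For the record, the meta tag that paragraph is talking about looks like
this in the edit-page HTML (a sketch; MediaWiki's exact content value
may differ):

```html
<meta name="robots" content="noindex,nofollow">
```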
Confirmed for mozilla.org wiki:
Why don't you just prohibit Special:Random in your robots.txt?
>Why don't you just prohibit Special:Random in your robots.txt?
Good temporary workaround, but:
* One must learn about, create, and not screw up a robots.txt, which
will probably get lost before long anyway, as it is not in the tar file.
* Even a Mr. WikiSysop doesn't necessarily control robots.txt (e.g., *.wikia.com)
* Each WikiSysop must manually fix a problem that upstream could fix with a
mere nofollow in the link.
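For anyone who does go the robots.txt route anyway, the workaround
amounts to a single Disallow line (a sketch, assuming the default
/index.php?title=... URL layout):

```
User-agent: *
Disallow: /index.php?title=Special:Random
```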
rel="nofollow" does not mean, "don't follow this link"; rather, it means, "don't assign this page weight in ranking or related algorithms based on significance".
Anyway now I am using http://taizhongbus.jidanni.org/robots.txt where I do things like
#the above is a Google, etc. extension
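(The actual lines from that file aren't quoted here; purely as an
illustration of the kind of "Google, etc. extension" I mean, a wildcard
pattern like the following is honored by Google and some other crawlers
but is not part of the original robots.txt standard:)

```
User-agent: *
Disallow: /*?title=Special:Random
```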
And then, for search engines, I externally link to a page
which just transcludes Special:Allpages,
as a hack to get around bug 8473.