Last modified: 2014-10-16 12:10:19 UTC
Currently we have a blanket Disallow in bugzilla's robots.txt. This is a bit rude, as it makes it harder to track down MediaWiki bugs. It might be nice to allow at least plain bug views and let them get indexed.
Keywords: shell
Bulk-assigning open BZ issues to Fred.
Added the following line to robots.txt: Allow: /show_bug.cgi. There is a slight chance that this might cause additional load, so I will be monitoring the webserver to make sure that there is no noticeable performance hit. I believe the change above should do the trick, but time will tell.
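For reference, the relevant stanza should end up looking roughly like this (a sketch only, assuming the original blanket Disallow stays in place; the exact surrounding lines may differ):

User-agent: *
Allow: /show_bug.cgi
Disallow: /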
This isn't fixed. When I search for MediaWiki error messages in Google or Bing, I *never* get results from bugzilla.wikimedia.org. For example, search for "notice Uncommitted DB writes transaction from DatabaseBase::query MessageBlobStore::clear". All I get is useless osdir.com and gmane rehashes of mail threads involving bug 56269; the bug should be the first result.

I'm pretty sure it's because in https://bugzilla.wikimedia.org/robots.txt, the line Disallow: /*.cgi blocks any .cgi URL, including https://bugzilla.wikimedia.org/show_bug.cgi?id=56269 . Even though Allow: /*show_bug.cgi comes later, it never takes effect, because "The first match found is used." The fix is to move the Allow line ahead of the Disallow line; compare with Mozilla's https://bugzilla.mozilla.org/robots.txt.
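To make the ordering issue concrete, here is a minimal Python sketch of the "first match found is used" rule (a toy illustration with a made-up helper, not the actual crawler logic; it emulates prefix matching plus the * wildcard with fnmatch):

import fnmatch

def first_match_allows(rules, path):
    # rules: ordered list of ('Allow'|'Disallow', pattern); the first matching rule wins.
    for directive, pattern in rules:
        # '*' appended because robots.txt patterns are prefix matches
        if fnmatch.fnmatch(path, pattern + '*'):
            return directive == 'Allow'
    return True  # no rule matched: crawling is allowed by default

current  = [('Disallow', '/*.cgi'), ('Allow', '/*show_bug.cgi')]
proposed = [('Allow', '/*show_bug.cgi'), ('Disallow', '/*.cgi')]

print(first_match_allows(current,  '/show_bug.cgi?id=56269'))  # False: bug pages blocked
print(first_match_allows(proposed, '/show_bug.cgi?id=56269'))  # True: bug pages indexable

In other words, under first-match semantics simply swapping the order of the two lines should be enough.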
Is this file [1] the one needing to be changed? I.e., is it the file used to generate https://bugzilla.wikimedia.org/robots.txt, or is it just a random copy sitting in Git? [1] https://git.wikimedia.org/blob/wikimedia%2Fbugzilla%2Fmodifications.git/master/extensions%2FSitemap%2Frobots.txt
No, that's not the file. /extensions/Sitemap will be killed soon (bug 33406). No idea if robots.txt is in Git; if it is, it's somewhere in operations/puppet/modules/bugzilla.
It doesn't seem to be in that repo. Per "you still have to copy upstream bugzilla itself to the bugzilla path and clone our modifications from the wikimedia/bugzilla/modifications repo" [1], I guess it is not in Git, similar to the rest of the BZ server-side files. This needs shell/ops then. [1] http://git.wikimedia.org/blob/operations%2Fpuppet.git/18f755cfecf9abdd23a0678e82f278188e059379/modules%2Fbugzilla%2FREADME.md
(In reply to spage from comment #4)
> This isn't fixed.

It worked for a while; this is a recent regression. No idea when the robots.txt block was reintroduced.
We could add this file to Puppet, but it's low priority.