Last modified: 2014-10-16 12:10:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T15881, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13881 - Open Bugzilla to search spiders
Open Bugzilla to search spiders
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Bugzilla (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
https://bugzilla.wikimedia.org/robots...
Keywords: ops
Depends on:
Blocks:
Reported: 2008-04-29 21:46 UTC by Brion Vibber
Modified: 2014-10-16 12:10 UTC (History)
8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Brion Vibber 2008-04-29 21:46:21 UTC
Currently we have a blanket Disallow in bugzilla's robots.txt. This is a bit rude, as it makes it harder to track down MediaWiki bugs.

It might be nice to allow at least plain bug views and let them get indexed.
Comment 1 Siebrand Mazeland 2008-08-18 13:53:37 UTC
Keywords: shell
Comment 2 Brion Vibber 2009-08-12 23:44:53 UTC
Bulk-assigning open BZ issues to Fred.
Comment 3 Fred Vassard 2009-09-10 18:50:22 UTC
Added the following:
Allow: /show_bug.cgi 
to robots.txt. 

There is a slight chance that this might cause additional load, so I will be monitoring the webserver to make sure that there is no noticeable performance hit.

I believe the change above should do the trick, but time will tell.
Comment 4 spage 2014-02-03 06:13:26 UTC
This isn't fixed.  When I search for MediaWiki error messages in Google or Bing, I *never* get results from bugzilla.wikimedia.org.  For example, search for
  notice Uncommitted DB writes transaction from DatabaseBase::query MessageBlobStore::clear
All I get is useless osdir.com and gmane rehashes of mail threads involving bug 56269. The bug should be the first result.

I'm pretty sure it's because in https://bugzilla.wikimedia.org/robots.txt, the line 
  Disallow: /*.cgi
blocks any .cgi URL, including https://bugzilla.wikimedia.org/show_bug.cgi?id=56269 .  Even though
  Allow: /*show_bug.cgi
comes later, "The first match found is used." The fix is to move the Allow line first; compare with Mozilla's https://bugzilla.mozilla.org/robots.txt.
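The first-match rule quoted in this comment can be sketched with a small matcher. This is a simplified illustration of why rule ordering matters, not actual crawler logic (real crawlers vary; Google, for example, documents longest-match precedence rather than first-match), and the `first_match_allowed` helper is hypothetical:

```python
import re

def first_match_allowed(rules, path):
    """rules: list of ("Allow"|"Disallow", pattern); the first matching rule wins."""
    for verb, pattern in rules:
        # Translate the robots.txt '*' wildcard into a regex; match from the start.
        regex = re.escape(pattern).replace(r"\*", ".*")
        if re.match(regex, path):
            return verb == "Allow"
    return True  # no rule matched: allowed by default

# Ordering as in the broken robots.txt: Disallow comes first, so it wins.
broken = [("Disallow", "/*.cgi"), ("Allow", "/*show_bug.cgi")]
# The proposed fix: move the Allow line first.
fixed = [("Allow", "/*show_bug.cgi"), ("Disallow", "/*.cgi")]

print(first_match_allowed(broken, "/show_bug.cgi?id=56269"))  # False: blocked
print(first_match_allowed(fixed, "/show_bug.cgi?id=56269"))   # True: indexable
```

Under first-match semantics, both orderings contain the same two rules, yet only the second one lets `show_bug.cgi` URLs through, which is exactly the fix proposed above.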
Comment 5 This, that and the other (TTO) 2014-02-07 09:20:38 UTC
Is this file [1] the one needing to be changed? i.e. is it the file used to generate https://bugzilla.mozilla.org/robots.txt, or is it just a random copy sitting in Git?

[1] https://git.wikimedia.org/blob/wikimedia%2Fbugzilla%2Fmodifications.git/master/extensions%2FSitemap%2Frobots.txt
Comment 6 Andre Klapper 2014-02-07 10:41:08 UTC
No, that's not the file. /extensions/Sitemap will be killed soon (bug 33406).
No idea if robots.txt is in Git. If it is, it's somewhere in operations/puppet/modules/bugzilla
Comment 7 This, that and the other (TTO) 2014-02-07 11:31:24 UTC
It doesn't seem to be in that repo. Per "you still have to copy upstream bugzilla itself to the bugzilla path and clone our modifications from the wikimedia/bugzilla/modifcations repo" [1] I guess it is not in Git, similar to the rest of the BZ server-side files.

This needs shell/ops then.

[1] http://git.wikimedia.org/blob/operations%2Fpuppet.git/18f755cfecf9abdd23a0678e82f278188e059379/modules%2Fbugzilla%2FREADME.md
Comment 8 Nemo 2014-04-03 13:32:31 UTC
(In reply to spage from comment #4)
> This isn't fixed.

It worked for a while; this is a recent regression. No idea when the robots.txt block was reintroduced.
Comment 9 Andre Klapper 2014-05-22 17:35:03 UTC
We could add this file to puppet, but it's low priority.


