Last modified: 2014-11-04 22:48:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla, you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla and access your list of votes, but bug reports in Bugzilla will not be kept up to date.
Bug 13307 - Robots.txt exempt for browsershots
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All
OS: All
Importance: Lowest enhancement (1 vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/robots.txt
Keywords: shell
Duplicates: 27986
Depends on:
Blocks:
 
Reported: 2008-03-10 04:39 UTC by slakr
Modified: 2014-11-04 22:48 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description slakr 2008-03-10 04:39:49 UTC
Hiya,

Whenever someone gets a chance, pretty please add an exemption for Browsershots to robots.txt on enwiki (based on http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_18#robots.txt_.2F_browsershots as well as http://en.wikipedia.org/wiki/User_talk:Slakr#Browsershots).

E.g.:

User-agent: Browsershots
Disallow:

... or something similar. That would help tremendously in examining compatibility across browsers and platforms.


Thanks a million, and cheers =)
--slakr@enwiki
Comment 1 Brion Vibber 2008-03-10 19:03:12 UTC
There shouldn't be anything blocking Browsershots from accessing pages; certainly nothing apparent in our robots.txt.

From my previous testing, it appears that Browsershots itself is very badly configured:

1) It first loads /robots.txt with a generic "Python-urllib" user-agent. This is blocked at the HTTP proxy level on Wikimedia sites due to past abuse, so they would be unable to load our robots.txt file.

2) It then loads the requested page with an app-specific "Browsershots" user-agent. This is not blocked, and would be allowed with no problems.

3) It then passes the page off to a bunch of browsers in turn.

I can only assume that the failure to read robots.txt is interpreted as a bad sign. :)

Please contact Browsershots and tell them that their bot is broken; they should load robots.txt with the same user-agent string they use to load the page itself.
Comment 2 JeLuF 2008-04-16 05:49:16 UTC
Closed.
Comment 3 MZMcBride 2011-03-11 00:57:52 UTC
*** Bug 27986 has been marked as a duplicate of this bug. ***


