Last modified: 2014-11-04 22:48:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. Logging in is not possible, and apart from displaying bug reports and their history, links might be broken. See T15307, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13307 - Robots.txt exempt for browsershots
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All
OS: All
Importance: Lowest enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/robots.txt
Whiteboard: shell
Duplicates: 27986
Depends on:
Blocks:
Reported: 2008-03-10 04:39 UTC by slakr
Modified: 2014-11-04 22:48 UTC
CC: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description slakr 2008-03-10 04:39:49 UTC
Hiya,

Whenever someone gets a chance, pretty please add an exemption for Browsershots to robots.txt on enwiki (based on http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_18#robots.txt_.2F_browsershots as well as http://en.wikipedia.org/wiki/User_talk:Slakr#Browsershots).

For example:

User-agent: Browsershots
Disallow:

... or something similar.  That would help tremendously in examining cross-compatibility between browsers and platforms.


Thanks a million, and cheers =)
--slakr@enwiki
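The effect of the requested exemption can be sketched with Python's standard urllib.robotparser. The robots.txt fragment below is hypothetical: it combines the requested Browsershots rule (an empty Disallow means "allow everything") with an illustrative catch-all restriction on /w/; enwiki's actual robots.txt is longer and different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment: the requested per-bot exemption
# plus an illustrative restriction that applies to all other bots.
robots_txt = """\
User-agent: Browsershots
Disallow:

User-agent: *
Disallow: /w/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Browsershots would be allowed everywhere; other bots stay blocked from /w/.
print(rp.can_fetch("Browsershots", "http://en.wikipedia.org/w/index.php"))  # True
print(rp.can_fetch("OtherBot", "http://en.wikipedia.org/w/index.php"))      # False
```

An empty Disallow line is the standard robots.txt idiom for "no restrictions", which is why the snippet in the report has nothing after "Disallow:".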
Comment 1 Brion Vibber 2008-03-10 19:03:12 UTC
There shouldn't be anything blocking Browsershots from accessing pages; certainly nothing apparent in our robots.txt.

From my previous testing, it appears that Browsershots itself is very badly configured:

1) It first loads /robots.txt with a generic "Python-urllib" user-agent. This is blocked at the HTTP proxy level on Wikimedia sites due to past abuse, so they would be unable to load our robots.txt file.

2) It then loads the requested page with an app-specific "Browsershots" user-agent. This is not blocked, and would be allowed with no problems.

3) It then passes the page off to a bunch of browsers in turn.

I can only assume that the failure to read robots.txt is interpreted as a bad sign. :)

Please contact Browsershots and tell them that their bot is broken; they should load robots.txt with the same user-agent string that they load the page itself with.
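The fix Brion describes, fetching robots.txt with the same user-agent string later used for page requests, can be sketched as follows. The make_request helper and the bare "Browsershots" UA string are illustrative assumptions, not Browsershots' actual code.

```python
import urllib.request

UA = "Browsershots"  # stand-in for the bot's app-specific user-agent string

def make_request(url, user_agent=UA):
    # A well-behaved crawler sends the SAME User-Agent for robots.txt as
    # for the pages it fetches, so the site applies one consistent policy
    # (and proxy-level blocks on generic UAs like "Python-urllib" don't
    # break robots.txt retrieval).
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

robots_req = make_request("http://en.wikipedia.org/robots.txt")
page_req = make_request("http://en.wikipedia.org/wiki/Main_Page")

# Both requests carry "Browsershots", not urllib's default "Python-urllib".
print(robots_req.get_header("User-agent"))  # Browsershots
print(page_req.get_header("User-agent"))    # Browsershots
```

Passing the requests to urllib.request.urlopen would then send the custom header on both fetches, which is exactly the behavior Comment 1 asks the Browsershots developers to adopt.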
Comment 2 JeLuF 2008-04-16 05:49:16 UTC
Closed.
Comment 3 MZMcBride 2011-03-11 00:57:52 UTC
*** Bug 27986 has been marked as a duplicate of this bug. ***
