Last modified: 2014-11-04 22:48:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. Logging in is not possible, and apart from displaying bug reports and their history, links might be broken. See T15307, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13307 - Robots.txt exempt for browsershots
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All
OS: All
Importance: Lowest enhancement with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/robots.txt
Whiteboard: shell
Duplicates: 27986
Depends on:
Blocks:
Reported: 2008-03-10 04:39 UTC by slakr
Modified: 2014-11-04 22:48 UTC
CC: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description slakr 2008-03-10 04:39:49 UTC
Hiya,

Whenever someone gets a chance, pretty please add an exemption for Browsershots to robots.txt on enwiki (based on http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_18#robots.txt_.2F_browsershots as well as http://en.wikipedia.org/wiki/User_talk:Slakr#Browsershots).

For example:

User-agent: Browsershots
Disallow:

... or something similar.  That would help tremendously in examining cross-compatibility between browsers and platforms.


Thanks a million, and cheers =)
--slakr@enwiki
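The effect of the requested exemption can be sketched with Python's standard urllib.robotparser. The robots.txt fragment below is hypothetical: it combines the requested Browsershots rule (an empty Disallow means "allow everything") with an illustrative catch-all restriction on /w/; enwiki's actual robots.txt is longer and different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment: the requested per-bot exemption
# plus an illustrative restriction that applies to all other bots.
robots_txt = """\
User-agent: Browsershots
Disallow:

User-agent: *
Disallow: /w/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Browsershots would be allowed everywhere; other bots stay blocked from /w/.
print(rp.can_fetch("Browsershots", "http://en.wikipedia.org/w/index.php"))  # True
print(rp.can_fetch("OtherBot", "http://en.wikipedia.org/w/index.php"))      # False
```

An empty Disallow line is the standard robots.txt idiom for "no restrictions", which is why the snippet in the report has nothing after "Disallow:".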
Comment 1 Brion Vibber 2008-03-10 19:03:12 UTC
There shouldn't be anything blocking Browsershots from accessing pages; certainly nothing apparent in our robots.txt.

From my previous testing, it appears that Browsershots itself is very badly configured:

1) It first loads /robots.txt with a generic "Python-urllib" user-agent. This is blocked at the HTTP proxy level on Wikimedia sites due to past abuse, so they would be unable to load our robots.txt file.

2) It then loads the requested page with an app-specific "Browsershots" user-agent. This is not blocked, and would be allowed with no problems.

3) It then passes the page off to a bunch of browsers in turn.

I can only assume that the failure to read robots.txt is interpreted as a bad sign. :)

Please contact Browsershots and tell them that their bot is broken; they should load robots.txt with the same user-agent string that they load the page itself with.
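The fix Brion describes, fetching robots.txt with the same user-agent string later used for page requests, can be sketched as follows. The make_request helper and the bare "Browsershots" UA string are illustrative assumptions, not Browsershots' actual code.

```python
import urllib.request

UA = "Browsershots"  # stand-in for the bot's app-specific user-agent string

def make_request(url, user_agent=UA):
    # A well-behaved crawler sends the SAME User-Agent for robots.txt as
    # for the pages it fetches, so the site applies one consistent policy
    # (and proxy-level blocks on generic UAs like "Python-urllib" don't
    # break robots.txt retrieval).
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

robots_req = make_request("http://en.wikipedia.org/robots.txt")
page_req = make_request("http://en.wikipedia.org/wiki/Main_Page")

# Both requests carry "Browsershots", not urllib's default "Python-urllib".
print(robots_req.get_header("User-agent"))  # Browsershots
print(page_req.get_header("User-agent"))    # Browsershots
```

Passing the requests to urllib.request.urlopen would then send the custom header on both fetches, which is exactly the behavior Comment 1 asks the Browsershots developers to adopt.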
Comment 2 JeLuF 2008-04-16 05:49:16 UTC
Closed.
Comment 3 MZMcBride 2011-03-11 00:57:52 UTC
*** Bug 27986 has been marked as a duplicate of this bug. ***
