Last modified: 2011-03-13 18:06:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T18007, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 16007 - RSS/Atom feeds prohibited by robots.txt


Summary:	RSS/Atom feeds prohibited by robots.txt

Status:	RESOLVED WONTFIX

Product:	Wikimedia
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://en.wikipedia.org/wiki/Main_Page
Whiteboard:
Keywords:	shell

Depends on:
Blocks:

Reported:	2008-10-16 19:03 UTC by Aaron Swartz
Modified:	2011-03-13 18:06 UTC (History)
CC List:	2 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Aaron Swartz 2008-10-16 19:03:12 UTC

http://en.wikipedia.org/wiki/Main_Page lists its RSS feeds as:

    <link rel="alternate" type="application/rss+xml" title="Wikipedia RSS Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=rss" />
    <link rel="alternate" type="application/atom+xml" title="Wikipedia Atom Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=atom" />

Both of these are in the /w/ directory which http://en.wikipedia.org/robots.txt prohibits to the default robot. This means that clients which obey robots.txt can't read Wikipedia's RSS feed.

http://en.wikipedia.org/wiki/Special:RecentChanges?feed=atom is presumably permitted, but that's not linked to.
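The effect described above can be reproduced with Python's standard `urllib.robotparser`. This is a minimal sketch, not a copy of any real client: the two-line `ROBOTS_TXT` excerpt is an assumption standing in for the much longer live file, and the `ExampleFeedReader` agent name and the `allowed` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: only the rule this report is about. The real
# robots.txt is longer and may have changed since.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The feed URL that Main_Page links to (under /w/).
linked_feed = (
    "http://en.wikipedia.org/w/index.php"
    "?title=Special:RecentChanges&feed=atom"
)
# The equivalent URL under /wiki/, which Main_Page does not link to.
unlinked_feed = "http://en.wikipedia.org/wiki/Special:RecentChanges?feed=atom"


def allowed(url, agent="ExampleFeedReader"):
    """Return True if the assumed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed(linked_feed))
print(allowed(unlinked_feed))
```

Under these assumed rules the `/w/` URL is refused and the `/wiki/` URL is accepted, because `/wiki/...` does not begin with the disallowed prefix `/w/`.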

Comment 1 Brion Vibber 2008-10-20 19:47:02 UTC

My understanding is that feed readers should be acting as user-agents, not robots, so this _ought_ not to be a problem unless you want, e.g., search engines to index the feed contents. (Which we probably don't.)

Can you turn up examples of feed readers that are using robots.txt prohibitions which are affected by this?

Comment 2 Aaron Swartz 2008-10-20 20:08:52 UTC

I ran into this because the Python feedfinder library follows robots.txt and thus won't find the Wikipedia RSS feed.

Yahoo does the same thing: http://jeremy.zawodny.com/blog/archives/001474.html
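To illustrate the kind of behaviour reported in this comment, here is a rough sketch of a feed finder that reads `<link rel="alternate">` tags and then drops any feed that robots.txt disallows. It is not feedfinder's actual code: `FeedLinkParser`, `find_feeds`, and the `ExampleFeedFinder` agent name are hypothetical, and the robots.txt rules are passed in as lines rather than fetched.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> feed tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        a = dict(attrs)
        if (
            tag == "link"
            and a.get("rel") == "alternate"
            and a.get("type") in FEED_TYPES
            and a.get("href")
        ):
            self.feeds.append(a["href"])


def find_feeds(html, robots_lines, agent="ExampleFeedFinder"):
    """Return advertised feed URLs that the given robots.txt lines allow."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    links = FeedLinkParser()
    links.feed(html)
    return [url for url in links.feeds if robots.can_fetch(agent, url)]


# The two tags quoted in the description.
MAIN_PAGE_HEAD = """
<link rel="alternate" type="application/rss+xml" title="Wikipedia RSS Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=rss" />
<link rel="alternate" type="application/atom+xml" title="Wikipedia Atom Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=atom" />
"""

# Assumed robots.txt excerpt containing only the /w/ rule.
blocked = find_feeds(MAIN_PAGE_HEAD, ["User-agent: *", "Disallow: /w/"])
# With no rules at all, both advertised feeds are returned.
unblocked = find_feeds(MAIN_PAGE_HEAD, [])
print(blocked, unblocked)
```

A finder built this way returns no feeds for the page even though two are advertised, which matches the symptom in the report.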

Comment 3 Brion Vibber 2009-05-28 19:07:12 UTC

I'm not convinced the feeds are really search-friendly... each page has a history feed, and the whole site has a number of feeds (RC and otherwise) which tend to change very quickly. In addition, the generation of diffs etc. for the feeds may result in nastiness on a spider crawl visit. I'm marking this WONTFIX for now.

