Last modified: 2011-03-13 18:06:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T18007, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 16007 - RSS/Atom feeds prohibited by robots.txt
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/Main_Page
Keywords: shell
Depends on:
Blocks:
Reported: 2008-10-16 19:03 UTC by Aaron Swartz
Modified: 2011-03-13 18:06 UTC (History)

Description Aaron Swartz 2008-10-16 19:03:12 UTC
http://en.wikipedia.org/wiki/Main_Page lists its RSS feeds as:

    <link rel="alternate" type="application/rss+xml" title="Wikipedia RSS Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=rss" />
    <link rel="alternate" type="application/atom+xml" title="Wikipedia Atom Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=atom" />

Both of these are under the /w/ directory, which http://en.wikipedia.org/robots.txt disallows for the default robot. This means that clients which obey robots.txt can't read Wikipedia's RSS or Atom feeds.

http://en.wikipedia.org/wiki/Special:RecentChanges?feed=atom is presumably permitted, but that's not linked to.
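
To make the difference between the two URL forms concrete, here is a minimal sketch using Python's standard urllib.robotparser. The ROBOTS_TXT string is an assumed excerpt containing only the rule discussed above, not the full live file, and build_parser is a helper name made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt: only the rule discussed in this report,
# not the complete robots.txt served by en.wikipedia.org.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The feed URL advertised on Main_Page (with &amp; decoded to &).
advertised = "http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss"
# The /wiki/ form, which is not linked from the page.
alternative = "http://en.wikipedia.org/wiki/Special:RecentChanges?feed=atom"

advertised_ok = parser.can_fetch("*", advertised)
alternative_ok = parser.can_fetch("*", alternative)
print("advertised feed allowed:", advertised_ok)
print("alternative feed allowed:", alternative_ok)
```

Under that assumed rule, the advertised /w/ URL is rejected for the default robot while the /wiki/ form is accepted, since prefix matching on /w/ does not cover /wiki/.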
Comment 1 Brion Vibber 2008-10-20 19:47:02 UTC
My understanding is that feed readers should be acting as user-agents, not robots, so this _ought_ not to be a problem unless you want, e.g., search engines to index the feed contents. (Which we probably don't.)

Can you turn up examples of feed readers that are using robots.txt prohibitions which are affected by this?
Comment 2 Aaron Swartz 2008-10-20 20:08:52 UTC
I ran into this because the Python feedfinder library follows robots.txt and thus won't find the Wikipedia RSS feed.

Yahoo does the same thing: http://jeremy.zawodny.com/blog/archives/001474.html
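
As a rough illustration of why a robots.txt-obeying feed finder comes up empty here, the following sketch does feed autodiscovery on the two link tags from the description and then filters the results through robots.txt. It is not feedfinder's actual code: FeedLinkParser, discover_feeds, and the ROBOTS_TXT excerpt are all hypothetical names and data assumed for this example, built only on the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# Assumed excerpt of robots.txt: only the rule relevant to this report.
ROBOTS_TXT = "User-agent: *\nDisallow: /w/\n"

# The two <link> elements quoted in the description.
PAGE_HTML = """
<link rel="alternate" type="application/rss+xml" title="Wikipedia RSS Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=rss" />
<link rel="alternate" type="application/atom+xml" title="Wikipedia Atom Feed" href="http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&amp;feed=atom" />
"""


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> tags with a feed MIME type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url, robots_txt, agent="*"):
    """Return (found, allowed): all advertised feeds, and those robots.txt permits."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    finder = FeedLinkParser(base_url)
    finder.feed(html)
    allowed = [url for url in finder.feeds if robots.can_fetch(agent, url)]
    return finder.feeds, allowed


found, allowed = discover_feeds(
    PAGE_HTML, "http://en.wikipedia.org/wiki/Main_Page", ROBOTS_TXT
)
print("advertised feeds:", len(found))
print("feeds a robots.txt-obeying client may fetch:", len(allowed))
```

With the assumed rule, both advertised feeds are discovered but neither survives the robots.txt filter, which matches the behaviour described above.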
Comment 3 Brion Vibber 2009-05-28 19:07:12 UTC
I'm not convinced the feeds are really search-friendly... each page has a history feed, and the whole site has a number of feeds (RC and otherwise) which tend to change very quickly. In addition, the generation of diffs etc. for the feeds may result in nastiness when a spider crawls them. I'm marking this WONTFIX for now.
