Last modified: 2010-01-15 18:01:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T9346, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 7346 - rss feed item should contain a guid element
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: History/Diffs
Version: 1.16.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://bugs.debian.org/cgi-bin/bugrep...
Keywords: patch, patch-need-review
Depends on:
Blocks: feeds
Reported: 2006-09-16 10:17 UTC by Romain Beauxis
Modified: 2010-01-15 18:01 UTC (History)
CC List: 5 users


Attachments
Patch to add <guid> element to RSS items. (643 bytes, patch)
2009-11-27 15:41 UTC, Tim Landscheidt
add a <guid> element to RSS feeds (730 bytes, patch)
2009-11-27 19:28 UTC, Daniel Hazelton
patch to add guid and permalink support to feeds (2.50 KB, patch)
2010-01-08 18:37 UTC, Jools Wills
patch to add guid and permalink support to feeds (2.62 KB, patch)
2010-01-12 16:11 UTC, Jools Wills
patch to add guid and permalink support to feeds (2.63 KB, patch)
2010-01-12 16:15 UTC, Jools Wills
6951: patch to add guid and permalink support to feeds (2.24 KB, patch)
2010-01-15 17:44 UTC, Jools Wills

Description Romain Beauxis 2006-09-16 10:17:14 UTC
Hi all!

I could not find this bug by searching the DB, so I am filing it here. I hope it is not just noise.
-------
Package: mediawiki1.7
Version: 1.7.1-1
Severity: normal

I noticed that when using the "recent changes" mediawiki RSS feed with
liferea, it keeps showing duplicate entries.

According to the liferea documentation[1], this appears to be a problem
with mediawiki[2].

It would be nice if this could be fixed.

References:

[1] file:///usr/share/liferea/doc/html/faq_en.html

"Q: Why do feed items keep being displayed as new?  A: This is usually
due to a bad feed which associated a particular ID to multiple items.
You should check your feed against a feed validator such as
feedvalidator.org. If the validator does not report any error, please
submit a bug report including the URL of the problem feed to the Liferea
bugtracker.

Note: If you experience this problem with a planet feed the reason might
be that the planet feed does not provide unique item ids for one or all
of its source feeds. If this is the case Liferea has no chance to match
identical items."

[2]
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fmeta.wikimedia.org%2Fw%2Findex.php%3Ftitle%3DSpecial%3ANewpages%26feed%3Drss

"line 67, column 203: item should contain a guid element (50 occurrences)"

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.8-3-k7
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
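
The validator message quoted under [2] can be checked locally as well. The following is a minimal sketch in Python, assuming a plain RSS 2.0 feed whose item elements are not namespaced; the function name and the sample feed are made up for illustration and are not part of MediaWiki or Liferea.

```python
import xml.etree.ElementTree as ET


def items_missing_guid(rss_xml: str) -> list:
    """Return the titles of all <item> elements that have no <guid> child."""
    root = ET.fromstring(rss_xml)
    missing = []
    for item in root.iter("item"):
        if item.find("guid") is None:
            missing.append(item.findtext("title", default=""))
    return missing


# Hypothetical two-item feed: only the second item carries a guid.
SAMPLE = """<rss version="2.0"><channel>
<title>Example</title><link>http://example.org/</link><description>d</description>
<item><title>A</title><link>http://example.org/a</link></item>
<item><title>B</title><link>http://example.org/b</link><guid>http://example.org/b</guid></item>
</channel></rss>"""

print(items_missing_guid(SAMPLE))
```

Running the same function over a downloaded copy of a feed lists every item the validator would flag.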
Comment 1 Brion Vibber 2006-09-18 08:08:29 UTC
The spec explicitly allows this. Please contact liferea authors and inform them that this is legit. :)
Comment 2 Daniel Hazelton 2009-05-30 20:41:48 UTC
Resolving this bug as "invalid" is not correct. Yes, the <guid> tag is not required, but then, only the <title>, <link> and <description> tags are listed as required by the specification. 

"This “Global Unique Identifier” allows you to republish or update specific
items without duplicating these items in an aggregator. If you change an
item without using the <guid> element, then the aggregator has no way of
determining that the new item is replacing an old item. In that case, the
aggregator will retain the old item and the new item, forcing the user to read
it twice. If the <guid> element exists (and is the same as a previous item’s
<guid>) then the aggregator can (at the users option) replace the old item
with the new one. If the user has not read the item yet, then all they will see
is the updated item. If they have read the old item already, then they can
optionally read the update or ignore it." -- http://www.feederreader.com/TechnicalGuides/RSS_Basic.html

That is referenced from the current spec, which is apparently housed here: http://cyber.law.harvard.edu/rss/rss.html

In other words... Without <guid> an "aggregator" has no way to determine whether an individual <item> is new or if it has been seen before. So while not required by the spec, not including it actually is a bug. No major feed that I have been able to find - and that claims to be RSS 2 - omits the <guid> tag.
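
To illustrate the point above, here is a rough sketch of how an aggregator might decide whether an item is new. It is written in Python and does not reflect how Liferea or any other reader is actually implemented; the fallback to link plus title when no guid is present is an assumption chosen for the example.

```python
def merge_items(seen: dict, fetched: list) -> list:
    """Store fetched items in `seen` and return those that look new to the reader."""
    new_items = []
    for item in fetched:
        # Prefer the guid. Without one, fall back to link plus title
        # (an assumed heuristic; real aggregators differ).
        key = item.get("guid") or (item.get("link"), item.get("title"))
        if key not in seen:
            new_items.append(item)
        seen[key] = item  # an updated item replaces the stored copy
    return new_items


# With a guid, an edited item is recognised as an update.
with_guid = {}
merge_items(with_guid, [{"guid": "g1", "link": "u", "title": "Old"}])
again = merge_items(with_guid, [{"guid": "g1", "link": "u", "title": "New"}])
print(len(again), len(with_guid))

# Without a guid, the same edit shows up as a second entry.
no_guid = {}
merge_items(no_guid, [{"link": "u", "title": "Old"}])
dupes = merge_items(no_guid, [{"link": "u", "title": "New"}])
print(len(dupes), len(no_guid))
```

In the second case the reader ends up holding both versions of the item, which is the duplication described in this report.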
Comment 3 Dan Jacobson 2009-06-01 00:26:17 UTC
And I thought it somehow had to do with how I manage my wiki.
All those bytes (bug 17058) and no guid!
Comment 4 Dan Jacobson 2009-06-02 07:58:26 UTC
If not implementing for Special:RecentChanges&feed=rss, then at least
implement for Special:NewPages&feed=rss, whose URLs are more robust.
Comment 5 Tim Landscheidt 2009-11-27 15:41:20 UTC
Created attachment 6828 [details]
Patch to add <guid> element to RSS items.

Small patch to solve the issue. I tested the result with <URI:http://www.feedvalidator.org/>. Note that for debugging I had to clear objectcache; otherwise the output remained the same :-).
Comment 6 Daniel Hazelton 2009-11-27 19:28:11 UTC
Created attachment 6829 [details]
add a <guid> element to RSS feeds

Created this when I noticed that just using the URL still had duplicate items appearing. With the time of the edit added to the URL and the guid flagged as not being a permalink this works extremely well. A much better way to manage things would be to provide a perma-link URL for the feed code - but failing that this should work well.
Comment 7 Tim Landscheidt 2009-11-27 22:17:55 UTC
I don't quite understand that. First, adding the time of the edit should make no difference at all, since it is redundant to the revision ID already contained in the URL. Second, by my reading of the specification, isPermaLink (with a guid of solely the URL) should be true (or omitted) as the diff link is unique and stable.

  In any case, in RSSFeed::outItem $item->getDate() does not seem to be guaranteed to exist, so if it is to be added to the guid, it should be checked for properly.
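
For reference, the two guid variants under discussion can be sketched as follows. This is Python using the standard library only, not the MediaWiki code; the helper name and the example diff URL are hypothetical. In RSS 2.0 the isPermaLink attribute is optional and is treated as true when omitted.

```python
import xml.etree.ElementTree as ET


def guid_element(value: str, is_permalink: bool = True) -> ET.Element:
    """Build a <guid> element; the attribute is only written when it is false."""
    guid = ET.Element("guid")
    guid.text = value
    if not is_permalink:
        guid.set("isPermaLink", "false")
    return guid


# Hypothetical diff URL; the revision IDs already make it unique.
diff_url = "http://example.org/w/index.php?title=Foo&diff=124&oldid=123"

# Variant 1: the URL alone, read as a permalink by default.
print(ET.tostring(guid_element(diff_url), encoding="unicode"))

# Variant 2: URL plus a timestamp, flagged as not being a permalink.
print(ET.tostring(guid_element(diff_url + "#20091127192811", False), encoding="unicode"))
```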
Comment 8 Daniel Hazelton 2009-11-28 00:45:52 UTC
Neither did I when it started happening. But apparently the URL is constructed with two IDs so the unique diff can be pulled up. If there is a change made after that, then the link was changing - however, I didn't try it under a newer version of the code-base.

And I didn't know that ->getDate() wasn't guaranteed to exist - since it seems to always exist in my install.

In any case... As I said it would be better to use a link to that specific revision without setting it as a diff as the guid. Because then it is guaranteed to not change. And it's just hit me that relying on the specific date is stupid, so I withdraw my proposed patch. If the URL is changing (or was - I've since updated to a much more recent code-base) then the duplication would be seen regardless. (And I have been seeing it)

So... I'm going to work on a more in-depth fix that will change the GUID to a URL that is that specific revision without the diff contents. I should have a patch for that by Monday.
Comment 9 Tim Landscheidt 2009-11-28 02:03:33 UTC
I don't see how the guid pointing to the revision itself would be an improvement. Keep in mind that the purpose of the feed is to point to the changes, i.e. the diffs, not to the revisions. If a guid pointed to the revision, it could collide with other feeds that, for example, list new pages.

  Regarding RSSFeed::getDate(), I'm just deducing from the if-clause three lines above that it is not guaranteed to exist. Maybe someone would like to overhaul the entire process, as many functions and structures (abstract base class with rather rigid structure, selectors that silently escape to XML, etc.) look very hackish.
Comment 10 Tomasz W. Kozlowski 2010-01-06 15:39:25 UTC
The RSS feeds for article history are also affected by this bug (at least in Liferea).
I am subscribed to a few RSS feeds for article history and am sorry to say that it doesn't work that well.
Here is a screenshot which clearly shows duplicated entries (please note that some of them are not duplicated!): 
http://img403.imageshack.us/img403/518/zrzutekranuliferea.png

The Feedvalidator shows the same: "item should contain a guid element": 
http://feedvalidator.org/check.cgi?url=http://en.wikipedia.org/w/index.php?title=1Q84&feed=rss&action=history

I hope this will be fixed sometime in the future as this is pretty annoying...
Thanks, 
Tomasz
Comment 11 Jools Wills 2010-01-08 18:37:06 UTC
Created attachment 6939 [details]
patch to add guid and permalink support to feeds

Despite it technically being OK not to have a guid, it is an annoyance. A quote from the RSS spec says:

"In all cases, it's recommended that you provide the guid, and if possible make it a permalink. This enables aggregators to not repeat items, even if there have been editing changes."

In fact, this is the main problem I am having: I have an extension that uses the RSS feed system, and if I make a change to an item, it will be repeated, as the RSS software cannot tell that it is an old item.

I have made a patch that not only allows adding a guid, but also allows you to set it as a permalink for RSS (this is not needed for Atom). It also makes the Atom feed use the new guid, which by default is set to the URL but can be changed with a setUniqueId call on the item. Please can we get this sorted as soon as possible!
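
The behaviour described above can be sketched roughly like this. It is a Python illustration, not the attached PHP patch; the class, method and function names are invented for the example, and the default of not flagging the guid as a permalink is an assumption.

```python
from xml.sax.saxutils import escape


class FeedItem:
    """Illustrative feed item whose unique ID defaults to its URL."""

    def __init__(self, title: str, url: str):
        self.title = title
        self.url = url
        self.unique_id = None
        self.is_permalink = False  # assumed default for this sketch

    def set_unique_id(self, unique_id: str, is_permalink: bool = False) -> None:
        """Override the default ID and say whether it is a permalink (RSS only)."""
        self.unique_id = unique_id
        self.is_permalink = is_permalink

    def get_unique_id(self) -> str:
        # Fall back to the item URL when no explicit ID was set.
        return self.unique_id if self.unique_id is not None else self.url


def rss_guid(item: FeedItem) -> str:
    """Render the RSS <guid> element, including the permalink flag."""
    flag = "true" if item.is_permalink else "false"
    return '<guid isPermaLink="%s">%s</guid>' % (flag, escape(item.get_unique_id()))


def atom_id(item: FeedItem) -> str:
    """Render the Atom <id> element; Atom has no permalink flag."""
    return "<id>%s</id>" % escape(item.get_unique_id())


item = FeedItem("Some page", "http://example.org/a?x=1&y=2")
print(rss_guid(item))
item.set_unique_id("http://example.org/p/1", is_permalink=True)
print(rss_guid(item))
print(atom_id(item))
```

An extension that edits an item after publication would keep calling set_unique_id with the same value, so a reader can match the edited item to the one it already has.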
Comment 12 Jools Wills 2010-01-12 16:11:52 UTC
Created attachment 6950 [details]
patch to add guid and permalink support to feeds

Feed.php hadn't been updated in over a year. I make my patch and then someone cleans the file up! Here is a new patch that applies to the latest svn.

Perhaps someone can have a look at this before Feed.php is again changed?

The patch also includes a couple of minor cleanups/fixes to the file.
Comment 13 Jools Wills 2010-01-12 16:15:14 UTC
Created attachment 6951 [details]
patch to add guid and permalink support to feeds

Oops. A couple of indentation mistakes in that one, and I made the ordering more logical.
Comment 14 Jools Wills 2010-01-15 17:44:18 UTC
Created attachment 6959 [details]
6951: patch to add guid and permalink support to feeds

change some parameter names in new setUniqueId function
removed cosmetic changes from the patch to make it more readable.
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-01-15 18:01:20 UTC
Last patch committed as r61090 after discussion in #mediawiki.
