Last modified: 2014-05-23 01:16:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T25686, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 23686 - Ids in user contributions Atom feed are not unique
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.16.x
Hardware: All All
Importance: Low normal (3 votes)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/Wikipedi...
Depends on:
Blocks:

Reported: 2010-05-28 01:42 UTC by Svick
Modified: 2014-05-23 01:16 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Svick 2010-05-28 01:42:06 UTC
According to the Atom specification (http://www.atomenabled.org/developers/syndication/atom-format-spec.php#element.id), entries with the same id should represent the same entry. Because an entry in the user contributions feed (e.g. http://en.wikipedia.org/w/index.php?title=Special%3AContributions/Svick&feed=atom&limit=50&target=Svick&year=&month=) represents an edit, each edit should have its own id; currently, however, the id is the URL of the changed page.

I think this causes Google Reader to show the same edit repeatedly.
Comment 1 KATO Takayuki 2010-10-30 13:44:56 UTC
(In reply to comment #0)

I validated the contributions Atom feed with http://feedvalidator.org/check.cgi. The validator says:

: column 81: Two entries with the same id

When I change "feed=atom" in the URL to "feed=rss", the validator says:

: column 84: guid values must not be duplicated within a feed http://.....
: column 1: Missing atom:link with rel="self"

So I think the feed code has a bug in how it generates ids, and the RSS feed template is not correct.
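The validator's "Two entries with the same id" warning can be reproduced locally. The following is a minimal Python sketch using only the standard library, assuming the feed uses the Atom 1.0 namespace; the function name `duplicate_entry_ids` and the sample feed are hypothetical and are not taken from MediaWiki.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def duplicate_entry_ids(atom_xml: str) -> list[str]:
    """Return every <id> value shared by two or more <entry> elements."""
    root = ET.fromstring(atom_xml)
    # Only <id> children of <entry> are collected; the feed's own <id> is ignored.
    ids = [
        entry.findtext(f"{ATOM_NS}id", default="").strip()
        for entry in root.iter(f"{ATOM_NS}entry")
    ]
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


# Hypothetical feed: two edits to the same page share the page URL as their id.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://en.wikipedia.org/wiki/Special:Contributions/Example</id>
  <entry><id>http://en.wikipedia.org/wiki/Example</id></entry>
  <entry><id>http://en.wikipedia.org/wiki/Example</id></entry>
  <entry><id>http://en.wikipedia.org/wiki/Other</id></entry>
</feed>"""

print(duplicate_entry_ids(SAMPLE_FEED))  # ['http://en.wikipedia.org/wiki/Example']
```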
Comment 2 KATO Takayuki 2010-10-30 13:57:33 UTC
Currently, the Atom feed's id is made from the article name only, so ids (or the RSS feed's guids) overlap.

I think that if the id generator used a combination of the article name and the revision number, this bug would be fixed.
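To see why page-only ids can lose edits, here is a small Python simulation. It assumes, as a simplification, that a feed reader stores one item per id and overwrites earlier items sharing that id; real readers such as Google Reader may behave differently. The `Edit` type, the revision numbers, and the helper functions are all hypothetical.

```python
from typing import Callable, NamedTuple


class Edit(NamedTuple):
    title: str    # page title
    rev_id: int   # revision number of this edit (hypothetical values below)
    summary: str


BASE = "http://en.wikipedia.org"


def page_only_id(edit: Edit) -> str:
    # Behaviour described in this bug: the id is just the page URL.
    return f"{BASE}/wiki/{edit.title}"


def page_and_revision_id(edit: Edit) -> str:
    # Proposed fix: combine the page name with the revision number.
    return f"{BASE}/w/index.php?title={edit.title}&oldid={edit.rev_id}"


def reader_cache(edits: list[Edit], make_id: Callable[[Edit], str]) -> dict[str, Edit]:
    # Simplified reader: entries with equal ids are treated as the same item,
    # so a later entry replaces an earlier one.
    cache: dict[str, Edit] = {}
    for edit in edits:
        cache[make_id(edit)] = edit
    return cache


EDITS = [
    Edit("Example", 1001, "first edit"),
    Edit("Example", 1002, "second edit"),
    Edit("Other", 1003, "unrelated edit"),
]

print(len(reader_cache(EDITS, page_only_id)))          # 2: one Example edit is lost
print(len(reader_cache(EDITS, page_and_revision_id)))  # 3: every edit survives
```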
Comment 3 nornand 2010-11-13 20:17:11 UTC
It's exactly as KATO Takayuki says.

Currently, feeds use something like "http://xx.wikipedia.org/wiki/article_name" for the <guid> field in RSS and for the <id> field in Atom. The solution doesn't seem too complicated: use something like "http://xx.wikipedia.org/w/index.php?title=article_name&oldid=xxxxxxxx" instead. This way, we would make sure that every entry has a unique identifier.
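The proposal above can be sketched in Python with the standard library's `urllib.parse.urlencode`, which percent-encodes the title (here, the parentheses). This is only an illustrative sketch: `revision_permalink`, the example title, and the revision number are hypothetical, and MediaWiki's own URL builder may encode characters differently.

```python
from urllib.parse import urlencode


def revision_permalink(host: str, title: str, oldid: int) -> str:
    """Build an index.php?title=...&oldid=... URL usable as a per-revision id."""
    # MediaWiki URLs use underscores in place of spaces in page titles.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"http://{host}/w/index.php?{query}"


print(revision_permalink("en.wikipedia.org", "Example (test)", 123456))
# http://en.wikipedia.org/w/index.php?title=Example_%28test%29&oldid=123456
```

Because the revision number is part of the query string, two edits to the same page yield different ids, while the title keeps the id readable.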
But this change should be implemented as soon as possible because this issue is already causing trouble. Check this:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#To_Google_Reader_users:_You_may_be_missing_items_from_your_watchlist_feed.

and this:

http://www.google.com/support/forum/p/reader/thread?tid=4e28dcb545efabb3&hl=en
Comment 5 Brion Vibber 2010-12-27 04:09:39 UTC
I think this is pretty much the same issue as the old bug 3998; technically what we're doing is valid (considering each page as an item, and we're including multiple versions of them) but it is indeed probably not matching up well with what receiving entities will be expecting.

Probably best to change the feeds to go ahead and use ids that are specific to the revision and the way it's being displayed, ensuring that feed-processing systems do keep them separate in their caches.

(My old arguments on bug 3998 are in the other direction, but I'm pretty convinced now that I was wrong in 2005. ;)
Comment 6 nornand 2011-01-03 00:18:52 UTC
As to whether what is being done is valid or not, all I can say is that my RSS watchlist feed doesn't pass W3C validation:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dfeedwatchlist%26allrev%3Dallrev%26wlowner%3DCanyq%26wltoken%3D080630b3f4931ff5964fa7e69e6ee5a19871d1dc%26feedformat%3Drss
or the Feed Validator test:
http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dfeedwatchlist%26allrev%3Dallrev%26wlowner%3DCanyq%26wltoken%3D080630b3f4931ff5964fa7e69e6ee5a19871d1dc%26feedformat%3Drss
My Atom watchlist feed passes both tests, but with recommendations related to this non-unique id issue:
http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dfeedwatchlist%26allrev%3Dallrev%26wlowner%3DCanyq%26wltoken%3D080630b3f4931ff5964fa7e69e6ee5a19871d1dc%26feedformat%3Datom
http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dfeedwatchlist%26allrev%3Dallrev%26wlowner%3DCanyq%26wltoken%3D080630b3f4931ff5964fa7e69e6ee5a19871d1dc%26feedformat%3Datom
Finally, it must be remembered that, as I have shown, a feed reader as popular as Google Reader very often misses items from Wikipedia watchlists because of this problem. In fact, I was using it to follow changes in Wikipedia articles I track, but since I watch many of these articles for vandalism, I can't accept these losses. Therefore, until this issue is fixed, I am following my watchlist manually, ignoring feeds.
Obviously, I don't know how many people use Google Reader to monitor their watchlists, but for me this is a serious problem with (I think) an easy solution.
Comment 7 nornand 2014-05-23 01:16:36 UTC
In the last few days, I've noticed a change: entries now use a unique identifier following the pattern:

//es.wikipedia.org/w/index.php?title=[Article_title]&amp;diff=[Edition_id]

where [Article_title] is the article title and [Edition_id] is the revision number, which, as far as I know, is a unique identifier. Therefore, the issue described in this bug should no longer be a problem.
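In the pattern above, `&amp;` is how the `&` separating the query parameters is written inside the feed's XML; a reader that parses the feed sees a plain `&`. The following Python sketch, using the standard `xml.etree.ElementTree` module, shows that round trip; the title `Ejemplo` and the revision number are placeholders, not values from the real feed.

```python
import xml.etree.ElementTree as ET

# Placeholder values; the real feed uses the actual title and revision number.
entry_id = "//es.wikipedia.org/w/index.php?title=Ejemplo&diff=12345"

id_element = ET.Element("id")
id_element.text = entry_id
serialized = ET.tostring(id_element, encoding="unicode")
print(serialized)  # <id>//es.wikipedia.org/w/index.php?title=Ejemplo&amp;diff=12345</id>

# Parsing the serialized XML restores the plain "&".
assert ET.fromstring(serialized).text == entry_id
```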
