Last modified: 2013-09-29 13:42:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T36778, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 34778 - Deploy extension Memento on Wikipedia sites
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Extension setup
Version: unspecified
Hardware: All All
Importance: Lowest enhancement with 18 votes
Target Milestone: ---
Assigned To: Nobody
Depends on:
Blocks: 31235
Reported: 2012-02-28 20:26 UTC by Rob Sanderson
Modified: 2013-09-29 13:42 UTC

Description Rob Sanderson 2012-02-28 20:26:35 UTC
MediaWiki currently does not support access to prior states of its articles using the Memento time-based content negotiation paradigm [1,2,3].  Memento allows browsers to request resources at a particular point in time, and for TimeGates to then redirect the browser to the version that was accessible at the requested time.  As MediaWiki maintains all versions, it is the most authentic and knowledgeable source of this information, compared to (for example) the Internet Archive's very sparse collection of articles.  MediaWiki is best placed to provide this information efficiently and accurately, rather than a third party system.
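
A minimal sketch of the datetime negotiation described above, written in Python rather than the extension's PHP. It assumes a non-empty revision list sorted oldest first; the function name, the sample history, and the fallback to the first revision when the requested datetime predates the page are all hypothetical choices for illustration, not something the extension is known to do.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def select_memento(revisions, requested):
    """Return the (timestamp, rev_id) pair that was current at `requested`.

    `revisions` must be non-empty and sorted by timestamp, oldest first.
    If `requested` predates the first revision, the first revision is
    returned (an assumed policy for this sketch).
    """
    timestamps = [ts for ts, _ in revisions]
    # Index of the last revision whose timestamp is <= requested.
    index = bisect_right(timestamps, requested) - 1
    return revisions[max(index, 0)]


# Hypothetical revision history for a single page.
history = [
    (datetime(2009, 6, 1, tzinfo=timezone.utc), 101),
    (datetime(2009, 6, 25, tzinfo=timezone.utc), 102),
    (datetime(2009, 7, 10, tzinfo=timezone.utc), 103),
]
```

A TimeGate would then redirect the client to the URI of the selected revision, for instance `select_memento(history, some_datetime)[1]` as the revision id.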

Users wish to see versions of resources both before and after certain events, for example one might wish to see the page about Michael Jackson both before and after his death, or follow the evolution of the description of the TSA's approach to air travel security since 2001.

Editors can also benefit from Memento access to see where the hot spots of activity are, and the differences before and after editing wars.

By exposing Memento TimeGates, MediaWiki allows for time series analysis of its resources, either by extracting information from the article (text mining, data extraction, etc) or from the upcoming data platform. As the information may change many times, allowing fine grained access is extremely valuable compared to the DBpedia implementation [4].

A prototype extension which implements Memento for MediaWiki is available:
http://www.mediawiki.org/wiki/Extension:Memento

A working browser plugin for Firefox is also available:
http://bit.ly/memfox


Resources:

[1] https://datatracker.ietf.org/doc/draft-vandesompel-memento/
[2] http://www.mementoweb.org/
[3] http://arxiv.org/abs/0911.1112
[4] http://arxiv.org/abs/1003.3661
[5] http://www.mediawiki.org/wiki/Extension:Memento
[6] http://bit.ly/memfox
Comment 1 Rob Sanderson 2012-02-28 20:47:50 UTC
Clarifications:

* Memento is valuable for regular users and content editors, as well as bots, cyborgs and other software agents.  Giving HTTP access to versions allows easy access to software agents to perform the kinds of analysis human users and editors do.  Software agents are the primary audience of the third use case given, time series analysis of the information in the articles.

* The extension referenced has been tested in several installs around the world, but not in a Wikipedia scale installation of MediaWiki.  As such we describe it as a prototype, but it must be noted that it is stable and tested.
Comment 2 Rob Sanderson 2012-02-28 21:10:41 UTC
Additional Use Cases:

* Users find it time-consuming to navigate the many history pages to find the
article at a particular point in time.  Memento makes this a single click,
saving the user time and frustration.

* Users wish to navigate between old versions of articles at a particular point
in time.  As the browser sends the requested timestamp, the user can click on
internal links within the wiki and be taken straight to the version of the
article at that time.  Without Memento, the user will be taken to the current
version and must then page through the history to find the appropriate version.
 This integrates with external web archives to display resources linked to from
articles that are outside of the control of MediaWiki.
Comment 3 Dario Taraborelli 2012-02-28 21:44:50 UTC
As this is a request to deploy the Memento extension in production on Wikipedia, I file it under the correct product category. Not sure if History/Diff is the most relevant component.
Comment 4 Mark A. Hershberger 2012-02-29 16:52:03 UTC
Extensions that are deployed need to be reviewed.  I've added it to [[mw:Review_queue]].
Comment 5 Sam Reed (reedy) 2012-02-29 19:57:27 UTC
Code needs to also go into our SVN repository
Comment 6 Bawolff (Brian Wolff) 2012-04-04 14:34:35 UTC
This extension currently involves patches to core mediawiki, which is generally a big no no.

It also doesn't follow some of Wikimedia's coding conventions. For example:

*Using $_SERVER directly, instead of looking at $wgServer variable, etc. Another example:
if ( !stripos( $_SERVER['REQUEST_URI'], '?' ) && !stripos( $_SERVER['REQUEST_URI'], 'Special:TimeGate' ) ) {

*Using $dbr->query instead of $dbr->select

*Hard coded strings that should be i18n-ized

*Things like: $page_namespace_id = constant( "NS_" . strtoupper($namespace) ); which make assumptions that may not be true in many setups

*Using echo in places where $wgOut->disable() has not been called.
Comment 7 Harihar Shankar 2012-04-04 15:01:42 UTC
Thanks for your comments. I will fix the plugin to incorporate these suggestions.

Clarification on patching MediaWiki core:
The patch to the core of mediawiki is needed only when older versions of Templates are to be fetched by the plugin. This patch is not mandatory for this plugin to work. 


(In reply to comment #6)
> This extension currently involves patches to core mediawiki, which is generally
> a big no no.
> 
> It also doesn't follow some of Wikimedia's coding conventions. For example:
> 
> *Using $_SERVER directly, instead of looking at $wgServer variable, etc.
> Another example:
> if ( !stripos( $_SERVER['REQUEST_URI'], '?' ) && !stripos(
> $_SERVER['REQUEST_URI'], 'Special:TimeGate' ) ) {
> 
> *Using $dbr->query instead of $dbr->select
> 
> *Hard coded strings that should be i18n-ized
> 
> *Things like: $page_namespace_id = constant( "NS_" . strtoupper($namespace) );
> which make assumptions that may not be true in many setups
> 
> *Using echo in places where $wgOut->disable() has not been called.
Comment 8 Bawolff (Brian Wolff) 2012-04-04 18:39:28 UTC
>The patch to the core of mediawiki is needed only when older versions of
>Templates are to be fetched by the plugin. This patch is not mandatory for this
>plugin to work.

My mistake. To be honest I just skimmed the extension page rather quickly (the perennial "extensions never get looked at" thread came back up on the mailing list, and I was curious about which extensions were in the review queue).
Comment 9 Harihar Shankar 2012-04-23 15:35:58 UTC
We have worked on the extension code so that it meets the MediaWiki coding conventions. The newer version of the code can be downloaded from http://www.mediawiki.org/wiki/Extension:Memento . We have also updated the extension documentation page to reflect the new changes.
Comment 10 Platonides 2012-04-23 16:01:25 UTC
Vulnerable to register_globals

$mmScriptPath defined but not used.
Useless statement $historyuri;
No need of mmSetupExtension() for setting a hook.
Usage of $wgTitle will fail on recent MediaWiki
stripos() is not the way to check if a variable was set in the query string
explode() is not how you retrieve a variable from the query string
You're changing the default timezone, overriding whatever the user might have configured.
HTML injection building links
Hardcoded names of Special pages
You're fetching the whole list of revisions for each page, that can be a very expensive operation, retrieving several thousands of rows. Try requesting just what you need.

This is not suitable for deployment at this point. I recommend you reach out to some developers about how to code this properly.
Comment 11 Rob Sanderson 2012-04-23 16:58:20 UTC
Dear Platonides,

Thank you for the comments on the latest revision.

Could you please provide pointers to the best practices regarding retrieving the title, and how best to parse URLs without stripos() and explode()?


Regarding the timezone, we're not changing it other than to follow the RFC specification that all timestamps in HTTP headers MUST be in GMT, "without exception". This does not change the UX in the page. Please see:
http://tools.ietf.org/html/rfc2616#page-20

Could you please confirm what you mean by "HTML injection building links"? We do not change the HTML of the returned history page.

If there is a better way to discover the names of the Special pages (timegate and timemap) generated within the extension, please let us know and we'll update the extension.

We do request only the parts of the history list that are required for the different operations.  The timegate needs the closest match, first, last, previous and next.  The timemap is a serialization of the set of versions of the resource, and thus requires the entire history list.

Many thanks!
Comment 12 Max Semenik 2012-04-23 17:22:48 UTC
(In reply to comment #11)
> Could you please provide pointers to the best practices regarding retrieving
> the title, and how best to parse URLs without stripos() and explode()?

RequestContext.

> Regarding the timezone, we're not changing it other than to follow the RFC
> specification that all timestamps in HTTP headers MUST be in GMT, "without
> exception". This does not change the UX in the page. Please see:
> http://tools.ietf.org/html/rfc2616#page-20

Then prepare your headers in a way that doesn't affect UI timezone.

> If there is a better way to discover the names of the Special pages (timegate
> and timemap) generated within the extension, please let us know and we'll
> update the extension.

SpecialPage::getTitle(), getTitleFor()

> We do request only the parts of the history list that are required for the
> different operations.  The timegate needs the closest match, first, last,
> previous and next.  The timemap is a serialization of the set of versions of
> the resource, and thus requires the entire history list.

Then it's not going to be deployed on WMF, because requesting thousands of revisions is not an option with our scale.

By the way, is there evidence of interest in this feature from Wikimedia community?
Comment 13 Max Semenik 2012-04-23 17:43:35 UTC
Another issue:
> $xares = $dbr->select( "revision", array('rev_id', 'rev_timestamp'), array("rev_page=$pg_id"), __METHOD__, array("ORDER BY"=>"rev_id DESC") );

rev_id is not guaranteed to always behave like you want it to, sort by rev_timestamp.

(Expanding on comment #12)

> Then it's not going to be deployed on WMF, because requesting thousands of
> revisions is not an option with our scale.

Even caching isn't going to help, because we have pages with *hundreds of thousands* revisions. Memcached object size limit is 1MB - revision information for such pages won't even fit into it. And Special:Timemap will get tired of serving megabytes and megabytes of data in such cases.
Comment 14 Platonides 2012-04-23 18:00:19 UTC
(In reply to comment #11)
> Dear Platonides,
> 
> Thank you for the comments on the latest revision.
> 
> Could you please provide pointers to the best practices regarding retrieving
> the title, and how best to parse URLs without stripos() and explode()?

The request object has a getVal() method.


> Regarding the timezone, we're not changing it other than to follow the RFC
> specification that all timestamps in HTTP headers MUST be in GMT, "without
> exception". This does not change the UX in the page. Please see:
> http://tools.ietf.org/html/rfc2616#page-20

Of course, so you either change the default and set it back to what it was before or -better- use a function that doesn't need switching default timezones.
I think your mmConvertTimestamp() function could be replaced with a call to wfTimestamp() with TS_RFC2822 output.
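
To illustrate the second option (formatting without switching the default timezone), here is a small Python sketch. It is not the extension's code and not wfTimestamp(); the helper name `http_date` is hypothetical. The point is only that the conversion to GMT happens on the value being formatted, so no process-wide setting is changed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def http_date(moment):
    """Format a timezone-aware datetime as an HTTP-date in GMT."""
    # Convert this one value to UTC; the process-wide timezone is untouched.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# A timestamp expressed in a +02:00 zone still comes out in GMT.
local = datetime(2009, 9, 15, 13, 28, 26, tzinfo=timezone(timedelta(hours=2)))
print(http_date(local))
```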


> Could you please confirm what you mean by "HTML injection building links"? We
> do not change the HTML of the returned history page.
You're handcrafting many urls, such as 
 $first['uri'] = $alturi . "?title=" . $title . "&oldid=" . $oldestRevID;

This is horrible practice. It would lead to HTML injection if output in HTML; in HTTP headers the server might be tricked into redirecting to an attacker's website (maybe not possible with the broken way you read them, but still...).
Look at wfExpandUrl() and wfAppendQuery()
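
The general idea, sketched in Python rather than with the MediaWiki helpers named above: let a library encode each query parameter instead of concatenating raw strings. The function name and the example.org URL are hypothetical.

```python
from urllib.parse import urlencode


def revision_url(script_url, title, rev_id):
    """Build a ?title=...&oldid=... URL with every value percent-encoded."""
    # urlencode escapes characters such as '&' and '=' inside the values,
    # so a crafted title cannot smuggle in extra query parameters.
    query = urlencode({"title": title, "oldid": rev_id})
    return f"{script_url}?{query}"


print(revision_url("https://example.org/w/index.php", "A&B=C", 42))
```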


> If there is a better way to discover the names of the Special pages (timegate
> and timemap) generated within the extension, please let us know and we'll
> update the extension.
(Answered by MaxSem)

> We do request only the parts of the history list that are required for the
> different operations.  The timegate needs the closest match, first, last,
> previous and next.  The timemap is a serialization of the set of versions of
> the resource, and thus requires the entire history list.

This unbounded query is retrieving all the revisions for the page.
$xares = $dbr->select( 'revision', array('rev_id', 'rev_timestamp'), array("rev_page=$pg_id"), __METHOD__, array('DISTINCT', 'ORDER BY'=>'rev_id DESC') );

Suppose we were visiting https://en.wikipedia.org/wiki/Main_Page which has 4104 revisions. Can you justify why you need all of them instead of just 3 or 4?
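
As a rough sketch of what "just 3 or 4" could look like, the following Python example runs five bounded lookups (each with LIMIT 1, ordered by rev_timestamp) against an in-memory SQLite table. The column names come from the query above; the simplified schema, the function name and the sample rows are assumptions for illustration, and this is not MediaWiki's database API.

```python
import sqlite3


def timegate_neighbours(conn, page_id, requested_ts):
    """Fetch first, last, closest, prev and next revisions, one row each."""

    def one(where, order, params):
        # Only fixed SQL fragments are interpolated; values are bound.
        sql = (
            "SELECT rev_id, rev_timestamp FROM revision "
            f"WHERE rev_page = ? {where} "
            f"ORDER BY rev_timestamp {order} LIMIT 1"
        )
        return conn.execute(sql, (page_id, *params)).fetchone()

    closest = one("AND rev_timestamp <= ?", "DESC", (requested_ts,))
    result = {
        "first": one("", "ASC", ()),
        "last": one("", "DESC", ()),
        "closest": closest,
        "prev": None,
        "next": one("AND rev_timestamp > ?", "ASC", (requested_ts,)),
    }
    if closest is not None:
        result["prev"] = one("AND rev_timestamp < ?", "DESC", (closest[1],))
    return result


# Hypothetical data: timestamps in MediaWiki's 14-digit string form.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (101, 1, "20090601000000"),
        (102, 1, "20090625000000"),
        (103, 1, "20090710000000"),
        (201, 2, "20090615000000"),
    ],
)
```

Each lookup touches a single row regardless of how long the page history is, which is the property the reviewers are asking for.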



Also, it'd be helpful if you provided a public repository of the extension. You can request it to be hosted with the other mediawiki extensions in our repository. That'd help later for deployment.
Comment 15 Max Semenik 2012-04-23 18:43:39 UTC
(In reply to comment #14)

> Suppose we were visiting https://en.wikipedia.org/wiki/Main_Page which has 4104
> revisions. Can you justify why you need all of them instead of just 3 or 4?

[[WP:ANI]] is 620k revs.
Comment 16 Sumana Harihareswara 2012-04-25 03:12:59 UTC
Rob, in addition to these comments from experienced MediaWiki developers, you'll find this guide helpful:

https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment
Comment 17 Harihar Shankar 2012-05-08 16:01:56 UTC
We have updated the plugin code to incorporate all but one change that had been suggested. We need your opinion on how to fix the timezone issue that the plugin introduces. We have found 3 ways to fix this issue and we would appreciate input on which is the best approach.

1) Use the DateTimeZone class http://www.php.net/manual/en/class.datetimezone.php to use the GMT timezone for every time function used by the plugin. The drawback with this approach is that this class is available in PHP version > 5.2.0. Even though this would work in Mediawiki servers, a lot of other wikis use older PHP versions. 

2) Save the default timezone set for a wiki by using the getTimezone() function, use setTimezone('GMT') when the plugin is invoked, and then set it back to the default timezone when the plugin is finished. This approach will work with all the versions of PHP.

3) The plugin can check if the datetimezone class is available and use it. Otherwise use method 2 above. 

Please advise us on which approach is best.

Note: The new plugin code is not available for download yet.
Comment 18 Max Semenik 2012-05-08 16:09:31 UTC
(In reply to comment #17)

> The drawback with this approach is that
> this class is available in PHP version > 5.2.0. Even though this would work in
> Mediawiki servers, a lot of other wikis use older PHP versions. 

All supported MediaWiki versions require 5.2.3, while the next release will require 5.3, so this is not a problem.
Comment 19 Harihar Shankar 2012-05-31 23:34:10 UTC
We have updated the plugin to incorporate the changes suggested above. 

The major fix is in the timemaps, where we were previously trying to fetch all the revisions of an article. Now, we have introduced paged timemaps, where an optional configuration parameter can be set to limit the number of revisions that can be fetched. If this variable is not set, the number of revisions defaults to 500.
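
A minimal sketch of the paging idea in Python, not the extension's actual code. The names `timemap_page` and `DEFAULT_PAGE_SIZE` are hypothetical; only the default of 500 comes from the description above. A real implementation would apply the limit in the database query rather than slice a list already in memory.

```python
DEFAULT_PAGE_SIZE = 500  # default stated above; the names here are made up


def timemap_page(revisions, offset=0, limit=None):
    """Return one page of revisions and the offset of the next page, if any."""
    size = DEFAULT_PAGE_SIZE if limit is None else limit
    page = revisions[offset:offset + size]
    # None signals that this was the last page.
    next_offset = offset + size if offset + size < len(revisions) else None
    return page, next_offset


# Hypothetical history of 1200 revision ids.
revs = list(range(1200))
page, next_offset = timemap_page(revs)
print(len(page), next_offset)
```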
Comment 20 Platonides 2012-06-08 22:23:24 UTC
Adding the link to the bug report so I don't need to go looking for it each time http://mementoweb.org/tools/wiki/memento.zip

It would be easier to check if the changes were available in some repository.

* No need of mmSetupExtension, set $wgHooks in the global scope.

> $title = preg_replace('/ /', "_", $objTitle->getPrefixedText());
Used in several places. preg_replace() here is overkill, as str_replace() would do it. But in this case you'd just want $objTitle->getPrefixedURL()

* Coding conventions: Tabs instead of spaces, spaces into brackets...

* No need to make the SELECT rev_id, rev_timestamp FROM revision a DISTINCT one, as those combined fields are always unique (the db server is probably realising this and ignoring the DISTINCT, but there is no need to add it).

* Generation of the link header is wrong. Eg. in the way it is generating the article url. You're not doing any rawurlescaping, so some specially crafted article names could confuse memento clients.

* Minor: give the author names in $wgExtensionCredits in an array.

* Unneeded space indenting in timegate/timegate.alias.php, timegate/timegate.i18n.php, timemap/timemap.alias.php, timemap/timemap.i18n.php and top of timemap/timemap.php


>        if( stripos($par, $wgServer.$waddress) == 0 ) {
>            $title = preg_replace( "|".$wgServer.$waddress."|", "", $par );
Wrong check and wrong regex.

> $dbr->begin();
There is never a corresponding commit() or rollback().
You're doing an exit in the bottom, the php driver might be closing the transaction or even the connection, but don't rely on it.
(several places)


* wfLoadExtensionMessages() is deprecated since 1.15, expected to be removed in 1.20 (just remove that line).

* Variable $wgMementoReqDateTime defined as global in execute(), set as local variable in tgParseRequestDateTime() and never used.

* tgParseRequestDateTime() should use wfTimestamp() instead of strtotime() -> date()

* When you have a title object, you don't need a manual query to revision table to get the latest revision, just use getLatestRevID().

* You have repeated code for selecting the first/prev/next of a revision. I think it could be abstracted in a single function.

* TimeGate methods use a tg prefix that isn't really needed (you're scoped by the class name).

* At TimeGate::tgGetMementoForResource if there's no revision for the given memento, it will merrily use the undefined variable $memRevUnixTS generating wrong SQL. Maybe you wanted to abort with an error message if there's no suitable memento?
(even with the $oldestRevUnixTS / $recentRevUnixTS, there could be a race condition)


It's in better shape than the previous version :)
Comment 21 Platonides 2012-06-09 12:45:16 UTC
Also note that WikiPage has a getOldestRevision() method...
Comment 22 Harihar Shankar 2012-06-12 23:35:51 UTC
* Most of the suggestions above have been implemented. 

* getOldestRevision() and getLatestRevID() were not used for now because the plugin also needs the latest and oldest timestamps along with the revid. Is there a function that gives us these timestamps as well?

* The code is now in a GitHub repository. https://github.com/hariharshankar/mediawiki
Comment 23 Sumana Harihareswara 2012-07-25 18:12:22 UTC
Harihar, thanks for your update and for incorporating these revisions.

Per https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment , I'm cc'ing Howie Fung and Brandon Harris to help get a design review for this proposed functionality.

Also, to get your extension deployed, you will need to move your extension from GitHub to our Git system, which is hosted at gerrit.wikimedia.org .  Instructions: https://www.mediawiki.org/wiki/Git/New_repositories .

(In reply to comment #22)
> * getOldestRevision() and getLatestRevID() were not used for now because the
> plugin also needs latest and oldest timestamp with the revid. Is there a
> function that gives us these timestamps as well?

Were you able to find this information on your own?
Comment 24 Brandon Harris 2012-07-25 18:31:25 UTC
Is there any kind of consensus anywhere to get this deployed?  I've never heard of this until just now.
Comment 25 Bawolff (Brian Wolff) 2012-07-25 18:45:11 UTC
(In reply to comment #24)
> Is there any kind of consensus anywhere to get this deployed?  I've never heard
> of this until just now.

No there is not.

Quite frankly I think consensus should be established for people wanting this extension before resources are spent on improving it.
Comment 26 Dario Taraborelli 2012-07-25 18:53:23 UTC
Agreed, a thread on https://en.wikipedia.org/wiki/Village_pump_(technical) would be a good place to start gauging consensus.
Comment 27 Max Semenik 2012-07-25 18:54:59 UTC
You mean https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)
Comment 28 Dario Taraborelli 2012-07-25 19:17:38 UTC
Max, it's a redirect.

Here's a summary from a discussion from wikimedia-dev and further notes on the next steps for the Memento maintainers:
* start gauging community consensus (explaining the benefits of Memento support for editors and readers), see c24-27 above (BTW I imagine bots and third party apps would also be among potential target users, correct?) 
* get access to gerrit.wikimedia.org to prepare for code review, see c23 above
* if possible get us some estimates on the target user base and the current state of browser support
Comment 29 Bawolff (Brian Wolff) 2012-07-25 19:33:13 UTC
(In reply to comment #28)

> * if possible get us some estimates on the target user base and the current
> state of browser support

Umm, non-existent?  (Some plugins, nothing native.) We are talking about non-standard HTTP headers. (From what I can tell, it is a draft RFC, and even if they do manage to get an RFC published, it honestly seems unlikely to be adopted by the browser community.) Really we're talking about something on the level of the "browser edit button" minus the links to wiki ideology.

That said it would be kind of a cool thing, provided the effort was minimal.

Most interest from the community I imagine would be more about doing bug 851, which has a fair bit of interest.
Comment 30 Chad H. 2012-09-04 01:17:43 UTC
(In reply to comment #23)
> Also, to get your extension deployed, you will need to move your extension from
> GitHub to our Git system, which is hosted at gerrit.wikimedia.org . 
> Instructions: https://www.mediawiki.org/wiki/Git/New_repositories .
> 

Repo was just created, but not with direct pushing permissions (since this is looking for WMF deployment, direct pushing is not allowed).
Comment 31 Dario Taraborelli 2012-10-24 04:13:59 UTC
Passed RFC on enwiki, status updated in the review queue  
http://www.mediawiki.org/w/index.php?title=Review_queue&diff=596956&oldid=592603
Comment 32 Harihar Shankar 2012-10-24 17:21:16 UTC
The memento extension code has been moved to the git repository and is waiting for review. 
https://gerrit.wikimedia.org/r/29812
Comment 33 Tim Starling 2013-02-05 13:02:30 UTC
I recommend rejecting this.

Asking users to install a Firefox extension to make navigation easier is not how I imagine a secure and user-friendly web would work. 

Perhaps if this were supported by unmodified browsers, it would be more attractive for us. The browsers have a long history of introducing features in advance of their use on the web, so I don't think it's a "chicken-and-egg" problem.

TimeGate responses, as specified by the Internet-Draft, appear to be effectively uncacheable with currently used HTTP proxy software. We have no way to remove resources from a cache with a finer granularity than a URI. So when the page is changed, we would have the choice of either:

* Purging the TimeGate URI when the page is changed, in which case all versions of that resource would be simultaneously purged, reducing the hit rate for rarely-accessed old revisions, or 
* Not purging it, in which case responses for recent Accept-Datetime values would become stale. Also, there would be no way to purge revisions which are removed from the database by RevisionDelete.

Additionally, the definition of the Vary header in the Internet-Draft appears to conflict with the definition in HTTP (RFC 2616), as implemented by MediaWiki, PHP, Squid, etc. It's unclear what the "negotiate" value is for or how it will interact with the Vary header values that MediaWiki must send to HTTP proxy servers.

The Internet-Draft seems to unnecessarily overspecify server and client behaviour. For example, depending on the server software, it may be difficult to implement the requirement that TimeGates respond to request methods other than GET and POST with an HTTP 405 code.

(In reply to comment #29)
> That said it would be kind of a cool thing, provided the effort was minimal.

I don't think the effort would be minimal. The code quality is poor, and would suffer from a high rate of bit rot due to poor integration with the MediaWiki core. For example, mmAcceptDateTime() assumes $_GET['oldid'] will have a certain interpretation by the MediaWiki core, and sends header values corresponding to this interpretation, regardless of what MediaWiki decides to actually do with that parameter. The assumption is already incorrect and will become more incorrect over time.
Comment 34 Rob Sanderson 2013-02-07 15:48:05 UTC
Hi Tim,

The Memento team has carefully analyzed your feedback. We hope our
below response can convince you to change your opinion regarding
Memento support in Wikipedia and would very much appreciate further
communication regarding the matter.

Many thanks!

Rob


=> Problem 1:

Asking users to install a Firefox extension to make navigation easier
is not how I imagine a secure and user-friendly web would work.
Perhaps if this were supported by unmodified browsers, it would be
more attractive for us. The browsers have a long history of
introducing features in advance of their use on the web, so I don't
think it's a "chicken-and-egg" problem.

Response:

It's difficult to argue with this point. We would obviously much
prefer native adoption by browsers over a plug-in solution. But,
without a plug-in, there would be no way to demonstrate the cross-site
time travel capability introduced by Memento. Also, it is hard to see
what incentives browser manufacturers have to natively implement
Memento's datetime negotiation as long as there is no critical mass of
servers supporting it. Failed attempts to get the attention of Mozilla
and Opera support this consideration, but if you have experience otherwise,
then any assistance you might give would be greatly appreciated. At this
point, Memento enjoys growing adoption by web archives (Internet
Archive, British Library Web Archive, UK National Archives) and it has
the unanimous support of the International Internet Preservation
Consortium. Adoption by Wikipedia could help build the essential
critical mass that, we think, could give us the momentum to credibly
approach browser manufacturers. Given Wikipedia's track record as an
early adopter of innovative
technologies (as emphasized by editors in the RFC discussion re
Memento support), we were hopeful to have your support in working
towards establishing that critical mass.

======

=> Problem 2:

TimeGate responses, as specified by the Internet-Draft, appear to be
effectively uncacheable with currently used HTTP proxy software. We
have no way to remove resources from a cache with a finer granularity
than a URI. So when
the page is changed, we would have the choice of either:

* Purging the TimeGate URI when the page is changed, in which case all
versions of that resource would be simultaneously purged, reducing the
hit rate for
rarely-accessed old revisions, or
* Not purging it, in which case responses for recent Accept-Datetime
values would become stale. Also, there would be no way to purge
revisions which are removed from the database by RevisionDelete.

Response:

We very much share the concern of cacheability, as exemplified by the
Memento protocol responses for Original Resources and Mementos.
However, when it comes to TimeGates, the situation regarding caching
deserves some further consideration:

* RFC 2616 states, as quoted below, that 302 responses are by default not cached:
"A response received with any other status code (e.g. status codes 302
and 307) MUST NOT be returned in a reply to a subsequent request
unless there are cache-control directives or another header(s) that
explicitly allow it."
* Caching 302 responses from a TimeGate will yield marginal benefit, if any:
- Datetime negotiation values exist on a continuum, unlike e.g. media
type negotiation, for which values reside in a discrete set. In the
latter case, chances that a cache has an entry for a specific value
out of the (small) discrete set are significant. In the TimeGate case,
chances are dramatically lower, if not insignificant, given the
size of the value space. For example, when only taking into account
day granularity, the value space for Wikipedia has a cardinality of over
3650 (365 days * 10+ years). Adding hours, minutes, and seconds to the
value space brings this cardinality to over 365*10*24*60*60. Chances
for a cache hit become very small.
- The overhead on the server resulting from not caching TimeGate
responses remains small, as responses contain only headers without a
representation in the body. Please see for example
http://www.mementoweb.org/guide/rfc/ID/#a200-step4-http

======

=> Problem 3:

Additionally, the definition of the Vary header in the Internet-Draft
appears to conflict with the definition in HTTP (RFC 2616), as
implemented by MediaWiki, PHP, Squid, etc. It's unclear what the
"negotiate" value is for or how it will interact with the Vary header
values that MediaWiki must send to HTTP proxy servers.

Response:

We see no conflict with the Vary definition of RFC 2616 as it states
the following about the field names used in Vary:
"The field-names given are not limited to the set of standard
request-header fields defined by this specification."
Furthermore, the "negotiate" value for Vary has become widely used
since its introduction in RFC 2295  that details Transparent Content
Negotiation. The "negotiate" value is used by default for negotiated
responses by Apache servers.

However, we agree that the "negotiate" value serves no real purpose
without the corresponding Negotiate request header and can be regarded as a
remnant of the early days of Memento during which RFC 2295 was a
significant inspiration. We are most willing to remove this value from
the Vary header in the Memento protocol and hence also from the
MediaWiki plugin.

=====

=> Problem 4:

The Internet-Draft seems to unnecessarily overspecify server and
client behaviour. For example, depending on the server software, it
may be difficult to implement the requirement that TimeGates respond
to request methods other than GET and POST with an HTTP 405 code.

Response:

The concern regarding HTTP 405 is fair and we would be most willing to
remove this requirement from the specification. Other feedback
regarding instances of overspecification would be very welcome as we
could take them into account when wrapping up the Internet Draft. From
our perspective, we have tried to clearly detail a variety of existing
and anticipated situations in a consistent manner, trying to draft a
specification that really helps implementers. But, in our enthusiasm,
we may have gone overboard, indeed.

=====

=> Problem 5:

(In reply to comment #29)
> That said it would be kind of a cool thing, provided the effort was minimal.

I don't think the effort would be minimal. The code quality is poor,
and would suffer from a high rate of bit rot due to poor integration
with the MediaWiki core. For example, mmAcceptDateTime() assumes
$_GET['oldid'] will have a certain interpretation by the MediaWiki
core, and sends header values corresponding to this interpretation,
regardless of what MediaWiki decides to actually do with that
parameter. The assumption is already incorrect and will become more
incorrect over time.

Response:

This comment regarding poor software quality comes as a big surprise
as we have invested very significant resources to improve the initial
code base, through many iterations, in response to feedback from
MediaWiki people. This is the first time we hear about the false
assumption re mmAcceptDateTime(). Our developer Harihar Shankar states
the following with this regard:
"I am determining if the current resource is a version of an article
by looking at the URL and check if there is "oldid" in it. This is
definitely not the best way to do it, but I looked extensively in
their documentation and I could not find a better alternative. This
issue has not been brought up by the code reviewers so far."
We would be very interested in learning what the appropriate approach
is. And we are interested in hearing about other problems with the
code. In both cases, we will be most happy to make required changes to
bring the code to the desired quality level.
Comment 35 Rob Sanderson 2013-02-26 01:55:01 UTC
Dear all,

We've tried to take the feedback from the bug into account, and have released a new version of the Internet Draft that is easier to implement and includes more implementation patterns for content management systems like wikis.  It is also much shorter, defining only the necessary aspects rather than everything that might be nice to have.

The new draft is:  http://tools.ietf.org/html/draft-vandesompel-memento-06

I hope this further reduces the concerns for the extension.
Thanks in advance for any further comments.
Comment 36 Greg Grossmeier 2013-03-19 16:09:19 UTC
Hello Rob,

I'm Greg Grossmeier, Release Manager for the Wikimedia Foundation (basically, manager for deployments of MediaWiki and extensions to our servers that host all WMF projects).

I just wanted to take a moment and say thank you for your effort on this extension thus far. You and your team have put a lot of good faith effort into it and I/we appreciate that.

Unfortunately, at this time, we're in the same boat as Mozilla and Opera: we need to see a tangible use case supported by a large (absolute, not percentage, necessarily) number of users. I, at least, generally agree with what you are attempting to do with Memento (I have a Library Science degree and worked on metadata stuff with the W3C and Schema.org while at Creative Commons), but the time needed to do this right at the WMF is too high for us right now with the current expected payoff; we're time and budget constrained just as much as any other non-profit and there are currently higher-priority items in our queue that directly benefit the Wikimedia community.

No reason this couldn't change in the future, but it would need to be something along the lines of at least one major browser supporting Memento.

Thanks for your understanding,

Greg
Comment 37 Tim Starling 2013-03-27 03:58:09 UTC
(In reply to comment #34)
> Also, it is hard to see
> what incentives browser manufacturers have to natively implement
> Memento's datetime negotiation as long as there is no critical mass of
> servers supporting it. Failed attempts to get the attention of Mozilla
> and Opera support this consideration, but if you have experience otherwise,
> then any assistance you might give would be greatly appreciated.

As I said, the browser manufacturers have a long history of implementing features in advance of their use on the web. Consider, for example, the lead taken by Firefox and Opera in introducing various HTML 5 features.

If you want to get Mozilla's attention, you could start by filing a bug: https://bugzilla.mozilla.org/enter_bug.cgi

> * Caching 302 responses from a TimeGate will yield marginal benefit, if any:

Indeed. The high cardinality of TimeGate requests is a problem for efficient implementation. It is possible to imagine a protocol for retrieval of historical revisions which would not have this problem.
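
The cardinality issue can be illustrated with a minimal sketch of TimeGate selection. Everything here is an assumption made for illustration: the function name select_memento, the revision list, and the choice to fall back to the earliest revision when the requested datetime predates all of them. It is not code from the extension.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def select_memento(revisions, accept_datetime):
    """Pick the revision that was current at the requested datetime.

    `revisions` is a list of (timezone-aware datetime, revision id)
    pairs sorted by time. `accept_datetime` is an HTTP-date string as
    it would appear in an Accept-Datetime request header.
    """
    requested = parsedate_to_datetime(accept_datetime)
    times = [t for t, _ in revisions]
    # Index of the first revision strictly after the requested time.
    i = bisect_right(times, requested)
    # Assumed fallback: the earliest revision if nothing is old enough.
    return revisions[max(i - 1, 0)][1]


if __name__ == "__main__":
    utc = timezone.utc
    revs = [
        (datetime(2012, 1, 1, tzinfo=utc), 100),
        (datetime(2012, 6, 1, tzinfo=utc), 200),
        (datetime(2013, 1, 1, tzinfo=utc), 300),
    ]
    # Two different Accept-Datetime values resolve to the same revision.
    print(select_memento(revs, "Thu, 01 Mar 2012 00:00:00 GMT"))
    print(select_memento(revs, "Sun, 15 Apr 2012 12:00:00 GMT"))
```

Every distinct Accept-Datetime value is a distinct cache key for the TimeGate response, even though many of them resolve to the same revision, so a cache keyed on the request header gets few hits.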

> This comment regarding poor software quality comes as a big surprise
> as we have invested very significant resources to improve the initial
> code base, through many iterations, in response to feedback from
> MediaWiki people. 

The comments above show that the code quality started out being terrible. It has improved greatly. Now, it is only poor. It still has some way to go before it is acceptable for WMF deployment (even if it were something we wanted).

> This is the first time we hear about the false
> assumption re mmAcceptDateTime(). Our developer Harihar Shankar states
> the following with this regard:
> "I am determining if the current resource is a version of an article
> by looking at the URL and check if there is "oldid" in it. This is
> definitely not the best way to do it, but I looked extensively in
> their documentation and I could not find a better alternative. This
> issue has not been brought up by the code reviewers so far."
> We would be very interested in learning what the appropriate approach
> is. And we are interested in hearing about other problems with the
> code. In both cases, we will be most happy to make required changes to
> bring the code to the desired quality level.

If the necessary interfaces really are missing, then the developer's response should be to introduce them. But I think using an ArticleViewHeader hook and calling getOldID() on the Article object passed to the hook would be a reasonable way to do it. Then the hook will only be triggered on actual views of ordinary wiki pages, and the oldid will be the same one used by Article.php, which would be an improvement.

$wgRequest should not be used at all, nor "new RequestContext". You can get what information you need from the Article methods. Instead of $wgOut, you can get an OutputPage object from $article->getContext()->getOutput(), and instead of $wgRequest, you can use $article->getContext()->getRequest().

Nothing should ever call exit(), including Special:TimeGate and Special:TimeMap. You can use OutputPage::disable() to customise the output.
Comment 38 SJ 2013-08-15 03:18:10 UTC
This seems cool. Any further updates on implementation by moz or other browsers?  Who are the major implementers of memento today?
Comment 39 Andre Klapper 2013-08-15 09:23:39 UTC
(In reply to comment #38)
> Any further updates on implementation by moz or other browsers?

I guess it's best if you asked Moz for that. :)
Comment 40 hvdsomp 2013-08-15 21:09:33 UTC
(In reply to comment #38)
> This seems cool. Any further updates on implementation by moz or other
> browsers?  Who are the major implementers of memento today?

Thanks for asking. This allows me to provide a general update regarding Memento activity:

* The Memento Internet Draft [https://datatracker.ietf.org/doc/draft-vandesompel-memento/] is currently in IETF ISE Review, on its way to becoming an RFC. From our perspective, after 4 years of spec-ing, testing, and soliciting feedback, the spec is now in its final shape.

* We received funding from the Andrew W. Mellon Foundation to develop a more solid Memento MediaWiki add-on, taking into account the feedback received during the discussion of this bug report. This work is currently ongoing. As soon as a version is available we will share it, here and on the MediaWiki Developers list, to solicit further feedback. We remain hopeful that Wikipedia and MediaWiki installations will consider implementing it.

* The recent release of the Wayback software [https://github.com/internetarchive/wayback/tree/master/contrib/ia/global-wayback], used by the Internet Archive and many other web archives worldwide, is fully compliant with the most recent version of the Memento protocol specification. Many thanks to our friends at the Internet Archive for making this happen! 

* This recent release of the Wayback software is already operational at the Internet Archive. Some web archives (e.g. British Library, UK National Archives) run Wayback versions that are compliant with previous versions of the Memento protocol. It is expected that these web archives as well as other web archives that run a pre-Memento Wayback version will migrate to the new version in the months to come. 

* archive.is [http://archive.is], a "personal" web archive that does not use the Wayback software, recently implemented the Memento protocol. The turnaround time between us suggesting they support Memento and them finalizing the implementation was about 3 days. [http://ws-dl.blogspot.com/2013/07/2013-07-09-archiveis-supports-memento.html]

* We have not yet further pursued native browser support for Memento, mainly because (contrary to what Tim suggests) we feel that chances to achieve it are rather low as long as there is no broader server-side Memento support. Memento has very significant support in the web archiving community (see above, e.g. Internet Archive). But we feel support outside of the web archiving community, e.g. by CMS with solid versioning approaches, is essential too. This is why we are keen on Wikipedia/MediaWiki support. Anyhow, we are currently working on two separate Chrome plug-in implementations. A major goal of that work is to determine how to minimize the footprint required for Memento support in the browser as a means of maximizing chances of possible native adoption.

* We received funding from the Andrew W. Mellon Foundation for the Hiberlink project [http://www.lanl.gov/newsroom/news-releases/2013/July/07.16-enabling-time-travel.php]. The problem domain explored in Hiberlink is very similar to the one explored in the Wikipedia Link Backups work [http://www.gossamer-threads.com/lists/wiki/mediawiki/369855]. In Hiberlink, it is about pro-actively archiving resources that are referenced from scholarly papers; in the Wikipedia Link Backups work it's about archiving pages referenced from Wikipedia articles. In both cases, Memento can play a role to navigate from the referenced URI to the temporally appropriate archived version.

* More information is available via the Memento site [http://mementoweb.org]

Greetings,

Herbert Van de Sompel on behalf of the Memento team
Comment 41 Gerrit Notification Bot 2013-09-29 13:42:04 UTC
Change 29812 abandoned by Hashar:
(bug 34778) Extension Memento: Initial Submit

Reason:
Seems the extension is stalled https://www.mediawiki.org/wiki/Extension:Memento and there is not any will to have it deployed. Thus abandoning change.

Feel free to resubmit a new change with the current code if there is any.

https://gerrit.wikimedia.org/r/29812
Comment 42 Gerrit Notification Bot 2013-09-29 13:42:08 UTC
Change 32237 abandoned by Hashar:
(bug 34778) Extension Memento: Improvements after previous review.

Reason:
Seems the extension is stalled https://www.mediawiki.org/wiki/Extension:Memento and there is not any will to have it deployed. Thus abandoning change.

Feel free to resubmit a new change with the current code if there is any.

https://gerrit.wikimedia.org/r/32237
Comment 43 Gerrit Notification Bot 2013-09-29 13:42:12 UTC
Change 32238 abandoned by Hashar:
(bug 34778) Extension Memento: Improvements after review.

Reason:
Seems the extension is stalled https://www.mediawiki.org/wiki/Extension:Memento and there is not any will to have it deployed. Thus abandoning change.

Feel free to resubmit a new change with the current code if there is any.

https://gerrit.wikimedia.org/r/32238
Comment 44 Gerrit Notification Bot 2013-09-29 13:42:17 UTC
Change 32239 abandoned by Hashar:
(bug 34778) Extension Memento: Improvements after review

Reason:
Seems the extension is stalled https://www.mediawiki.org/wiki/Extension:Memento and there is not any will to have it deployed. Thus abandoning change.

Feel free to resubmit a new change with the current code if there is any.

https://gerrit.wikimedia.org/r/32239
