Last modified: 2010-11-18 01:28:03 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T27289, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 25289 - Make review load faster by speeding up display of old revisions
Make review load faster by speeding up display of old revisions
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
FlaggedRevs (Other open bugs)
unspecified
All All
: Normal enhancement (vote)
: ---
Assigned To: Priyanka Dhanda
:
Depends on: 24124
Blocks: 25293
  Show dependency treegraph
 
Reported: 2010-09-24 21:16 UTC by Rob Lanphier
Modified: 2010-11-18 01:28 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Changes to DifferenceInterface::renderNewRevision() (1.20 KB, patch)
2010-10-05 18:10 UTC, Priyanka Dhanda
Details
Proposal 2 (1.14 KB, patch)
2010-10-05 20:50 UTC, Platonides
Details

Description Rob Lanphier 2010-09-24 21:16:23 UTC
Currently, the review process on complicated pages gets bogged down by the fact that we need to display the (uncached) old revision underneath the diff.  There are two ways we can think of to address this problem:
1.  Work out a caching mechanism for old revisions
2.  Decouple rendering of the review buttons + diff from the rendering of the old revision (e.g. by rendering the old revision in an iframe)
Comment 1 Platonides 2010-09-24 21:58:22 UTC
Is it just one revision? Ie. the previous-to-last revision.
In such case we could also store the parsed page for it. Not that we couldn't do that for all and every revision, but I'm afraid that doing so would be too much for the available memcache storage (for the benefit given).

For option 2, we could default to showing just the diff, and provide a 'show content' link for AJAX fetching it.
Comment 2 Rob Lanphier 2010-09-25 04:51:33 UTC
For option 1, we'd want to store probably more recent history than just the very latest.  We wouldn't need to store everything, but the tough part is that I don't think we've got a natural way of invalidating the old content, short of implementing a garbage collector.  

I suppose one alternative to garbage collection would be pick an arbitrary number of cache slots, and then modulo the revision number.  For example, if we decided we were only going to store a maximum of the last 5 million revisions [a little less than a month on enwiki at current rates], and oldid was 4006000123, we would modulo oldid by 5 million and use slot 1000123 as the place to store the cached version.  Then when we went to cache 4011000123, we would put it in 1000123, overwriting 4006000123.  I'm guessing even a few hours worth of history caching would buy us a lot of parser performance benefit.

For option 2, we would probably use Javascript to fetch the old revision, but we wouldn't make the user click the link.  We'd just automatically request it.  The benefit would be that the diff and the buttons would render right away, and then the rest would load in its own sweet time, but the reader would probably be distracted reading the diff long enough to load the old revision most of the time anyway.
Comment 3 Platonides 2010-09-25 17:49:05 UTC
It would have an expire time, it's not for garbage collecting need but to avoid having copies of all parsed revisions, which would force memcached to expire prematurely other items.
Maybe it would be practical to store every old parsed revision for a few hours, though.

Can you explain the steps taken? I'm not familiar with flaggedrevs reviewing procedure.

Is it some special (or much commonly used) option, from "last-sighted revision" or just the normal procedure of using the history page?
Comment 4 Rob Lanphier 2010-09-27 05:50:17 UTC
Re: my modulo idea; that was borne of my ignorance+forgetfulness of how our caching works.  I'm not sure exactly how things are configured, but since tghe parser cache uses memcached (right?), and memcached already uses LRU and maybe even expiration times to dump things out of the cache, there's probably not the need for additional complexity in the app.  

Re: steps taken.  The FlaggedRevs extension adds extra bits to the history, and some extra UI to diff pages.  One typical review flow involves:
1.  Go to the list of pages with pending changes (Special:OldReviewedPages)
2.  Click "review" on one of the pages.  This action actually has the same effect as requesting a diff between the latest accepted revision, and the latest revision
3.  Either "accept" the revision, or (in the next version), "reject" it

Any diff page is actually a potential review page under Pending Changes.

Full details here:
http://en.wikipedia.org/wiki/Help:Pending_changes
Comment 5 Roan Kattouw 2010-09-27 09:12:08 UTC
(In reply to comment #4)
> Re: my modulo idea; that was borne of my ignorance+forgetfulness of how our
> caching works.  I'm not sure exactly how things are configured, but since tghe
> parser cache uses memcached (right?), and memcached already uses LRU and maybe
> even expiration times to dump things out of the cache, there's probably not the
> need for additional complexity in the app.  
> 
There's actually two different layers of caching you're dealing with here: revision cache and parser cache. Both live in memcached.

The revision cache caches the wikitext of every revision that's ever obtained from the external storage and caches it for 7 days. There is no discrimination between old and new revisions here, other than that LRU will most likely advantage newer revisions.

The parser cache caches the parser output (rendered HTML plus metadata like links, used templates and used images) for a variety of parser options (e.g. whether to show [edit] links, whether to use weird redlinks with a ? at the end, etc.) but only for the latest version of each page. The expiry time will be an hour or so if time-dependent magic words like {{CURRENTDAY}} are involved, but is generally set to never. When a page is visited for the first time after having been edited, MW will notice the parser cache entry is stale (because it's older than the newest revision) and reparse the page. As a consequence, ParserCache::get() can only fetch the latest revision of a page.

I believe you're right that memcached uses LRU to evict things from cache when it runs out of space. Whether storing old revisions in the parser cache (possibly forever) is desirable and whether additional complexity in MW is needed to do this sanely is a question I think is best answered by Tim (CC).
Comment 6 Bawolff (Brian Wolff) 2010-09-27 11:42:17 UTC
(In reply to comment #4)
> 
> Re: steps taken.  The FlaggedRevs extension adds extra bits to the history, and
> some extra UI to diff pages.  One typical review flow involves:
> 1.  Go to the list of pages with pending changes (Special:OldReviewedPages)
> 2.  Click "review" on one of the pages.  This action actually has the same
> effect as requesting a diff between the latest accepted revision, and the
> latest revision
> 3.  Either "accept" the revision, or (in the next version), "reject" it
> 
> Any diff page is actually a potential review page under Pending Changes.
> 
> Full details here:
> http://en.wikipedia.org/wiki/Help:Pending_changes

Stupid question, when I am reviewing stuff on enwikinews (which i recognize has a slightly different set up then 'pedia) and click the [review] link from special:oldreviewedpages I get a diff between the current version and the stable version, with the most recent version of the page (which is in parser cache from what i understand) displayed underneath. At what point does an old revision get displayed to the user?
Comment 7 Platonides 2010-09-27 15:34:01 UTC
(In reply to comment #4)
> This action actually has the same effect as requesting a diff between the
> latest accepted revision, and the latest revision

FlaggedRevs::getPageCache() already keeps a cache of the last stable revision, although it's not used when presenting a diff with that *target* revision.

Note that Bawolff is right. When the users click the "Review" they will arrive at a diff which will display below the *current* version (even if they normally wouldn't per their preferences), for which the parser cache will be used if possible.
You can easily confirm in the source "Saved in parser cache with key enwiki:pcache:idhash:62676-0!1!0!default!!en!4"


(In reply to comment #5)
> The parser cache caches the parser output (rendered HTML plus metadata like
> links, used templates and used images) for a variety of parser options (e.g.
> whether to show [edit] links, whether to use weird redlinks with a ? at the
> end, etc.) but only for the latest version of each page. The expiry time will
> be an hour or so if time-dependent magic words like {{CURRENTDAY}} are
> involved, but is generally set to never. When a page is visited for the first
> time after having been edited, MW will notice the parser cache entry is stale
> (because it's older than the newest revision) and reparse the page. As a
> consequence, ParserCache::get() can only fetch the latest revision of a page.

A bit of nitpicking: Parser output is not "stored forever" but for $wgParserCacheExpireTime, whose default is one day but in WMF is set to two weeks in InitialiseSettings.php. Duesentrieb also added more fine grained expirings but I think the magic words don't benefit from it yet. And the ? or red link is luckily CSS magic, not a parser cache option.
Comment 8 Rob Lanphier 2010-09-27 16:46:58 UTC
(In reply to comment #6)
> Stupid question, when I am reviewing stuff on enwikinews (which i recognize has
> a slightly different set up then 'pedia) and click the [review] link from
> special:oldreviewedpages I get a diff between the current version and the
> stable version, with the most recent version of the page (which is in parser
> cache from what i understand) displayed underneath. At what point does an old
> revision get displayed to the user?

Not a stupid question at all.  We thought we pretty much fixed this problem when we fixed bug 24124, but it turns out this feature causes people to cruise the history a lot.  For example, if there are two or more revisions after the latest accepted revision, reviewers are expected to inspect each revision rather than accepting/rejecting the diff in totality.
Comment 9 Rob Lanphier 2010-09-28 01:21:41 UTC
Priyanka is going to look into option 2 (decoupled rendering).  We haven't ruled out taking a run at option 1 (old revision caching), but option 2 is going to be a big win even when there's a cache miss.
Comment 10 Platonides 2010-09-28 15:04:25 UTC
Note that just changing the diffonly=0 from the url of the Review link to diffonly=1 will give you a content-less diff.
Adding a "show content" button to diffonly revisions should be easy, it's just fetching it from the api.
Comment 11 Priyanka Dhanda 2010-10-05 16:05:29 UTC
I was thinking of modifying the difference interface to only return the diff without content if it does not find a parser cache. We could do this through a hook for now, only if Flagged Revs id enabled.

So in DifferenceInterface::renderNewRevision() if the parser cache is not set then run a hook to check whether or not to get the content . It could also return some alternative content for flagged revs.
Then the content can be loaded through some javascript and an api call.

Anyone has thoughts on this approach?
Comment 12 Platonides 2010-10-05 16:16:38 UTC
I am not sure you said what you intended to say. If they don't have any parser caching (default is to use db), everything will be "slow".
I would like to see DifferenceInterface::renderNewRevision() call the article, not embed more logic half copied from Article::view().
Also, I am afraid that the diffs sometimes showing the article and sometimes not could lead to confusion.
Comment 13 Rob Lanphier 2010-10-05 16:38:12 UTC
Hi Priyanka, I think checking for a cache hit seems like a fine idea, assuming it doesn't make the logic too convoluted.

Hi Platonides, here's what's going on.  Thanks to the fix in bug 24124, there will be times the parsed content is pulled from the parser cache rather than rendered directly.  In those cases, nothing would change.  However, if we have a cache miss, then the goal would be to render a placeholder (e.g. an empty div or even <div id="rendered_version">Loading...</div>), and then use Javascript to replace the div with rendered version.  The visual effect of this would be that the revision would always be rendered, but the rendering of the revision (which is often completely below the fold) would no longer block the much faster rendering of the diff (always above the fold).

Based on the last conversation I had with Priyanka, I think we all agree that removing the cut-n-pasted code from renderNewRevision would be a very good thing, but outside the scope of addressing this issue.
Comment 14 Priyanka Dhanda 2010-10-05 17:38:06 UTC
Platonides, I meant a parser cache hit for that revision assuming parser caching is enabled.
For simplicity sake though, it probably makes sense to consistently not return content and always get it through the api call.
Comment 15 Priyanka Dhanda 2010-10-05 18:10:21 UTC
Created attachment 7715 [details]
Changes to DifferenceInterface::renderNewRevision()

What I have in mind would look something like this. It would need introducing a new hook.
Comment 16 Platonides 2010-10-05 20:50:28 UTC
Created attachment 7718 [details]
Proposal 2

That hooks could only be used for disabling the content.

I would use something like this.

Note that the hook can block further output by just returning false, but can also take the DifferenceEngine object to provide a different output.
Comment 17 Priyanka Dhanda 2010-10-05 21:49:45 UTC
Thanks Platonides! I was going to check whether that is conventionally a better way of doing it. You answered my question :)
Comment 18 Rob Lanphier 2010-10-06 20:22:28 UTC
I just posted to the mediawiki-api list about this:
http://lists.wikimedia.org/pipermail/mediawiki-api/2010-October/001963.html

Skipping the intro:

> In talking to Sam this morning about this, a couple things became clear:
> 1.  This requires action=parse, which is already on the list of APIs
> that generate healthy load
> 2.  While this move should be net-neutral in CPU load, it does shift
> the load from general purpose Apaches to the API servers, the latter
> of which are more heavily loaded

> Additionally, it's theoretically possible that this will actually
> generate more load, since it'll be easier for people to skim the
> history, unintentionally generating lots of (never completed) API
> calls.

> How big of a problem is this?
Comment 19 Rob Lanphier 2010-10-26 00:51:07 UTC
Priyanka checked in code to address this in r75331 and r75332.  I believe there might be a loose end or two to tie up (e.g. the FIXME on r75332), but this is largely done.
Comment 20 Aaron Schulz 2010-11-18 01:28:03 UTC
Closing this. Specific bugs with that code will (and do) have their own reports.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links