Last modified: 2014-08-29 08:26:42 UTC
In theory, we perform a lazy 1-second granularity timestamp check to detect conflicts, then a second test is performed atomically during updateRevisionOn. However, the atomic test is broken because getLatest is returning the *new* latest id rather than the oldid stored at the time the user opened the article edit page. This can be easily confirmed by disabling the timestamp test at EditPage.php line 1613. My initial guess is that we were relying on the hidden "oldid" form parameter, but this always has a value of "0".
Change 94584 had a related patch set uploaded by Adamw: (bug 56849) Pass true oldid for atomic page save https://gerrit.wikimedia.org/r/94584
The solution should go beyond the above patch. Ideally, we remove the cheesy timestamp check and rely entirely on an atomic update attempt. If that fails due to a conflict, perhaps WikiPage::doEditContent falls back to 3-way merge (moving that code down from EditPage), and then we attempt a second, atomic update.
Ideally we'd also avoid the SELECT FOR UPDATE (followed by the parsing) and just do CAS style logic at the last second. The advantage would be that if a page takes 10 seconds to render and two people save at the same time, it would take about 10 seconds for each rather than 20 right now (people have complained about it taking forever to even get edit conflict pages).
Since this has been linked to bug 57022, I tried applying the patch to my 1.21.2 wiki. Did not fix the edit conflict problem on this wiki.
A minor detail to add: we should always include the oldid in GET query parameters to action=edit.
Patch didn't pass jenkins-bot. Anybody planning to pick up the work on this again?
Plus wondering if fixing this would influence bug 53646, bug 53446, bug 42163.
@aklapper: Thanks for linking the related bugs! I'm expanding my earlier patch to clean up oldid handling and implement compare-and-swap as @aschulz suggests. I expect that will resolve bug 42163, but for the other two I can't say. It would be very useful to have unit tests to detect improvement or regression of all four bugs.
I came across something disturbing while exploring the oldid fix: apparently, there is a 'redirect' parameter to the edit action that allows a page save to change the contents of the redirected page, rather than the contents of the redirect article passed as the 'title' parameter to edit. Maybe I'm being willfully blind, but I cannot imagine a case where this is a good idea. This seems like a nasty hurdle to reimplementing oldid, because a process making a 'redirect' edit will not necessarily have any information about the revision level of the redirect target. These cases would have to fall back to the badly flawed timestamp check.
Aha: so the 'redirect' thing was introduced to satisfy bug 24330, but along the way the "appendtext" or "prependtext" requirement was lost in implementation, commit 9f025aeb. IMO this is a misfeature, I'll write a separate patch to... break API backwards-compatibility... as part of this work.
@ASchulz: It looks like the CAS logic you suggested has already been implemented, in WikiPage::updateRevisionOn(). I'm trying to provide the $lastRevision param whenever possible, which should improve many cases. There is another function updateIfNewerOn which does require two queries, but it's only used in SpecialUndelete and Import, and IMO should be retired. Was there another query you know of which would explain the long load during a conflict?
I'm still working on the main patch, but these predecessors are ready for review: https://gerrit.wikimedia.org/r/#/c/133955 (also tracking under bug 24330) https://gerrit.wikimedia.org/r/#/c/134045 https://gerrit.wikimedia.org/r/#/c/134046 https://gerrit.wikimedia.org/r/#/c/134047 https://gerrit.wikimedia.org/r/#/c/134049
Change 134049 had a related patch set uploaded by Adamw: (bug 56849) Deprecate dangerous edittime-based content update functions https://gerrit.wikimedia.org/r/134049
Change 134049 merged by jenkins-bot: (bug 56849) Deprecate dangerous edittime-based content update functions https://gerrit.wikimedia.org/r/134049
(In reply to Adam Wight from comment #13) > I'm still working on the main patch, but these predecessors are ready for > review: > > https://gerrit.wikimedia.org/r/#/c/133955 (also tracking under bug 24330) > https://gerrit.wikimedia.org/r/#/c/134045 > https://gerrit.wikimedia.org/r/#/c/134046 > https://gerrit.wikimedia.org/r/#/c/134047 > https://gerrit.wikimedia.org/r/#/c/134049 For the records, all patches merged or abandoned. Not sure about the patch in comment 1 which didn't pass Jenkins.
(In reply to Andre Klapper from comment #16) > For the records, all patches merged or abandoned. > Not sure about the patch in comment 1 which didn't pass Jenkins. Adam: What's the status here? Is more work needed, or can this be considered fixed?
This patch is mean to fix the issue: https://gerrit.wikimedia.org/r/#/c/94584/ Everything else has been a preliminary... Sorry!