Last modified: 2010-05-15 14:36:21 UTC
BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/?func=detail&aid=949323&group_id=34373&atid=411192

Originally submitted by IMSoP 2004-05-06 17:37

There have been numerous instances that I know of on the English Wikipedia of the entire contents of a page becoming duplicated - i.e. an additional copy of all text being appended to itself. This is especially problematic if it happens on large and busy utility or discussion pages, since it is often not spotted immediately and therefore leads to each discussion on the page being forked without anyone realising, and having to be carefully merged later.

This appears to be caused by users attempting to submit more than one edit in competition with themselves, and specifically submitting the same change twice. Since large pages are likely to load rather slowly after editing, people *will* think their changes haven't gone through, and so click the submit button again - forum software often includes specific filters to overcome such multiple submissions. It possibly also interacts with section editing, since this presumably entails "construction" of the new page content from the form data and the existing version.

When this was mentioned on the mailing list, Brion stated: "There is explicitly no edit conflict resolution between submissions by the same user." Clearly, the edit created in such circumstances is inappropriate, somehow concatenating two (probably identical) versions, rather than over-writing one with the other.

The software needs to do at least one of:
* Treat multiple submissions from the same user as a normal edit conflict (creating potential confusion if they just hit the same button twice)
* detect multiple submissions which contain identical data, and silently accept one or the other
* detect submissions in very quick succession, and flatten them into one edit (if they are by the same user)
* at the very least, ignore such situations, as now, but in a sane way - i.e. use the content from one and only one edit submission, even if the edit was to a particular section

Related mailing list posts:
http://mail.wikipedia.org/pipermail/wikitech-l/2004-April/009752.html
http://mail.wikipedia.org/pipermail/wikitech-l/2004-April/009750.html

-- IMSoP [http://en.wikipedia.org/wiki/User:IMSoP]

---- Additional comments (in reverse order) ----

Date: 2004-06-28 22:06
Sender: robert_dodier
Logged In: YES
user_id=501686

Hello, another bit that might help track down this bug -- [[vfd]] got duplicated sometime today (June 28 2004). Maybe by checking the editing history, it could be determined which edit yielded the duplication. Hope this helps.

-------------------------------------------------

Date: 2004-06-22 00:18
Sender: wfmcwalter
Logged In: YES
user_id=1036616

It seems only to be with section editing. It's not confined to edits of new sections (although it may be _caused_ by the unrelated addition of another section). The duplicated section certainly isn't always the new one.

It seems to happen more often when the system is slow, leading me to believe it relates to a user resubmitting an edit believing it to be "stuck". But that alone isn't sufficient to cause it.

It happens occasionally (but frequently enough to be a problem) on heavily edited pages. Before it returned to a transclusion-based scheme, [[en:Wikipedia:Votes for deletion]] would exhibit this behaviour several times per day.

It's going to be nearly impossible to obtain a reasonable idea what users did to precipitate matters, I'm afraid, because:
1. it seems to require interaction of two (or more) simultaneous editors
2. at the time it happens, neither is aware (both writes seem to succeed without an error)
3. by the time the error is discovered (often hours later) it's unlikely either submitter will be able to recall sufficient detail
4. clearly the window in which this occurs is tiny, making the chances of a manual attempt at reproducing it diminishingly small

I figure the only way to be able to reliably reproduce it will be to set two (or more) bots on the same page, making section edits.

-------------------------------------------------

Date: 2004-06-21 23:53
Sender: vibber
Logged In: YES
user_id=446709

Is this only in section editing?
Is this only when adding new sections? [This has long been known to duplicate the added section on double submission, since it simply adds a new section to the end of whatever is there.]
Is this when editing existing sections? [This is virtually guaranteed trouble as sections are numbered in a fashion that is liable to change.]
Is this when editing whole pages?
Can you reliably reproduce the problem?

When you see it happen, please record *everything* you can. When the edit occurred, whether it was by section or whole page, whether any edit conflicts were involved, how many times submitted, etc.

-------------------------------------------------

Date: 2004-06-16 17:06
Sender: imsop
Logged In: YES
user_id=1053535

The introduction of edit conflict merging in 1.3 seems to have made this problem worse:

Firstly, pages like [[en:VfD]] have gone back to being one large page, and in general slowness has been rearing its ugly head a lot. Since if something's going that slowly, people will be more likely to click save twice, this is triggering more instances of the bug.

Secondly, some people are reporting problems with section editing, where sections seem to overwrite each other - see http://meta.wikipedia.org/wiki/MediaWiki_1.3_comments_and_bug_reports#edit_conflict_management_problem - which may or may not be due to the new code. Since, from Brion's comment, the behaviour in these conditions appears to be essentially "undefined", it seems to me that the new code could be interfering somehow and making the results even more confusing.

Either way, this seems to be causing major problems, and needs to be fixed ASAP.

-- IMSoP [http://en.wikipedia.org/wiki/User:IMSoP]

-------------------------------------------------

Date: 2004-05-06 18:28
Sender: wfmcwalter
Logged In: YES
user_id=1036616

Here's one instance on en.wikipedia.org's [[Reference desk]] earlier today:

Change log: Of these two transactions, the latter seemed to cause the duplication:
m 15:20, 6 May 2004 .. Bodnotbod (=Wikipedia Talk and Google= how is suppression of Google indexing of VfD done?)
m 15:26, 6 May 2004 .. Bodnotbod (=Wikipedia Talk and Google= how is suppression of Google indexing of VfD done?)

URL for the problematic change:
http://en.wikipedia.org/w/wiki.phtml?title=Wikipedia:Reference_desk&diff=3472769&oldid=3472734
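Editorial sketch for the second and third options in the original report (drop identical or rapid-fire resubmissions by the same user). This is a toy Python model, not MediaWiki code: the class name `DuplicateSubmitFilter`, the 30-second window, and the choice to key on (user, page) are all assumptions made for illustration.

```python
import hashlib
import time


class DuplicateSubmitFilter:
    """Hypothetical filter: rejects a save that repeats the same user's last save."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed value, not from the report
        self.clock = clock
        self._last = {}  # (user, page) -> (digest, time of last accepted save)

    def should_accept(self, user, page, section, text):
        # Hash the section identifier together with the text, so the same text
        # sent to a different section is not treated as a duplicate.
        digest = hashlib.sha256(f"{section}\0{text}".encode("utf-8")).hexdigest()
        now = self.clock()
        key = (user, page)
        previous = self._last.get(key)
        if previous is not None:
            prev_digest, prev_time = previous
            if prev_digest == digest and now - prev_time <= self.window_seconds:
                return False  # identical resubmission: keep only the first save
        self._last[key] = (digest, now)
        return True
```

With such a filter in front of the save path, a double click on "Save" would produce one revision instead of two competing ones; it would not by itself address conflicts between different submissions.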
Note that bug 56 may or may not be related to or the same as this - that report suggests similar-sounding behaviour with two users editing different sections. It may be that there's just interaction between two bugs, or one may be a misinterpretation.
*** Bug 552 has been marked as a duplicate of this bug. ***
Another bug that may or may not be related: bug 317, where users report page *blanking*, but which it seems to me may also include getting into a conflict with oneself.
Self-edit conflicts, or section blanking when there is no conflict, are frequently reported on Wikicities when users have the "Show preview on first edit" option selected in their preferences.
As I believe Brion remarked once in a wikitech-l conversation, there is code in the software to explicitly ignore any conflict with yourself. By ignore, I mean that it treats it as a non-conflict. The later submitted change will just drop itself on top of what was there. Can anyone think of a reason for this behavior? The only thing I can imagine is if a user has two windows open editing the same page. The user submits from one window, then realizes that the other window has better changes, and submits that. _If_ this is the only reason to have the code, I'd suggest forcing a normal edit conflict resolution. Perhaps in the process we'll fix this bug. -Rich Holton en.Wikipedia:User:Rholton
The purpose is so that someone who saves an edit, clicks "back", makes another change, and clicks "save" again won't receive an edit conflict message. We got a lot of complaints about that back in the day.
Perhaps not exactly the same issue, but there have been instances where I think entire articles are duplicated by users inappropriately responding to an edit conflict for a section edit by copying and pasting the entire article's content (shown in the "your changes" box) and submitting this as the new contents for the section they were editing. Yes, this is user error - BUT, it's happened enough times that I think the software should be changed so that when an edit conflict occurs for a section edit, only the section is shown.
(In reply to comment #7)
> Perhaps not exactly the same issue, but there have been instances where I think entire articles are duplicated by users inappropriately
> responding to an edit conflict for a section edit by copying and pasting the entire article's content (shown in the "your changes" box)
> and submitting this as the new contents for the section they were editing. Yes, this is user error - BUT, it's happened enough times
> that I think the software should be changed so that when an edit conflict occurs for a section edit, only the section is shown.

A user has confirmed the above sequence as a mechanism resulting in duplicated content, please see http://en.wikipedia.org/wiki/User_talk:Rick_Block#How_to_duplicate . Is there any particular reason the edit conflict page shows the entire article rather than just the section being edited? Seems like this should be a fairly simple fix.
(In reply to comment #8)
> Is there any particular reason the edit conflict page shows the entire article rather than just
> the section being edited? Seems like this should be a fairly simple fix.

The problem with only showing one section in an edit conflict screen is that changes to other parts of the article could make the original section edit not make sense - for instance, the section might no longer exist, or have been moved, or the information one user was about to add to it has been added to another section by another user. All these situations require the user presented with the edit conflict to be able to see and manipulate the entire article, not just the section they originally elected to edit.

I'm also unable to reproduce your analysis of the screen's behaviour - as far as I can see, an edit conflict screen presented when editing a section correctly displays the entire article in both boxes (for the reason explained above), and the resulting save correctly replaces the entire article text with just the text in the top box. So, unfortunately, the bug is not as simple as you are suggesting (i.e. it's a genuine bug, not a bad UI).

The factors that all the examples I've seen have in common appear to be:
* large pages - if anyone has an example that rules this out as a factor, it would be worth knowing about
* edit conflict with self - with or without seeing the warning screen, which *should* be suppressed in this situation
* editing a particular section, rather than the whole page - I'm still not 100% clear that this is always the case, but it seems a reasonable assumption

It thus occurs to me that the following sequence of events would describe the behaviour of the bug:
1) user edits a section of a large page; the edited section is merged into the rest of the page and saved
2) same user edits same page in a way that would trigger an edit conflict; again, the section is merged in to create the page to save
3) when it overrides the edit conflict (because this is the same user), the software "forgets" that it has already merged the section, and mistakenly treats the new version of the page as the contents of a single section

I have yet to come up with the exact circumstances under which this happens (and therefore can't reproduce the bug on demand for testing) - for all I know, it may involve very subtle coincidences of timing and/or some specific size of page, etc - but I think it's the most thorough hypothesis so far that fits the facts.
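Editorial sketch of step 3 in the hypothesis above: what happens when an already-merged full page is treated as the contents of a single section. This is a toy Python model, not MediaWiki's section handling; `split_sections` and `replace_section` are hypothetical helpers that only understand level-2 `== Heading ==` lines.

```python
import re

# Assumption: sections start at lines of the form "== Title ==".
HEADING = re.compile(r"^==[^=\n].*==[ \t]*$", re.MULTILINE)


def split_sections(page):
    """Split wikitext into [lead, section 1, section 2, ...] at level-2 headings."""
    starts = [m.start() for m in HEADING.finditer(page)]
    bounds = [0] + starts + [len(page)]
    return [page[a:b] for a, b in zip(bounds, bounds[1:])]


def replace_section(page, index, new_text):
    """Return the page with section `index` replaced by `new_text`."""
    sections = split_sections(page)
    sections[index] = new_text
    return "".join(sections)


page = "intro\n== A ==\na text\n== B ==\nb text\n"

# Steps 1 and 2: a normal section edit merges the new section into the page.
merged = replace_section(page, 2, "== B ==\nb text\nmy vote\n")

# Step 3: the merged page is mistakenly treated as the text of section 2.
doubled = replace_section(merged, 2, merged)
```

In this model `doubled` keeps everything before section 2 and then appends the whole merged page, so the lead and section A each appear twice, which matches the duplication pattern described in the reports.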
http://en.wikipedia.org/wiki/Wikipedia:Vandalism_in_progress seems to suffer from this kind of duplication very badly, partly because the highly compact style of writing means it's difficult to spot when it happens. I've twice in the last month fixed almost complete duplication of the page, in both cases after a whole week had passed. Duplication edits: http://en.wikipedia.org/w/index.php?title=Wikipedia:Vandalism_in_progress&diff=14316926&oldid=14315993 http://en.wikipedia.org/w/index.php?title=Wikipedia:Vandalism_in_progress&diff=next&oldid=15115600
I hate this bug. I hate it so much, that I sacrificed part of my life staring at the Mediawiki source to try to figure out what is causing it, and I think I have figured it out.

In EditPage.php, function editForm, section "if ( 'save' == $formtype )": I believe it is possible to get through this branch with both $isConflict = True and $this->section != ''. If this occurs, then the edit conflict screen will place the full page's text in textbox1, but still have a hidden field stating that all of this text belongs as a replacement of only a single section. Hence, if someone then saves from this screen, the entire page's content would be dumped into that single section, effectively doubling the content of the page.

I believe the event that allows this to happen is a return of false from the call to $this->mArticle->updateArticle(...), which can occur if there is a late edit conflict such that the database is updated BETWEEN when editForm calls $this->mArticle->getTimestamp() and when Article.php::updateArticle calls "$this->updateRevisionOn". Since updateArticle will fail even in self-conflicts (unlike editForm), the easiest way to trigger this would be to get in a race with oneself by trying to submit multiple times, though it is not necessary that this be a self-conflict.

Once updateArticle returns false for any reason, the only response is to set $isConflict = True. Unlike the earlier part of editForm, there is no code to ensure that $this->section is reset to '' before proceeding to the Edit Conflict screen. Hence, if one is A) performing a section edit, B) does not trigger an edit conflict in editForm, & C) does trigger a conflict in the slightly later updateArticle, then one arrives at the Edit Conflict screen with the section identifier still set and the possibility to save from this screen and dump the entire page's content into a single section of that page.

The quick resolution is simple: if updateArticle returns false, make sure $this->section = ''.
(Though ideally, such late edit conflicts should loop back to the beginning to see if they can be resolved through merging or something similar). So, please fix this. -DF
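Editorial sketch of the quick resolution described in DF's comment, written as a toy Python model rather than the actual PHP in EditPage.php. `EditForm`, `handle_save`, and the `update_article` callback are hypothetical stand-ins; only the idea (a failed update sets the conflict flag, and the section must be cleared at the same time) comes from the comment.

```python
from dataclasses import dataclass


@dataclass
class EditForm:
    textbox1: str
    section: str  # '' means "whole page", following the convention in the comment
    is_conflict: bool = False


def handle_save(form, current_page_text, update_article,
                reset_section_on_late_conflict=True):
    """Toy save branch: update_article(form) returning False models a late conflict."""
    if update_article(form):
        return form  # saved normally; no conflict screen
    form.is_conflict = True
    form.textbox1 = current_page_text  # the conflict screen shows the whole page
    if reset_section_on_late_conflict:
        # The proposed fix: a save from the conflict screen must replace the
        # whole page, so the stale section identifier is cleared.
        form.section = ""
    return form
```

Calling `handle_save(EditForm("my vote", "9"), full_text, lambda f: False, reset_section_on_late_conflict=False)` leaves `section == "9"` alongside full-page text, which is the state the analysis identifies as the cause of the doubling; with the default flag the section comes back as `''`.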
(In reply to comment #11)
> I hate this bug. I hate it so much, that I sacrificed part of my life staring at the Mediawiki source to
> try to figure out what is causing it, and I think I have figured it out.

Wow! I think you win the prize - at least for the most thoroughly analysed hypothesis! If you're right about the specific race condition, would it be possible to artificially simulate it - i.e. put huge pauses in the code at the point the second edit has to come? Would be great to replicate the bug on demand, and then be confident that it was in fact fixed.
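Editorial sketch of one way to replicate such a race on demand without real pauses: inject a hook between the point where the saver reads the current revision and the point where it writes. This is a toy Python model under stated assumptions; `ToyStore`, `save`, and the `before_write` hook are hypothetical and do not correspond to MediaWiki functions.

```python
class ToyStore:
    """Hypothetical page store; a revision counter stands in for the timestamp."""

    def __init__(self, text):
        self.text = text
        self.revision = 0

    def update(self, text, expected_revision):
        """Compare-and-set: returns False if another save landed in between."""
        if self.revision != expected_revision:
            return False
        self.text = text
        self.revision += 1
        return True


def save(store, text, before_write=None):
    """Read the revision, optionally run a test hook, then try to write."""
    seen = store.revision            # the early conflict check would use this
    if before_write is not None:
        before_write()               # stands in for the "huge pause"
    return store.update(text, seen)  # a late conflict surfaces as False


store = ToyStore("v0")
# The second click reads the revision, then the first click's save lands
# inside the window, so the second click's write fails deterministically.
second_click_ok = save(store, "second click",
                       before_write=lambda: save(store, "first click"))
```

A hook like this makes the late-conflict path reachable in a test every time, which is what would be needed to confirm that a fix actually closes the window.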
Please fix this as a matter of urgency. This has made my life hell for the last 9 hours or so as I've tried to keep the page http://en.wikipedia.org/wiki/7_July_2005_London_bombings sane. I've had this bug about 20 or 30 times or so in that space of time, the only sensible solution being to revert to the last un-duplicated revision, sometimes wiping out dozens of edits.
I think I saw this bug just happen with my own edits at http://en.wikipedia.org/wiki/Wikipedia:Templates_for_deletion . I wanted to vote delete for two templates, so I right-clicked on both the edit links for the corresponding sections in quick succession. I'm using Firefox and right-clicking opens the link in a new tab. Then I went to the first tab, added my vote, and saved: http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19299850. Then I went to the second tab, added my vote, and pressed save. This was apparently also saved, see http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19299862, but I did get an edit conflict screen, with the complete page in the top edit box and only the section I was editing in the bottom edit box. I then copied the text in the bottom edit box to the top edit box and saved: http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19300028. I think that, if I hadn't copied the text, but just edited the top edit box, the text in the top edit box would have been substituted for the section and the page would effectively be duplicated.
Created attachment 735
Example of doubling bug edit conflict

By accidentally double clicking the save button, I believe I have managed to capture an image of an edit conflict page that leads to the doubling bug. This is attached. If you look at the source, you will note that even though this is an edit conflict, it includes: "<input type='hidden' value="9" name="wpSection" />" indicating that section editing is still turned on, consistent with my hypothesis for how an entire copy of a page gets dumped into a single section's spot. Curiously, "my text" is also represented as only the material in the section being edited, rather than a new version of the whole page as is customary for edit conflicts. Hope this helps.
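Editorial sketch of how a saved conflict page like the attachment could be checked mechanically for a leftover section field. It uses Python's standard `html.parser`; the helper names are hypothetical, and the only detail taken from the report is the hidden input named `wpSection`.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collects name -> value for <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also arrive here by default.
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""


def stale_section_field(conflict_page_html):
    """Return the wpSection value found in the page, or '' if it is absent or empty."""
    collector = HiddenFieldCollector()
    collector.feed(conflict_page_html)
    return collector.hidden.get("wpSection", "")
```

On the snippet quoted above the helper returns `"9"`. Under the hypothesis in this thread, any non-empty value on an edit conflict page would indicate the problematic state.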
This bug should now be fixed per the analysis in comment 11. That fix was committed and synchronized around 15:20 UTC. Can you confirm that this happened afterwards and that it still happens?
(In reply to comment #16)
> This bug should now be fixed per the analysis in comment 11.
> That fix was committed and synchronized around 15:20 UTC. Can
> you confirm that this happened afterwards and that it still happens?

The file I uploaded showing the edit conflict dates from several days ago. I saved it at that time, but didn't get around to posting it till today. So, hopefully the fix you have now applied will have settled the issue.
Great. :) Marking this tentatively FIXED, for both the duplication and the diff display.