Last modified: 2012-09-19 18:13:49 UTC
Some users report that there is a caching problem on zhwp.
They submit an edit, but when the page loads, it still shows the old version of the page. In the past, only IP (anonymous) users had this problem, but now all users have it. They need to use ?action=purge to fix it. If that does not fix it, some users may think the edit could not be submitted.
Is this due to a recent change in the settings of Wikimedia wikis, or is it related to the 1.17 upgrade? Also, see local discussion at
Does it happen to all pages or just to a few? Even if the page title has only Latin characters?
Once a page has been purged, does the next edit also need a purge?
Roan says (via IRC) that this may be a FlaggedRevs issue, which is being reported sporadically:
> hexmode: Also, some wikis have been reporting that users see a
> stale version of the page immediately after submitting an
> edit. I suspect FlaggedRevs involvement because it only happens
> on some wikis, but I forgot which ones. I think at least enwiki
> and zhwiki reported it
After the 1.17 upgrade, this happens sporadically on all projects: English Wiktionary, Swedish Wikipedia, ... It is very irritating to the frequent editor and should have a high priority.
(In reply to comment #3)
> After the 1.17 upgrade, this happens sporadically on all projects: English
> Wiktionary, Swedish Wikipedia, ... It is very irritating to the frequent editor
> and should have a high priority.
Has it been observed on wikis that don't have FlaggedRevs?
I don't think en.wiktionary has FlaggedRevs. I have seen it happen several times there, both when I create new articles and when I add my view to a discussion. The stale version that is viewed after saving is either the non-existing page or the discussion before my addition. I haven't created any new articles on sv.wikipedia lately, but I have seen the same happen when I add to a discussion, and I'm pretty sure sv.wikipedia doesn't use FlaggedRevs.
r83513 may have fixed it.
(In reply to comment #6)
> r83513 may have fixed it.
That would explain it for anonymous users, but not for logged-in users. Is this happening for logged-in users as well?
I'm always logged in, and it happens to me. It happened several times today (March 9).
The following might be related: When I added a reply to a thread in Liquid Threads on en.wiktionary, the page (Special:Newmessages) was immediately updated, but a few seconds later a notice appeared that "this thread has new messages, click here to update", but of course my own reply was the only news. I have not seen that behaviour in Liquid Threads before version 1.17.
The page view immediately after edit submission should always be a squid cache miss, because a session cookie should be sent when you view the edit page, which suppresses caching. It doesn't matter if you're logged in or not.
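The caching rule described above can be modelled roughly as follows. This is a toy sketch, not the actual Squid configuration; the function names are hypothetical, and the assumption that session cookies end in "_session" (e.g. "enwiki_session") is for illustration only.

```python
from http.cookies import SimpleCookie

# Assumption for illustration: session cookies are named "<wiki>_session".
SESSION_SUFFIX = "_session"


def has_session_cookie(cookie_header: str) -> bool:
    """Return True if the Cookie header carries a session cookie."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    return any(name.endswith(SESSION_SUFFIX) for name in jar)


def should_serve_from_cache(method: str, cookie_header: str) -> bool:
    """Hypothetical front-end cache rule: only GET requests without a
    session cookie may be answered from the cache. Because viewing the
    edit form starts a session, the view after saving should be a miss."""
    if method != "GET":
        return False
    return not has_session_cookie(cookie_header)
```

Under this model, logged-in and anonymous editors behave the same: whoever opened the edit form carries a session cookie, so the post-save view bypasses the cache.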
In theory, page_latest should be loaded from the master, to prevent this bug from happening. Loading page data from the master was introduced in r7615 for this reason. It may have been broken as early as r12680, when the redirect check in Wiki.php was modified to load page data from a slave.
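To illustrate why the source of page_latest matters, here is a toy model with one master and one lagged replica, each reduced to a title -> page_latest map. The class and function names are hypothetical; real MediaWiki code is far more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Database:
    """Toy stand-in for one database server: maps title -> page_latest."""
    page_latest: Dict[str, int] = field(default_factory=dict)


def save_edit(master: Database, title: str, new_rev_id: int) -> None:
    # Writes always go to the master; replicas see them only after replication.
    master.page_latest[title] = new_rev_id


def revision_to_show(db: Database, title: str) -> Optional[int]:
    """Revision id the view would render, or None if the page looks absent."""
    return db.page_latest.get(title)


# Scenario: the replica has not yet applied the latest write.
master = Database({"Example": 100})
replica = Database({"Example": 100})
save_edit(master, "Example", 101)

print(revision_to_show(master, "Example"))   # 101: the user's new edit
print(revision_to_show(replica, "Example"))  # 100: the stale version
```

Reading page_latest from the master right after a save avoids the stale result; reading it from a replica relies on something else (such as ChronologyProtector) to wait for replication.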
If you load page data from a slave, then it seems like the only thing standing in the way of this bug is the ChronologyProtector. That's surprising, because it's been broken on several occasions since r12680. I would have expected it to have been reported more often.
ChronologyProtector has a timeout of 10 seconds. If the timeout is reached, the lagged slave will be used, and page_latest may be incorrect. Perhaps the collation updates are causing slave lag of more than 10s but less than 30s, causing this bug to be seen.
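The timeout behaviour can be sketched with a simulated clock: poll the replica until it reaches the master position recorded after the write, and give up after the timeout. This is not ChronologyProtector's real code; `wait_for_position` and `lagged_replica` are hypothetical helpers, and the 0.5 s poll interval is an assumption.

```python
from typing import Callable, Tuple


def wait_for_position(replica_position_at: Callable[[float], int],
                      target_position: int,
                      timeout: float = 10.0,
                      poll: float = 0.5) -> Tuple[bool, float]:
    """Poll a replica until it reaches target_position or timeout elapses.

    replica_position_at(t) gives the replica's replication position at
    simulated time t (seconds since the request started). Returns
    (caught_up, seconds_waited). If caught_up is False, the caller goes
    on using the lagged replica, which is the failure mode described above.
    """
    t = 0.0
    while t <= timeout:
        if replica_position_at(t) >= target_position:
            return True, t
        t += poll
    return False, timeout


def lagged_replica(lag_seconds: float, target: int) -> Callable[[float], int]:
    """A replica that reaches `target` only after `lag_seconds`."""
    return lambda t: target if t >= lag_seconds else target - 1
```

With 2 s of lag the wait succeeds; with 15 s of lag (between 10 s and 30 s, as hypothesised) it times out and the stale replica would be read.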
*** Bug 27964 has been marked as a duplicate of this bug. ***
(In reply to comment #9)
> ChronologyProtector has a timeout of 10 seconds. If the timeout is reached, the
> lagged slave will be used, and page_latest may be incorrect. Perhaps the
> collation updates are causing slave lag of more than 10s but less than 30s,
> causing this bug to be seen.
The first few hours, it was indeed causing such slave lag. However, as soon as I fixed LoadBalancer::waitForAll(), I don't think it ever caused more than 1-2 seconds lag (it updates 20 batches of 50 rows, then waits for all slaves to catch up to the master position). Also, the collation update script has now finished on all wikis except enwiki.
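One reading of "updates 20 batches of 50 rows, then waits for all slaves to catch up" is sketched below. The callbacks are hypothetical placeholders; in the real maintenance script the wait is done by LoadBalancer::waitForAll(), whose details are not reproduced here.

```python
from typing import Callable, List, Sequence


def run_batched_update(rows: Sequence,
                       apply_batch: Callable[[Sequence], None],
                       wait_for_replicas: Callable[[], None],
                       batch_size: int = 50,
                       batches_per_wait: int = 20) -> None:
    """Apply rows in fixed-size batches, pausing for replication regularly.

    After every `batches_per_wait` batches, block until all replicas have
    caught up with the master, which bounds how far they can fall behind.
    """
    batches_since_wait = 0
    for start in range(0, len(rows), batch_size):
        apply_batch(rows[start:start + batch_size])
        batches_since_wait += 1
        if batches_since_wait == batches_per_wait:
            wait_for_replicas()
            batches_since_wait = 0
    if batches_since_wait:
        # Final wait for any leftover batches.
        wait_for_replicas()
```

The design choice is to trade total runtime for bounded lag: each wait caps the backlog at roughly 1,000 rows (20 × 50) under this reading.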
Reading page data from a slave, if that's what's going on, is broken behavior and should just be fixed.
This has happened to me a number of times today. I had to purge the server cache. It's been happening several times per day for two or three weeks. Before that it was very rare. I've been editing Wikipedia articles daily for more than eight years, so I notice when something like this changes abruptly. It did. A discussion is going on at: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Purge
It just occurred to me that for this to happen, the parser cache entry generated on save would have to not be used for the subsequent page view. That's much more likely to have changed in the 1.17 deployment than the replication handling.
Just to confirm: this is still happening to me on the English Wikipedia and on Meta.
I would think this has to be considered quite unfriendly to newbie editors, since they don't know about purging the cache and are told that their edits will be visible immediately. For that reason I think this should be considered a high-priority item.
Tim (#13) is obviously on the right track. There is no good reason for a wiki to return anything other than the updated page when a new edit is submitted. Any involvement of stale caches must be a mistake. If the update failed to invalidate the cache, it is other users who would see the stale cache, not the user who submitted the edit.
This is caused by r79122. I will revert it.
The revert is deployed. I am still logging some cache regenerations after save, although much less than before. It may be that memcached read timeouts are interpreted as cache misses.
I gathered statistics for memcached read timeouts, there were 51 per second. Increasing the timeout reduced the error rate to 3.4 per second. There's probably a network issue which still needs to be figured out, but this bug should happen very rarely now. Please reopen if you see it on more than one in every 10k edits.
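The difference between a genuine miss and a read timeout, and how the two can be counted separately, can be sketched as follows. `CacheTimeout` and both wrappers are hypothetical; real memcached clients report timeouts in their own ways, and this is not the client used on the cluster.

```python
from collections import Counter
from typing import Any, Callable, Optional, Tuple


class CacheTimeout(Exception):
    """Hypothetical exception a cache client might raise on a read timeout."""


def naive_get(cache_get: Callable[[str], Any], key: str) -> Optional[Any]:
    """Treats a timeout exactly like a miss, so the caller re-renders the page."""
    try:
        return cache_get(key)
    except CacheTimeout:
        return None


def classify_get(cache_get: Callable[[str], Any], key: str,
                 stats: Counter) -> Tuple[str, Optional[Any]]:
    """Distinguish hit / miss / timeout and count each outcome."""
    try:
        value = cache_get(key)
    except CacheTimeout:
        stats["timeout"] += 1
        return "timeout", None
    if value is None:
        stats["miss"] += 1
        return "miss", None
    stats["hit"] += 1
    return "hit", value
```

With `naive_get`, every timeout looks like a missing parser cache entry and triggers a regeneration after save; counting outcomes separately is what makes a rate such as "timeouts per second" measurable.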
I haven't counted my edits, but I doubt they number more than 10K in the last week. The bug happened to me a few minutes ago when I saved this edit to the sv.wikipedia Village pump,
After saving, what I saw was the version before my addition.
I haven't counted my edits, but they are certainly fewer than 10,000 since March 14. This bug has happened to me several times since then, generally (maybe even _only_) on page creation, including most recently when I created http://en.wiktionary.org/w/index.php?title=%D7%90%D7%9C%D7%99%D7%9B%D7%9D&oldid=12752869
Due to Comment 19 by Tim Starling (q.v.), Comment 20 by Lars Aronsson, and my own comment here, I'm reopening.
It has started happening for me again too. It stopped when Tim made his fix of March 14. But in the last few days it has happened a few times, particularly with page creation.
I know this was just reopened today, but Erik pointed to this issue:
Quote: "Replication lag is causing new users to be baffled by where
their edit disappeared too, and this is on articles without pending
changes. I found one user tonight who out of desperation was pasting
the displayed preview into the edit box, completely destroying the
article in the process, and was then surprised to get blocked. This
isn't just an inconvenience; it's costing us new editors!"
Just noting that this delay is still being experienced -- see two comments today:
(In reply to comment #24)
> Just noting that this delay is still being experienced -- see two comments
Aye, I hit this problem (or a very similar one) yesterday. I created a redirect on the English Wikipedia. After submitting the edit, the site returned a page as though no page had been created.
I had the problem too.
For new pages, we rely on ChronologyProtector since the parser cache isn't checked if the article appears to be non-existent. ChronologyProtector appears to be broken, due to r72475. No ChronologyProtector key appeared in the session data.
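The new-page case can be modelled as a toy view path: the post-save request only waits for replication if the previous request stored a master position in the session, and a page that looks non-existent never reaches the parser cache. All names here, including the session key, are hypothetical; the real session key used by ChronologyProtector is not given in this thread.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical session key, for illustration only.
CP_SESSION_KEY = "cp_master_position"


def on_shutdown(session: Dict[str, int], master_position: Optional[int]) -> None:
    """After a request that wrote to the master, remember the master position."""
    if master_position is not None:
        session[CP_SESSION_KEY] = master_position


def render_view(title: str, replica_titles: Set[str], parser_cache: Dict[str, str],
                session: Dict[str, int],
                wait_for: Callable[[int], Set[str]]) -> str:
    """Toy view path. wait_for(pos) stands in for waiting until the replica
    reaches pos and then re-reading the set of existing titles."""
    target = session.get(CP_SESSION_KEY)
    if target is not None:
        replica_titles = wait_for(target)
    if title not in replica_titles:
        # The page looks non-existent, so the parser cache is never checked.
        return "(page does not exist)"
    return parser_cache.get(title, "(rendered from scratch)")
```

If the key is never written to the session (the symptom reported above), the view reads the lagged replica directly and shows the freshly created page as missing.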
Temporary patch applied in r87235 and deployed, should be fixed now.
Created attachment 8506
Shutdown the current LBFactory before doing any jobs
Attached is a not-so-thoroughly tested patch. Does this look like it would work to you, Tim?
(In reply to comment #28)
> Created attachment 8506
> Shutdown the current LBFactory before doing any jobs
> Attached not so thoroughly tested patch. Does this look like it would work to
> you, Tim?
I'm not very keen on it. I'd prefer to see things done properly. For my ideas on what "doing it properly" would mean, see:
Maybe we could use this patch if that idea were not possible due to time constraints.
Is any more work needed on this?
*** Bug 29552 has been marked as a duplicate of this bug. ***
Patch reverted; should be good now.