Last modified: 2014-07-04 00:50:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla or access your list of votes, but bug reports in Bugzilla will not be kept up to date.
Bug 57026 - Bot being served Error: 503, Service Unavailable (default error page) for API action=edit requests on large pages (caused by parsing the page three times on save)
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Priority: Normal
Severity: major (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 59788
Blocks:
Reported: 2013-11-13 19:36 UTC by Cyberpower678
Modified: 2014-07-04 00:50 UTC (History)
CC: 16 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Cyberpower678 2013-11-13 19:36:21 UTC
90% of the time, my bots are hitting a technical issue when trying to edit.  Here's one of the thousands of instances it got in 3 hours of bot work.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Wikimedia Error</title>
<meta name="author" content="Mark Ryan"/>
<meta name="copyright" content="(c) 2005-2007 Mark Ryan and others. Text licensed under the GNU Free Documentation License. http://www.gnu.org/licenses/fdl.txt"/>

<style type="text/css"><!--
body {
background-color: #dbe5df;
font: 100% "Gill Sans MT", "Gill Sans", "Trebuchet MS", Helvetica, sans-serif;
margin: 0px;
}
.TechnicalStuff {
font-size: 0.8em;
font-style: italic;
text-align: center;
padding-bottom: 0.8em;
}
.Lines {
width: 100%;
height: 1px;
overflow: hidden;
font-size: 0.5px;
}
.ContentArea {
background-color: white;
padding: 0.8em 10% 0 10%;
font-size: 1.0em;
}
a:hover {
color: red;
}
h1, h2 {
margin: 0px;
font-size: 1.0em;
}
h2 {
background: #9fbfd8;
font-size: 1.2em;
font-weight: bold;
text-align: center;
}
h1 {
background: #dbe5df;
font: bold 1.5em "Gill Sans MT", "Gill Sans", Helvetica, Humanist, sans-serif;
text-transform: uppercase;
text-align: center;
width: 100%;
padding-top:0.8em;
}
-->
</style>
<script>/*<![CDATA[*/
	function lines(s) {
		var c = s.split(' ');
		for (var i = 0; i < c.length; i++) {
			document.write('<div class="Lines" style="background-color:#' + c[i] + ';"></div>');
		}
	}
//]]></script>
</head>

<body link="#24442E" text="#000000" vlink="#24442E" alink="#FF0000">
<h1>Wikimedia Foundation</h1>
<script>lines('ccd4cf bdc3bf adb1af 9ea09f dbe5df');</script>

<h2>Error</h2>

<script>lines('8f8f8f acacac c6c6c6 dbdbdb eaeaea f4f4f4');</script>

<!-- BEGIN CONTENT AREA -->
<div class="ContentArea">

<div id="en" lang="en">
<p>Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please <a href="//w/api.php" onclick="window.location.reload(false); return false">try again</a> in a few minutes.</p>
<p>The Wikimedia Foundation is a non-profit organisation which hosts some of the most popular sites on the Internet, including Wikipedia. It has a constant need to purchase new hardware. If you would like to help, please <a href="https://donate.wikimedia.org/?utm_medium=varnisherr&utm_source=20120809SQ01&utm_campaign=errorpage">donate</a>.</p>
<hr noshade="noshade" size="1px" width="80%" />
<div class="TechnicalStuff">
If you report this error to the Wikimedia System Administrators, please include the details below.<br/>
</div>
<div class="TechnicalStuff">
<bdo dir="ltr">
Request: POST http://en.wikipedia.org/w/api.php, from 130.203.176.150 via cp1055 frontend ([208.80.154.225]:80), Varnish XID 2270366995<br/>Forwarded for: 130.203.176.150<br/>Error: 503, Service Unavailable at Wed, 13 Nov 2013 15:30:37 GMT
</bdo>
</div>
</div>

</div>

<div>
<p>Your cache administrator is <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=18903">nobody</a>.</p>
</div>
<script>lines('9ea09f adb1af bdc3bf ccd4cf');</script>

</body>
</html>

This is taken straight from the bot that received it.
Comment 1 Bartosz Dziewoński 2013-11-13 19:39:00 UTC
What page are you trying to edit? What data are you posting?
Comment 2 Cyberpower678 2013-11-13 20:13:05 UTC
(In reply to comment #1)
> What page are you trying to edit? What data are you posting?

Trying to edit User:Cyberbot I/Current AfD's

Trying to add: An updated version of its contents.
Comment 3 Cyberpower678 2013-11-13 20:14:04 UTC
oops
Comment 4 Bartosz Dziewoński 2013-11-13 20:19:48 UTC
(In reply to comment #2)
> Trying to edit [[User:Cyberbot I/Current AfD's]]

Let me just note that this is a pretty large page (120K, takes 11s to render), which naturally doesn't mean that it should fail to save, and especially return the default error page for API requests.
Comment 5 Brad Jorsch 2013-11-13 22:29:28 UTC
Checking the API logs, I see several identical edit requests that all took 40-50 seconds to process from start to finish. And looking at the page history, I see your bot has runs where it makes an edit every minute with the only difference being the "This table was last updated on" timestamp.

So it seems that what's probably going on here is that your API request is taking too long to process so the proxy times out and gives you an error page. And then your bot just keeps hammering away trying to make the same edit. A better idea here would be to catch the 503, wait a minute or two, and then see if the edit eventually saved.
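The recovery strategy suggested above — catch the 503, wait, then check the page history before resending — can be sketched as follows. This is an illustrative sketch, not Peachy's actual code; the `post_edit` and `fetch_history` callables are hypothetical hooks a bot framework would supply (an HTTP POST of the `action=edit` request, and an `action=query&prop=revisions` lookup returning a list of revision dicts with `user` and `timestamp` keys).

```python
import time

def edit_went_through(revisions, username, since):
    """True if any revision was made by `username` at or after `since`.
    MediaWiki's ISO 8601 timestamps compare correctly as plain strings."""
    return any(r["user"] == username and r["timestamp"] >= since
               for r in revisions)

def save_with_retry(post_edit, fetch_history, username, max_tries=3):
    """Send an edit via post_edit(); on a 503, wait and inspect the page
    history via fetch_history() before resending, so an edit that saved
    despite the proxy timeout is not repeated."""
    for _ in range(max_tries):
        sent_at = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        status, body = post_edit()
        if status != 503:
            return body                  # normal API response (or API error)
        time.sleep(120)                  # give a slow save time to land
        if edit_went_through(fetch_history(), username, sent_at):
            return None                  # edit saved despite the 503
    raise RuntimeError("edit failed after %d attempts" % max_tries)
```

The key point is that the 503 is ambiguous: the save may have completed after the proxy gave up, so blindly resending produces the minute-by-minute duplicate edits described above.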
Comment 6 Cyberpower678 2013-11-14 12:41:03 UTC
(In reply to comment #5)
> Checking the API logs, I see several identical edit requests that all took
> 40-50 seconds to process from start to finish. And looking at the page
> history,
> I see your bot has runs where it makes an edit every minute with the only
> difference being the "This table was last updated on" timestamp.
> 
> So it seems that what's probably going on here is that your API request is
> taking too long to process so the proxy times out and gives you an error
> page.
> And then your bot just keeps hammering away trying to make the same edit. A
> better idea here would be to catch the 503, wait a minute or two, and then
> see
> if the edit eventually saved.

Even better would be to not have it return a 503 to begin with.  If the framework can't find a servedby parameter, or a requestid, it considers it a failed attempt.  I've also had another user come to IRC with the exact same problem, and this was never a problem before, so clearly what needs to be fixed is not on my end.
Comment 7 Cyberpower678 2013-11-14 12:43:45 UTC
(In reply to comment #5)
> Checking the API logs, I see several identical edit requests that all took
> 40-50 seconds to process from start to finish. And looking at the page
> history,
> I see your bot has runs where it makes an edit every minute with the only
> difference being the "This table was last updated on" timestamp.
> 
> So it seems that what's probably going on here is that your API request is
> taking too long to process so the proxy times out and gives you an error
> page.
> And then your bot just keeps hammering away trying to make the same edit. A
> better idea here would be to catch the 503, wait a minute or two, and then
> see
> if the edit eventually saved.

And to add on, if the edit is going through, then a human can see that, but a bot clearly can't when it's being thrown an error message that a computer program can't read, since it's not XML-, PHP-, or JSON-formatted output.  Also, why is it taking 50 seconds to process?
Comment 8 Brad Jorsch 2013-11-14 13:57:45 UTC
(In reply to comment #6)
> If the framework
> can't find a servedby parameter, or a requestid, it considers it a failed
> attempt.

Then fix your framework.

(In reply to comment #7)
> 
> And to add on, if the edit is going through, then a human can see that, a bot
> clearly can't when being thrown an error message that can't be read by a
> computer program, since it's not an XML, PHP, or JSON formatted output. 

The bot should be able to detect the fact that the HTTP status code is 503. Then, just like a human might, the bot can wait a minute and then check if the edit showed up in the page history after all.

> Also, why is it taking 50 seconds to process?

I'll try to look into that.
Comment 9 Andre Klapper 2013-11-14 14:39:51 UTC
[Setting priority again to "normal". Please keep it there.]
Comment 10 Brad Jorsch 2013-11-14 17:32:40 UTC
Some data:
* Parsing the new page text to extract external links for SpamBlacklistHooks::filterAPIEditBeforeSave takes 11 seconds.
* Then that parse info gets thrown away by a call to Article::clear(), as part of the making sure there aren't edit conflicts.
* TemplateDataHooks::onPageContentSave needs a parse, too, so there goes another 11 seconds. And for some reason it passes null for the $serialization_format in the call to WikiPage::prepareContentForEdit, instead of the format actually being used for the parse.
* Then saving the edit into the database needs the parse info too. And since the saved parse info from TemplateDataHooks::onPageContentSave has the wrong serialization format, it can't be reused so there goes another 11 seconds.

And that seems to about cover it: in my testing this morning, saving your page took about 33 seconds. If things are overloaded, each of those 11 seconds could well be longer.
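The fixes that followed (and the null-format bug fixed later in Gerrit change 101224) amount to making the prepared parse reusable across the save hooks. A minimal sketch of the pattern — in Python purely for illustration, not MediaWiki's actual PHP code — keyed on (content, serialization format), with a null format normalized so it doesn't defeat the cache:

```python
def make_prepare_edit(parse):
    """Memoize an expensive parse keyed on (content, serialization format).
    Repeated callers with the same arguments reuse the first result
    instead of reparsing (the 11-second step in this bug)."""
    cache = {}
    def prepare(content, fmt=None):
        # A null format must normalize to the real default; otherwise a
        # hook passing None stores an entry later callers never match.
        fmt = fmt or "default-format"
        key = (content, fmt)
        if key not in cache:
            cache[key] = parse(content)
        return cache[key]
    return prepare
```

With the normalization in place, the spam-blacklist hook, the TemplateData hook, and the database save all hit the same cached parse; without it, a `None` format poisons the key and each step pays the full parse again.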
Comment 11 Gerrit Notification Bot 2013-11-14 20:12:43 UTC
Change 95481 had a related patch set uploaded by Aaron Schulz:
Avoid extra parsing on edit

https://gerrit.wikimedia.org/r/95481
Comment 12 Gerrit Notification Bot 2013-11-15 00:17:36 UTC
Change 95481 merged by jenkins-bot:
Avoid extra parsing on edit

https://gerrit.wikimedia.org/r/95481
Comment 13 Gerrit Notification Bot 2013-11-15 00:18:59 UTC
Change 95519 had a related patch set uploaded by Aaron Schulz:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/95519
Comment 14 Gerrit Notification Bot 2013-11-15 23:32:59 UTC
Change 95519 merged by jenkins-bot:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/95519
Comment 15 Cyberpower678 2013-11-25 21:09:13 UTC
We are still having this problem on labs.

POST: http://en.wikipedia.org/w/api.php
API Error...

Code: error503
Text: HTTP Error 503
The webserver's service is currently unavailable, however, the edit appears to have gone through.

This is still getting returned by my framework (Peachy), after the suggested update I made to it.  Has this patch been reviewed and submitted yet?
Comment 16 Brad Jorsch 2013-11-25 21:11:44 UTC
Patches have been merged, as you can see above, and should go out with 1.23wmf5. See https://www.mediawiki.org/wiki/MediaWiki_1.23/Roadmap for the schedule.
Comment 17 Gerrit Notification Bot 2013-11-25 22:29:24 UTC
Change 97634 had a related patch set uploaded by Ori.livneh:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/97634
Comment 18 Gerrit Notification Bot 2013-11-25 22:55:40 UTC
Change 97634 merged by jenkins-bot:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/97634
Comment 19 Bartosz Dziewoński 2013-11-26 08:57:30 UTC
As I understand, this has been backported and deployed by Ori and Faidon yesterday.
Comment 20 Cyberpower678 2013-12-06 15:54:18 UTC
Still getting errors.  Will be happy to post what my framework is reporting.
Comment 21 Bawolff (Brian Wolff) 2013-12-13 05:10:59 UTC
I think there's still double rendering going on. On commons, a null edit takes just slightly more than double the time a page preview takes. On my local copy, the operations take roughly the same time (As they should).
Comment 22 Aaron Schulz 2013-12-13 05:28:09 UTC
I still see TemplateDataHooks::onPageContentSave mentioned in dbperformance.log
Comment 23 Bawolff (Brian Wolff) 2013-12-13 07:07:34 UTC
(In reply to comment #22)
> I still see TemplateDataHooks::onPageContentSave mentioned in
> dbperformance.log

I think that's expected. As I read it, the fix above is meant to work by having the hook do the parsing, and then have core skip doing the parsing.

Testing Locally with TemplateData, this seems to work, with parsing only happening in TemplateData extension on page save.

<aside>It'd be nice if there was some way to do ?forceprofile=true/?forcetrace=true on page save. Maybe $wgDebugRedirects could somehow be triggered by a url parameter, and whatever magic makes forcetrace work in places where forceprofile doesn't be made to happen </aside>
Comment 24 Gerrit Notification Bot 2013-12-13 17:18:38 UTC
Change 101224 had a related patch set uploaded by Anomie:
Fix WikiPage::prepareContentForEdit's default format handling

https://gerrit.wikimedia.org/r/101224
Comment 25 Brad Jorsch 2013-12-13 17:33:52 UTC
(In reply to comment #21)
> I think there's still double rendering going on. On commons, a null edit
> takes
> just slightly more than double the time a page preview takes. On my local
> copy,
> the operations take roughly the same time (As they should).

I redid the same sort of simulation as I did in comment 10 (anyone with access to terbium, feel free to look at /home/anomie/eval-bug57026.txt), and it seems there is no longer double parsing going on before the text is saved to the database on enwiki. I also checked Commons with simulating a null edit on a random page, and again I see no double parsing.

Doing a non-null edit on Commons, I see abuse filter rule 87's use of added_links is triggering reparses (it seems to be broken such that it runs added_links for all actions). While it could probably use at least the same as in Gerrit change #95481 (and probably some checking to see if it's going to be breaking the cache by passing the old wikitext for some vars), I submitted Gerrit change #101224 to be better about handling a null format in core.

Also, there might still be a second parse from WikiPage::doEditUpdates() if the page uses magic words that access information about the current revision or if the page transcludes itself. I don't see any way around that.

And I'm not entirely sure that just timing a null edit isn't going to cause a parser cache miss for the view-after-save, particularly if your preferences don't match the defaults. Do you still see double timing if you do the null edit via the API?
Comment 26 Gerrit Notification Bot 2013-12-13 20:19:37 UTC
Change 101224 merged by jenkins-bot:
Fix WikiPage::prepareContentForEdit's default format handling

https://gerrit.wikimedia.org/r/101224
Comment 27 Bawolff (Brian Wolff) 2013-12-14 00:33:45 UTC
(In reply to comment #25)
> 
> Also, there might still be a second parse from WikiPage::doEditUpdates() if
> the
> page uses magic words that access information about the current revision or
> if
> the page transcludes itself. I don't see any way around that.
> 
> And I'm not entirely sure that just timing a null edit isn't going to cause a
> parser cache miss for the view-after-save, particularly if your preferences
> don't match the defaults. Do you still see double timing if you do the null
> edit via the API?

I was looking at firebug, and recording the time it took to retrieve the POST request that results in a redirect. After the redirect, firefox does a GET request which usually parses the page again, which I didn't count.

I was testing https://commons.wikimedia.org/wiki/Commons:Featured_picture_candidates/Log/November_2013 and https://commons.wikimedia.org/wiki/Commons:Featured_picture_candidates/Log/October_2013 which a user at commons had previously identified as being slow.

I don't believe these pages are vary-revision, but it's hard to be 100% sure of that.

I was testing via index.php. When I test via api.php, page save actions take the same time as rendering a page preview, which is interesting. Thus the issue I'm seeing seems to be from the web interface only.
Comment 28 Gerrit Notification Bot 2013-12-14 15:24:36 UTC
Change 101490 had a related patch set uploaded by Anomie:
Use WikiPage::prepareContentForEdit in SpamBlacklistHooks::filterMergedContent

https://gerrit.wikimedia.org/r/101490
Comment 29 Brad Jorsch 2013-12-14 15:24:52 UTC
(In reply to comment #27)
> I was testing via index.php. When I test via api.php, page save actions take
> the same time as rendering a page preview, which is interesting. Thus the
> issue
> I'm seeing seems to be from the web interface only.

I did some investigating of the UI edit code path, and found the culprit there is SpamBlacklistHooks::filterMergedContent.
Comment 30 Gerrit Notification Bot 2013-12-26 18:15:05 UTC
Change 101490 merged by jenkins-bot:
Use WikiPage::prepareContentForEdit in SpamBlacklistHooks::filterMergedContent

https://gerrit.wikimedia.org/r/101490
Comment 31 Brad Jorsch 2013-12-26 18:33:26 UTC
We've cleaned up more instances of double-parsing, including one in the UI code path. Latest changes should go out with 1.23wmf9.

Before reopening this bug, please test that the timeouts are due to page save taking much longer than an equivalent preview (or api action=parse), and that the page being saved does not use revision-specific magic words or parser functions and that the page being saved does not transclude itself.
Comment 32 Brad Jorsch 2014-01-08 15:30:18 UTC
Wikidata is calling Edit hooks with non-editable pages as context, which breaks Gerrit change #101490. Gerrit change #106126 can be merged to reinstate it once bug 59788 is fixed.
Comment 33 Daniel Kinzler 2014-01-09 13:36:32 UTC
The problem arises because Wikibase generates titles for new pages from a counter stored in the database, which is incremented in the transaction that actually creates the page. So, before the page is saved, the title isn't known. When calling a pre-save edit hook in such a situation, there is no way to provide the actual title of the page that will be created.

Perhaps it would be possible to grab a fresh ID and determine the Title earlier - but that means wasting an ID for every failed attempt to save. It also means refactoring quite a bit of code in Wikibase, on a critical code path.

I think it's really a fundamental question whether the actual title of a page can be assumed to be known before the page has been created. A related fundamental question is whether the context title of an edit is always the title of the page where the new content will be saved, or whether there are situations where this cannot sensibly be expected to be the case.
Comment 34 Cyberpower678 2014-01-09 13:55:08 UTC
Date/Time: Thu, 09 Jan 2014 12:27:52 +0000
Method: POST
URL: http://cy.wikipedia.org/w/api.php (Parameters masked for security)
Raw Data: 
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Wikimedia Error</title>
<meta name="author" content="Mark Ryan"/>
<meta name="copyright" content="(c) 2005-2007 Mark Ryan and others. Text licensed under the GNU Free Documentation License. http://www.gnu.org/licenses/fdl.txt"/>

<style type="text/css"><!--
body {
background-color: #dbe5df;
font: 100% "Gill Sans MT", "Gill Sans", "Trebuchet MS", Helvetica, sans-serif;
margin: 0px;
}
.TechnicalStuff {
font-size: 0.8em;
font-style: italic;
text-align: center;
padding-bottom: 0.8em;
}
.Lines {
width: 100%;
height: 1px;
overflow: hidden;
font-size: 0.5px;
}
.ContentArea {
background-color: white;
padding: 0.8em 10% 0 10%;
font-size: 1.0em;
}
a:hover {
color: red;
}
h1, h2 {
margin: 0px;
font-size: 1.0em;
}
h2 {
background: #9fbfd8;
font-size: 1.2em;
font-weight: bold;
text-align: center;
}
h1 {
background: #dbe5df;
font: bold 1.5em "Gill Sans MT", "Gill Sans", Helvetica, Humanist, sans-serif;
text-transform: uppercase;
text-align: center;
width: 100%;
padding-top:0.8em;
}
-->
</style>
<script>/*<![CDATA[*/
	function lines(s) {
		var c = s.split(' ');
		for (var i = 0; i < c.length; i++) {
			document.write('<div class="Lines" style="background-color:#' + c[i] + ';"></div>');
		}
	}
//]]></script>
</head>

<body link="#24442E" text="#000000" vlink="#24442E" alink="#FF0000">
<h1>Wikimedia Foundation</h1>
<script>lines('ccd4cf bdc3bf adb1af 9ea09f dbe5df');</script>

<h2>Error</h2>

<script>lines('8f8f8f acacac c6c6c6 dbdbdb eaeaea f4f4f4');</script>

<!-- BEGIN CONTENT AREA -->
<div class="ContentArea">

<div id="en" lang="en">
<p>Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please <a href="//w/api.php" onclick="window.location.reload(false); return false">try again</a> in a few minutes.</p>
<hr noshade="noshade" size="1px" width="80%" />
<div class="TechnicalStuff">
If you report this error to the Wikimedia System Administrators, please include the details below.<br/>
</div>
<div class="TechnicalStuff">
<bdo dir="ltr">
Request: POST http://cy.wikipedia.org/w/api.php, from 10.64.0.102 via cp1066 cp1066 ([10.64.0.103]:3128), Varnish XID 3544388602<br/>Forwarded for: 10.4.1.153, 10.64.0.102<br/>Error: 503, Service Unavailable at Thu, 09 Jan 2014 12:28:08 GMT
</bdo>
</div>
</div>

</div>

<script>lines('9ea09f adb1af bdc3bf ccd4cf');</script>

</body>
</html>

UNSERIALIZATION FAILED
Comment 35 Brad Jorsch 2014-01-09 16:13:33 UTC
(In reply to comment #34)
> Date/Time: Thu, 09 Jan 2014 12:27:52 +0000
> Method: POST
> URL: http://cy.wikipedia.org/w/api.php (Parameters masked for security)

That makes it difficult to debug this. But I don't see any long times in the API log around that time (the worst time so far today for cywiki was only 14186ms, far below the point where this bug shows up). I also don't see any errors or exceptions in exception.log or fatal.log around this time. Which means it's probably unrelated to this specific bug (errors caused by long parse times).
Comment 36 Gerrit Notification Bot 2014-01-13 23:47:38 UTC
Change 107266 had a related patch set uploaded by MarkAHershberger:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/107266
Comment 37 Gerrit Notification Bot 2014-01-13 23:51:35 UTC
Change 107266 merged by jenkins-bot:
Avoid extra parsing in prepareContentForEdit()

https://gerrit.wikimedia.org/r/107266
Comment 38 Nemo 2014-02-25 16:40:47 UTC
One of the patches was backported to core stable (causing bug 60054), does this mean that there was some urgency for core too and this is not a Wikimedia-only bug? Please categorise it as appropriate.
What else needs to be done here? Cyberpower678, are you still seeing errors?
Comment 39 Matthew Flaschen 2014-02-26 00:13:49 UTC
Main patches seem to be against core, so moving accordingly.
Comment 40 Andre Klapper 2014-07-03 12:42:22 UTC
All six patches mentioned in this bug report have been merged. 

What is left here?
Comment 41 Brad Jorsch 2014-07-03 14:06:34 UTC
(In reply to Andre Klapper from comment #40)
> All six patches mentioned in this bug report have been merged. 
> 
> What is left here?

One of the patches was reverted due to issues in Wikidata, see bug 59788.
