Last modified: 2014-09-03 18:32:29 UTC
I'm seeing these in the log:
2014-07-07 14:05:42 mw1015 commonswiki: Update for doc ids: 15731483; error message was: No enabled connection
I don't imagine we're actually out of connections - probably just hitting some other http error and eating it. We should not eat it.
This should only happen after we've tried getMaxConnectionAttempts() times. We never call getConnection() directly, so it should be handled by our callback. Getting a full stacktrace out of this would help.
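For illustration only, here is a rough Python sketch of the behaviour described above: try up to a maximum number of attempts, and when all of them fail, raise an error that still carries the underlying causes instead of eating them. The names `with_retries` and `AllAttemptsFailed` are hypothetical and are not the client library's API; `max_attempts` stands in for whatever getMaxConnectionAttempts() returns.

```python
class AllAttemptsFailed(Exception):
    """Hypothetical error raised once every attempt has failed.

    It keeps the underlying exceptions so the log shows the real cause
    rather than only a generic "no connection" message.
    """

    def __init__(self, causes):
        super().__init__(
            f"No enabled connection after {len(causes)} attempts: {causes!r}"
        )
        self.causes = causes


def with_retries(request, max_attempts):
    """Call request(attempt) until it succeeds or max_attempts is reached."""
    causes = []
    for attempt in range(max_attempts):
        try:
            return request(attempt)
        except IOError as exc:
            # Remember the real error instead of discarding it.
            causes.append(exc)
    raise AllAttemptsFailed(causes)
```

With something like this, the final log line would include the original HTTP errors alongside "No enabled connection".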
Yeah. I've seen this in a maintenance script when I tried to bulk insert too much data. I _think_ it has something to do with commands that are far too big getting interpreted as a retry-able http error. We actually have logic to resubmit the command as singletons, but I think we don't get to use it because we've marked all the connections as busted due to the error. Might try to reproduce with a stupidly huge page.
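A minimal sketch of the "resubmit as singletons" idea, in Python for brevity and under stated assumptions: `send_bulk` is a hypothetical callable that sends a list of documents and raises the hypothetical `RetryableHttpError` on failure. This is not the actual update code; it only shows the fallback shape, which can only work if the failed bulk request has not already caused every connection to be marked as dead.

```python
class RetryableHttpError(Exception):
    """Hypothetical stand-in for an HTTP error the client treats as retryable."""


def send_with_singleton_fallback(docs, send_bulk):
    """Send docs in one bulk request; on failure, resend them one at a time.

    Returns the list of documents that still failed when sent individually.
    """
    try:
        send_bulk(docs)
        return []
    except RetryableHttpError:
        pass  # Fall through and retry each document on its own.

    failed = []
    for doc in docs:
        try:
            send_bulk([doc])
        except RetryableHttpError:
            # Only this document is reported as failed, not the whole batch.
            failed.append(doc)
    return failed
```

The point of the fallback is to isolate the one document (or the sheer batch size) that triggers the error, so the rest of the update still goes through.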
Change 156803 had a related patch set uploaded by Manybubbles: Increase timeout on updates https://gerrit.wikimedia.org/r/156803
https://gerrit.wikimedia.org/r/#/c/156798/ and https://gerrit.wikimedia.org/r/#/c/156800/ as well. It's not stupid huge pages. It's massive influxes of updates all at once, I believe. Anyway, these commits should make it more stable as well as give us more information when it fails.
Change 156803 merged by jenkins-bot: Increase timeout on updates https://gerrit.wikimedia.org/r/156803