Last modified: 2014-08-19 18:18:19 UTC
There MUST be a session somewhere that is not being closed. While doing a lot of testing on wikimetrics, I encountered:
    OperationalError: (OperationalError) (1203, "User u2543 already has more than 'max_user_connections' active connections") None None
This could happen if many parallel queries were open at the same time, but I don't think the setup I was using would cause that scenario. We've tried to fix this a few times in the past; I think it's time we do it properly with flask-sqlalchemy or something else that is rock solid. After a refactor, we should never see manual calls to open sessions. They should all use something like:
    with session_thing:
        blah
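To make the `with session_thing:` idea concrete, here is a minimal sketch of a context manager that always closes its session. The name `session_scope` and the `session_factory` argument are hypothetical and not taken from the wikimetrics code; the sketch only assumes the factory returns an object with `commit()`, `rollback()` and `close()`, which is what a SQLAlchemy `sessionmaker` provides.

```python
from contextlib import contextmanager


@contextmanager
def session_scope(session_factory):
    """Yield a session and guarantee it is closed afterwards.

    `session_factory` is a hypothetical argument: any callable that
    returns an object with commit(), rollback() and close().
    """
    session = session_factory()
    try:
        yield session
        # Only reached when the body of the `with` block did not raise.
        session.commit()
    except Exception:
        # Undo partial work, then let the caller see the original error.
        session.rollback()
        raise
    finally:
        # Runs on success and on failure, so the connection is released.
        session.close()
```

Callers would then write `with session_scope(Session) as session:` and never call `close()` themselves, which is the point of the refactor: there is exactly one place where a session can be left open.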
Looking through the labsdb databases, here is the list of wikis to which connections were left as zombies while running RollingActiveEditor: svwiki, ruwiki, lgwiki, nostalgiawiki, lijwiki, vowiktionary, swwiki, glwikibooks, htwiki, rnwiki, nawikibooks.
It is odd that this list is not much longer, given that we were running queries for all projects (if that is not the case, please disregard this comment). Could the open sessions be the result of timeouts when connecting to the db?
Note that flask-sqlalchemy will help us with sessions instantiated by a web request, but not with sessions instantiated by the queue, which does not interact with flask. I think that tasks that access the db and raise an exception while executing will in some cases leave a 'shadow' session. Most likely we need to wrap the task execution method for all tasks that celery executes and do the session handling there, to make sure that 1) only one session is used per task and 2) we catch all exceptions and close that session. We will define the session scope and the task scope as being the same, just as flask-sqlalchemy equates the session scope with the request scope. This refactor should not be too hard.
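The wrapping described above could be sketched as a decorator that ties one session to one task run. This is an illustration only, not the patch that was eventually merged: `with_task_session` and `session_factory` are hypothetical names, and passing the session as the first argument of the task function is a design choice made for this example.

```python
import functools


def with_task_session(session_factory):
    """Decorator factory: give the wrapped task exactly one session.

    `session_factory` is a hypothetical callable returning an object
    with commit(), rollback() and close().
    """
    def decorator(task_func):
        @functools.wraps(task_func)
        def wrapper(*args, **kwargs):
            # 1) one session per task run
            session = session_factory()
            try:
                result = task_func(session, *args, **kwargs)
                session.commit()
                return result
            except Exception:
                # 2) any exception rolls back and is re-raised ...
                session.rollback()
                raise
            finally:
                # ... and the session is closed no matter what happened.
                session.close()
        return wrapper
    return decorator
```

A task body would then be written as `def run_report(session, cohort_id): ...` and decorated with `@with_task_session(Session)`, placed underneath the celery task decorator so that celery calls the wrapper.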
Actually, celery has worker signals: http://celery.readthedocs.org/en/latest/userguide/signals.html#worker-signals I think we can hook into those to create and destroy sessions, since those signals are tied to the worker lifecycle.
Collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-68833
Actually, we want task signals rather than worker signals (http://celery.readthedocs.org/en/latest/userguide/signals.html#task-signals). I will have some tests ready before tasking tomorrow.
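A rough sketch of how the task signals could drive session handling follows. It assumes celery's `task_prerun` and `task_postrun` signals, which pass `task_id` to their receivers as a keyword argument, and, as far as I can tell, `task_postrun` is also dispatched when the task failed. The `TaskSessions` class and its method names are hypothetical and only illustrate the idea; this is not the code from the linked patches.

```python
class TaskSessions:
    """Keep one open session per running task, keyed by celery task id.

    Hypothetical helper: `session_factory` is any callable returning
    an object with a close() method.
    """

    def __init__(self, session_factory):
        self._factory = session_factory
        self._open = {}

    def on_prerun(self, task_id=None, **_signal_kwargs):
        # Called just before the task body runs.
        self._open[task_id] = self._factory()

    def on_postrun(self, task_id=None, **_signal_kwargs):
        # Called after the task body has finished; close whatever
        # session was opened for this task, if any.
        session = self._open.pop(task_id, None)
        if session is not None:
            session.close()

    def get(self, task_id):
        """Return the session that belongs to the given task."""
        return self._open[task_id]

    def install(self):
        # Imported lazily so the class can be exercised without celery.
        from celery.signals import task_prerun, task_postrun
        # weak=False keeps the bound methods connected even if no other
        # reference to them exists.
        task_prerun.connect(self.on_prerun, weak=False)
        task_postrun.connect(self.on_postrun, weak=False)
```

Calling `TaskSessions(Session).install()` once at worker start-up would be enough in this sketch; a task would fetch its session with `get(self.request.id)` when bound to its task instance.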
See the sample patch: https://gerrit.wikimedia.org/r/#/c/152888/
Change 153616 had a related patch set uploaded by Milimetric: Ensure wikimetrics session is always closed https://gerrit.wikimedia.org/r/153616
Change 154851 had a related patch set uploaded by Milimetric: Disable pooling for mediawiki dbs https://gerrit.wikimedia.org/r/154851
Change 153616 merged by jenkins-bot: Ensure database sessions are always cleaned up https://gerrit.wikimedia.org/r/153616
Change 154851 merged by Nuria: Disable pooling for mediawiki dbs https://gerrit.wikimedia.org/r/154851