Last modified: 2014-08-08 12:16:36 UTC
I'm one of the maintainers of the Lists tool: http://tools.wmflabs.org/lists/ This tool executes a series of queries every day; after each query it runs another query to record some statistical information. If the primary query runs for more than 600 seconds, the secondary one fails with the error "General error: 2006 MySQL server has gone away". This issue began after the migration to MariaDB 10.
Duplicate of bug 68753?
I've found that the problem is related to the primary queries: some of them are now slower than before and exceed the 600-second limit. For example, the query [1] used to run in 170 seconds and now takes more than 1000 seconds. [1] http://tools.wmflabs.org/lists/itwiki/Voci/Voci_senza_uscita
Please post examples of both queries.
The query is at the bottom of the previous link. The query that fails is not important; the problem is that this query takes too long to run.
I realize you think the speed is the problem, which I agree is an issue. However there is no "over 600s" type kill mechanism, so I'm interested in establishing two things: 1. Why the first query is slow. Thanks, I see the example now. 2. Why the second query dies and whether it is, in fact, related to the speed of the first query, or to something else unexpected. Hence I asked to see it too...
Incola: "not important" doesn't really exist when trying to find steps to reproduce. :)
The second query is something like: insert into `executions` (`query_id`, `time`, `duration`, `results`) values (23, '2014-08-05 14:01:05', 1879, 5290)
The first query is heavily dependent on disk IO. It runs in ~1000s on both MariaDB 10 and 5.5 if the data is cold, or if any other concurrent query is also bottlenecked on disk. This should be reviewed once the switch back to SSD is done (to be scheduled very shortly after labsdb1003 migrates). Regarding the second query dying or losing its connection, which still seems odd, it would be useful to know:
- If the first query always completes regardless of slow runtime, or sometimes fails or is killed itself.
- If there is any delay between issuing the two queries on the same DB connection (seconds, minutes, etc.).
- If there is any transaction in use, either via explicit BEGIN or AUTO_COMMIT=0.
- What client connector or library is used, and whether it could have any custom timeout settings.
- The first query always runs correctly.
- They are on different connections.
- I don't know, because I'm not the original author of the code and I don't know how the framework that was used works.
- The first query runs via a shell command invoked by a PHP script, the second one via the PHP script directly. The script is this one: https://git.wikimedia.org/blob/labs%2Ftools%2Flists/d291a438ef6e1aa0e4630d501cd9a28bedb014cc/app%2Fcommands%2FExecCrontab.php
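Since the two queries use different connections, one possible explanation (an assumption, not confirmed anywhere in this thread) is that the script's own connection sits idle while the shell command runs the long query, and is dropped before the statistics insert is issued. A minimal sketch of a workaround under that assumption: open the connection for the insert only after the long query has finished, and retry once if it is lost anyway. It is written in Python for illustration, not in the tool's PHP. `ConnectionLost`, `record_execution` and `make_flaky_factory` are hypothetical names that do not come from the Lists code, and the fake connection stands in for a real database driver.

```python
class ConnectionLost(Exception):
    """Stand-in for a driver error such as 'MySQL server has gone away'."""


def record_execution(connect, row, retries=1):
    """Insert one statistics row, opening a fresh connection per attempt.

    `connect` is a hypothetical zero-argument factory returning an object
    with `insert(row)` and `close()`. Returns the number of retries used.
    """
    attempt = 0
    while True:
        # Connect only now, i.e. after the long-running query has finished,
        # so the connection has had no time to go idle.
        conn = connect()
        try:
            conn.insert(row)
            return attempt
        except ConnectionLost:
            if attempt >= retries:
                raise  # give up and surface the original error
            attempt += 1
        finally:
            conn.close()


def make_flaky_factory(failures, sink):
    """Build a fake connection factory whose first `failures` inserts fail."""
    state = {"left": failures}

    class FakeConnection:
        def insert(self, row):
            if state["left"] > 0:
                state["left"] -= 1
                raise ConnectionLost("server has gone away")
            sink.append(row)

        def close(self):
            pass

    return FakeConnection


if __name__ == "__main__":
    stored = []
    row = {"query_id": 23, "time": "2014-08-05 14:01:05",
           "duration": 1879, "results": 5290}
    used = record_execution(make_flaky_factory(1, stored), row)
    print("retries used:", used, "rows stored:", len(stored))
```

The retry is deliberately limited to one attempt by default: if a freshly opened connection also fails, the cause is probably something other than an idle timeout and the error should be reported rather than hidden.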
After switching back to SSD, no errors are reported and the queries run with their previous timings.