Last modified: 2013-07-31 16:24:31 UTC
Our current round-trip test server is bogging down under the large database it has accumulated. This makes it the bottleneck in rt testing, leaving the clients mostly idle. Much of the database size is XML-encoded old test results, which we don't really need any more. Moving those large results to a separate database might make it easier to truncate old results by simply re-creating the result XML database.

Apart from the database size, the node.js sqlite bindings we use don't seem to help performance either. IIRC they don't support transactions and other performance-improving features / pragmas. We mainly used sqlite because it was easy to get started, but it might make sense to re-evaluate that choice. A separate DB server would at least make it possible to use two cores instead of just one.
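For illustration, here is roughly what pragma tuning and explicit transactions could look like with a binding that accepts raw statements, such as the npm 'sqlite3' module. This is a sketch, not our actual code; the results table, its columns, and the sample data are all illustrative:

    var sqlite3 = require('sqlite3');
    var db = new sqlite3.Database('results.db');

    // Hypothetical batch of test results to store.
    var results = [
        { pageId: 1, xml: '<result>...</result>' },
        { pageId: 2, xml: '<result>...</result>' }
    ];

    db.serialize(function() {
        // Pragmas that trade a little durability for write throughput.
        db.run('PRAGMA journal_mode = WAL');
        db.run('PRAGMA synchronous = NORMAL');

        db.run('CREATE TABLE IF NOT EXISTS results ' +
               '(page_id INTEGER, result_xml TEXT)');

        // One explicit transaction for the whole batch instead of
        // one implicit transaction per INSERT.
        db.run('BEGIN');
        results.forEach(function(r) {
            db.run('INSERT INTO results (page_id, result_xml) VALUES (?, ?)',
                   [r.pageId, r.xml]);
        });
        db.run('COMMIT');
    });
    db.close();

Batching inserts this way avoids an fsync per row, which is typically where sqlite write performance goes when each statement runs in its own implicit transaction.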
WIP patch: https://gerrit.wikimedia.org/r/#/c/69156/
Another issue with the current code is that marking repeatedly crashing titles as failed does not work reliably. There is a counter in the claims table that should mark the title as an error on reaching some number of retries.
(In reply to comment #2)
> Another issue with the current code is that marking repeatedly crashing
> titles as failed does not work reliably. There is a counter in the claims
> table that should mark the title as an error on reaching some number of
> retries.

I've proposed a patch that should fix this: https://gerrit.wikimedia.org/r/#/c/73985/
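For context, a minimal sketch of how such a retry counter could work. The table layout, the num_retries and status columns, and the threshold are illustrative assumptions, not necessarily the actual schema or the patch's approach:

    var sqlite3 = require('sqlite3');
    var db = new sqlite3.Database('rt.db');
    var MAX_RETRIES = 3;   // illustrative threshold
    var pageId = 42;       // illustrative page

    db.serialize(function() {
        db.run('CREATE TABLE IF NOT EXISTS claims ' +
               '(page_id INTEGER, num_retries INTEGER DEFAULT 0, ' +
               'status TEXT)');

        // Each time a client claims a title, bump its retry counter;
        // a crashed client never completes the claim, so repeated
        // crashes accumulate here.
        db.run('UPDATE claims SET num_retries = num_retries + 1 ' +
               'WHERE page_id = ?', [pageId]);

        // Once the counter reaches the threshold, record the title
        // as an error so it is not handed out again.
        db.run("UPDATE claims SET status = 'error' " +
               'WHERE page_id = ? AND num_retries >= ?',
               [pageId, MAX_RETRIES]);
    });
    db.close();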
Change 75895 had a related patch set uploaded by Marcoil:

Refactor the database schema for performance. To use only the pages table to determine the next title to be processed, store the latest claim (hash, timestamp and number of tries) and latest score for each page. Then, to get the next title, query for the o[...]

https://gerrit.wikimedia.org/r/75895
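The commit message is truncated above, but based on the description, one plausible reading of the refactored layout is something like the following sketch. All column names and the staleness cutoff are guesses for illustration only:

    var sqlite3 = require('sqlite3');
    var db = new sqlite3.Database('rt.db');

    db.serialize(function() {
        // Denormalized pages table holding the latest claim and
        // score inline, so the dispatcher never joins against a
        // separate claims table.
        db.run('CREATE TABLE IF NOT EXISTS pages (' +
               'id INTEGER PRIMARY KEY, title TEXT, ' +
               'claim_hash TEXT, claim_timestamp INTEGER, ' +
               'claim_num_tries INTEGER DEFAULT 0, ' +
               'latest_score INTEGER)');

        // One guess at the "next title" query: a single scan over
        // pages for an unclaimed or stale entry. NULL timestamps
        // sort first in sqlite, so never-claimed pages win.
        var cutoff = Date.now() - 10 * 60 * 1000; // e.g. 10 minutes
        db.get('SELECT id, title FROM pages ' +
               'WHERE claim_timestamp IS NULL OR claim_timestamp < ? ' +
               'ORDER BY claim_timestamp LIMIT 1',
               [cutoff],
               function(err, row) {
                   if (!err && row) {
                       console.log('next title:', row.title);
                   }
               });
    });
    db.close();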
Change 75895 merged by jenkins-bot: Refactor the database schema for performance. https://gerrit.wikimedia.org/r/75895
The fix is in the rt_testing branch; reopen if more work is necessary before merging into master.