Last modified: 2014-10-02 01:56:25 UTC
DELETION queries which are joining with replication databases for selection are really slow on mariadb 10. example test Query: CREATE TEMPORARY TABLE IF NOT EXISTS CCats (pid INT, cat VARCHAR(255), PRIMARY KEY (pid, cat), INDEX(cat)) ENGINE=Aria PAGE_CHECKSUM=0 TRANSACTIONAL=0; INSERT IGNORE INTO CCats (pid, cat) SELECT ip.page_id, cl_to FROM commonswiki_p.page fp INNER JOIN commonswiki_p.templatelinks ON fp.page_id=tl_from INNER JOIN commonswiki_p.image ON TRIM(LEADING 'Featured_picture_candidates/File:' FROM tl_title) = img_name INNER JOIN commonswiki_p.page ip ON img_name = ip.page_title AND ip.page_namespace=6 INNER JOIN commonswiki_p.categorylinks ON ip.page_id=cl_from WHERE fp.page_namespace=4 AND fp.page_title = 'Featured_picture_candidates' AND tl_namespace=4 AND tl_title LIKE 'Featured_picture_candidates/File:%'; DELETE cc FROM CCats cc INNER JOIN commonswiki_p.page ON page_namespace=14 AND page_title = cat INNER JOIN commonswiki_p.categorylinks ON cl_from = page_id WHERE cl_to='Hidden_categories' AND page_namespace=14; Execution time of this small query is about 30 seconds on labs, 0.32 seconds on old toolserver. I already talked with sean about this problem on irc. There might be a patch available on september. A more general tokudb bug is reported at https://github.com/Tokutek/tokudb-engine/issues/75 . Different to tokudb bug in my case toku db is only used for selection and no deletion is performed on these tables.
I'm fairly sure this is a manifestation of the same bug in the github report, which relates to TokuDB fractal indexes not (yet) using a bulk prefetching optimization by design in a number of situations, including DELETE. To prove this, convert the example DELETE to an equivalent SELECT and see the runtime drop to <1s. That indicates to a possible work around using an additional temp table and a subquery: CREATE TEMPORARY TABLE CCats2 AS SELECT * FROM CCats; DELETE cc FROM CCats cc JOIN ( SELECT cc2.cat FROM CCats2 cc2 JOIN commonswiki_p.page ON page_title = cc2.cat JOIN commonswiki_p.categorylinks ON cl_from = page_id WHERE cl_to = 'Hidden_categories' AND page_namespace = 14 ) t ON cc.cat = t.cat; Time allowing, I'll test TokuDB 7.1.8 before the panned September GA.
Hey Sean, what's the status on this? For Merl's migration from Toolserver this is really important. Without working delete queries, he'd have an enormous extra work to modify queries. Hoping for good news!? Best Silke
MariaDB 10.0.14 was released late last week, and change log says the TokuDB fix for slow DELETE is included. Doing a test build today...
MariaDB 10.0.14 has TokuDB 7.5.0 [1] which now uses a bulk-fetch approach for delete [2][3]. The labs DBs have been upgraded. Please test and report back? [1] https://github.com/Tokutek/tokudb-engine/wiki/Release-Notes-for-TokuDB-7.5.0 [2] https://tokutek.atlassian.net/browse/DB-311 [3] https://tokutek.atlassian.net/browse/DB-500
Ok, cool, Sean! Thanks! Merl, have you tried this, yet?
Today i tried to run a script (missing articles) which has a lot of those deletion queries with many rows affected. It was not possible the run this script on labs before. After your update it is now possible to run these queries. So thx to Sean. The last hours i spend a lot in rewriting these queries because they still were really slow (although possible). The problem was that there is no index on page_props.pp_value. Now i solved this problem. It is really interesting that using the workaround i implemented today runs the query on labs about as fast as the old script version on toolserver, but running the new version on toolserver is three times slower than using the unindexed page_props.pp_value directly. Old version running on labs is about ten times slower.
So we have index differences too? Is the toolserver schema dumped anywhere I can see it?