Last modified: 2014-10-07 21:04:16 UTC
As seen at https://tendril.wikimedia.org/report/, we have a bunch of crawlers of various types hitting non-existent pages. We do a move/delete log query on each such page view... which is fine except when lots of these queries come in at once; they end up taking 16s to 18s each.

A possible solution is to avoid calling the LogEventList method in showMissingArticle by first checking a Bloom filter in Redis. This would be updated on the fly. Not sure how to estimate the set size needed to keep the false-hit rate down.
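For sizing, the standard Bloom filter formulas give the bit-array size and hash count directly from the expected member count and a target false-hit rate; a false hit here only means running the log query unnecessarily, so a fairly loose rate should be fine. A minimal sketch (the function name and the example numbers are illustrative, not from any merged code):

```
<?php
// m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions,
// for n expected members and a target false-hit rate p.
function bloomFilterParams( $n, $p ) {
	$m = (int)ceil( -$n * log( $p ) / ( M_LN2 * M_LN2 ) ); // filter size in bits
	$k = (int)round( ( $m / $n ) * M_LN2 ); // number of hash functions
	return array( $m, $k );
}

// E.g. ~10 million titles with move/delete log entries at a 1% false-hit
// rate needs ~96M bits (~11.4MB of Redis memory) and 7 hash functions.
list( $m, $k ) = bloomFilterParams( 10000000, 0.01 );
printf( "%d bits (%.1fMB), %d hashes\n", $m, $m / 8 / 1048576, $k );
```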
(In reply to Aaron Schulz from comment #0)
> A possible solution is to avoid calling the LogEventList method in
> showMissingArticle by first checking a Bloom filter in Redis. This would
> be updated on the fly. Not sure how to estimate the set size needed to
> keep the false-hit rate down.

Of course, a Bloom filter requires an initial scan of all of `logging`, plus add() calls for new deletions as they happen. This is problematic if the Redis server is not durable or goes down, since repopulation cannot happen on the fly. Maybe the rebuild could be automatic and batched, switching the filter on only once the rebuild completes.
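A hedged sketch of what that batched rebuild might look like, using 2014-era MediaWiki DB helpers; the `BloomFilter` type and its add()/markReady() methods are hypothetical stand-ins, not the merged API:

```
// Scan `logging` in log_id batches, add each affected title to a fresh
// filter, and only mark the filter usable once the full scan is done.
function rebuildTitleLogFilter( BloomFilter $filter ) {
	$dbr = wfGetDB( DB_SLAVE );
	$lastId = 0;
	do {
		$res = $dbr->select(
			'logging',
			array( 'log_id', 'log_namespace', 'log_title' ),
			array(
				'log_type' => array( 'delete', 'move' ),
				'log_id > ' . (int)$lastId
			),
			__METHOD__,
			array( 'ORDER BY' => 'log_id', 'LIMIT' => 1000 )
		);
		foreach ( $res as $row ) {
			$filter->add( $row->log_namespace . ':' . $row->log_title );
			$lastId = $row->log_id;
		}
	} while ( $res->numRows() == 1000 );
	$filter->markReady(); // consult the filter only once fully built
}
```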
Also, it might help to route non-user-based logging queries to all DBs rather than just db1055 (db1055's partitioning of that table by user is only needed for user-based queries, not for a title-based one like this).
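For reference, a minimal sketch of how such routing looks in $wgLBFactoryConf; the 'logpager' group name, section, hostnames, and weights here are assumptions for illustration, not the actual production config:

```
// Spread the 'logpager' query group over several replicas instead of
// pinning it to one host (names and weights below are hypothetical).
$wgLBFactoryConf['groupLoadsBySection']['s1'] = array(
	'logpager' => array(
		'db1055' => 1,
		'db10XX' => 1, // additional replicas for title-based log queries
	),
);
```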
Change 143802 merged by jenkins-bot:
Added BloomCache classes

https://gerrit.wikimedia.org/r/143802
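Presumably showMissingArticle() now consults the cache before running the log extract, along these lines; the BloomCache::get()/check() names and the 'TitleHasLogs' key are assumptions based on the proposal above, not verified against change 143802:

```
// Skip the expensive log query when the filter definitely has no entry
// for this title; a hit may be a false positive, so fall through to it.
$cache = BloomCache::get( 'main' ); // hypothetical accessor
if ( $cache->check( wfWikiId(), 'TitleHasLogs', $title->getPrefixedDBkey() ) ) {
	LogEventList::showLogExtract(
		$outputPage,
		array( 'delete', 'move' ),
		$title,
		'',
		array( 'lim' => 10, 'showIfEmpty' => false )
	);
}
```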
Deployed and populated (on enwiki, mostly automatically).