Last modified: 2014-06-24 05:03:37 UTC
When a user tries to create a page that has previously been deleted, a message is shown reminding them that it was deleted before. That information is retrieved from the logging table by matching the title. Currently there is no index on the log_title field in that table, hence the query is slow.
What's actually needed here is (log_namespace, log_title, log_type); see the query at the bottom of https://gerrit.wikimedia.org/r/#/c/139103/3/AbuseFilterVariableHolder.php. Not sure we really want that for this feature alone, but Huji mentioned other code paths that also run a similar query.
To be more precise: in EditPage.php, at the very end of the showIntro() function, you will find a call to LogEventsList::showLogExtract(), which is how the message at the top of the page is created. LogEventsList::showLogExtract() itself is defined in includes/logging/LogEventsList.php and uses the getBody() function of LogPager to get a list of 50 recent delete log entries for that page. In the end, it is the doQuery() function of the Pager class that actually runs the query. If you follow this path you will notice that the query filters on the log_namespace, log_title and log_type fields.
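For anyone who wants to poke at the shape of that query without a wiki at hand, here is a minimal sketch. It is not the SQL that Pager::doQuery() actually emits: the table is a cut-down stand-in for logging with only the relevant columns, SQLite stands in for the production database, and recent_delete_log() is a hypothetical helper name. The limit of 50 is taken from the description above.

```python
import sqlite3

# Assumption: a simplified stand-in for the logging table, keeping only the
# columns the query filters or sorts on.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logging (
           log_id        INTEGER PRIMARY KEY,
           log_type      TEXT NOT NULL,
           log_namespace INTEGER NOT NULL,
           log_title     TEXT NOT NULL,
           log_timestamp TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO logging (log_type, log_namespace, log_title, log_timestamp)"
    " VALUES (?, ?, ?, ?)",
    [
        ("delete", 0, "Example", "20140101000000"),
        ("protect", 0, "Example", "20140201000000"),
        ("delete", 0, "Example", "20140301000000"),
        ("delete", 0, "Other", "20140401000000"),
    ],
)


def recent_delete_log(conn, namespace, title, limit=50):
    """Hypothetical helper: newest delete-log rows for a single page."""
    return conn.execute(
        "SELECT log_timestamp, log_type FROM logging"
        " WHERE log_type = 'delete' AND log_namespace = ? AND log_title = ?"
        " ORDER BY log_timestamp DESC LIMIT ?",
        (namespace, title, limit),
    ).fetchall()


entries = recent_delete_log(conn, 0, "Example")
print(entries)
```

The WHERE clause touches exactly the three fields named above, and the ORDER BY on log_timestamp is why an index choice also has to be weighed against page_time.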
The example queries I've seen so far always use both log_namespace and log_title, which means the page_time index can be used partially:

  KEY page_time (log_namespace, log_title, log_timestamp)

Using enwiki pages with 100000+ log entries, the queries take ~1s on warm data. Not terrible, though the number of Handler% calls is directly proportional to the number of log entries, which isn't great for long-term scalability. So tentative +1 to this bug.

I've been trialing the following index on enwiki slaves for a few months:

  KEY log_title_type_time (log_title(16), log_type, log_timestamp)

It sees quite a bit of use in general, and is used in favour of page_time for counting page deletes in the larger namespaces like 10 and 828. We should investigate how it compares to (log_namespace, log_title, log_type); the latter is probably better, but might service fewer queries overall as well as encroach on page_time.
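To illustrate why the Handler% count grows with the number of log entries under page_time, here is a toy model. It is not a benchmark and not how InnoDB works internally: each index is just a sorted Python list of tuples, prefix_range() is a hypothetical helper, and the data (1000 log entries for one page, 10 of them deletes) is made up.

```python
from bisect import bisect_left

# Made-up data: one heavily logged page plus a couple of unrelated rows.
# Each row is (log_namespace, log_title, log_type, log_timestamp).
rows = [
    (10, "Infobox", "delete" if i % 100 == 0 else "protect", str(20140000000000 + i))
    for i in range(1000)
]
rows += [(10, "Other", "delete", "20140500000000"), (0, "Infobox", "delete", "20140600000000")]

# Toy "indexes": sorted lists of tuples in index column order.
# page_time is (log_namespace, log_title, log_timestamp); log_type is kept as
# a trailing payload to stand in for the row lookup.
page_time = sorted((ns, title, ts, typ) for ns, title, typ, ts in rows)
# Proposed index (log_namespace, log_title, log_type), timestamp as payload.
ns_title_type = sorted((ns, title, typ, ts) for ns, title, typ, ts in rows)


def prefix_range(index, prefix):
    """Hypothetical helper: entries whose leading columns equal `prefix`."""
    lo = bisect_left(index, prefix)
    hi = lo
    while hi < len(index) and index[hi][: len(prefix)] == prefix:
        hi += 1
    return index[lo:hi]


# page_time can only narrow to (namespace, title); every entry in that range
# has to be examined to check its log_type.
examined_page_time = prefix_range(page_time, (10, "Infobox"))
deletes_via_page_time = [e for e in examined_page_time if e[3] == "delete"]

# The proposed index narrows straight to the delete entries.
examined_proposed = prefix_range(ns_title_type, (10, "Infobox", "delete"))

print(len(examined_page_time), len(deletes_via_page_time), len(examined_proposed))
```

In this model page_time examines every log entry for the page to find the deletes, while (log_namespace, log_title, log_type) examines only the matching ones. The trade-off mentioned above still applies: the proposed index no longer delivers rows in log_timestamp order for free.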