Last modified: 2014-06-24 05:03:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T68961, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 66961 - Logging needs a log_title index
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Logging
Version: 1.24rc
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: performance
Depends on:
Blocks: 20892
Reported: 2014-06-23 01:57 UTC by Huji
Modified: 2014-06-24 05:03 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Huji 2014-06-23 01:57:22 UTC
When trying to create a page that has previously been deleted, a message is shown to remind the user that it was deleted before. That information is retrieved from the logging table by matching the title. Currently, there is no index on the log_title field in that table, hence the query is slow.
Comment 1 Marius Hoch 2014-06-23 02:56:57 UTC
What's actually needed over here is (log_namespace, log_title, log_type), see query at the bottom of https://gerrit.wikimedia.org/r/#/c/139103/3/AbuseFilterVariableHolder.php

Not sure we really want that for only this feature, but Huji mentioned other code paths also running a similar query.
Comment 2 Huji 2014-06-23 12:55:42 UTC
To be politically correct:

In EditPage.php, at the very end of the showIntro() function, you will find a call to LogEventsList::showLogExtract(), which is how the message on top of the page is created. LogEventsList::showLogExtract() itself is defined in includes/logging/LogEventsList.php and uses the getBody() function of LogPager to get a list of 50 recent delete log entries for that page. In the end, it is the doQuery() function of the Pager class which actually runs the query. If you follow this path, you will notice that the query is run on the log_namespace, log_title and log_type fields.
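
To illustrate the shape of the query described above, here is a minimal Python sketch that uses SQLite in place of the production MySQL/MariaDB database. It is an approximation under stated assumptions: the table is reduced to the columns named in this bug, the sample rows are made up, and recent_log_entries() is a hypothetical helper rather than MediaWiki code. The SQL that LogPager actually builds selects more columns and may add further conditions.

```python
import sqlite3


def make_logging_table(conn):
    # Simplified stand-in for MediaWiki's logging table: only the columns
    # mentioned in this bug report are included (an assumption).
    conn.execute(
        """
        CREATE TABLE logging (
            log_id        INTEGER PRIMARY KEY,
            log_type      TEXT NOT NULL,
            log_namespace INTEGER NOT NULL,
            log_title     TEXT NOT NULL,
            log_timestamp TEXT NOT NULL
        )
        """
    )


def recent_log_entries(conn, namespace, title, log_type="delete", limit=50):
    # Hypothetical helper: filter on namespace, title and type, newest first,
    # capped at `limit` rows, mirroring the query shape described in the comment.
    return conn.execute(
        """
        SELECT log_id, log_timestamp
        FROM logging
        WHERE log_namespace = ? AND log_title = ? AND log_type = ?
        ORDER BY log_timestamp DESC
        LIMIT ?
        """,
        (namespace, title, log_type, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
make_logging_table(conn)
conn.executemany(
    "INSERT INTO logging (log_type, log_namespace, log_title, log_timestamp) "
    "VALUES (?, ?, ?, ?)",
    [
        ("delete", 0, "Example", "20140101000000"),
        ("move", 0, "Example", "20140201000000"),
        ("delete", 0, "Example", "20140301000000"),
        ("delete", 10, "Example", "20140401000000"),
    ],
)
rows = recent_log_entries(conn, 0, "Example")
```

Only the two delete entries for namespace 0 come back, newest first; the move entry and the entry in namespace 10 are filtered out. All three filter columns appear in the WHERE clause, which is why an index on log_title alone would not be the best fit.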
Comment 3 Sean Pringle 2014-06-24 05:03:37 UTC
The example queries I've seen so far always use both log_namespace and log_title which means the page_time index can be used partially:

KEY page_time (log_namespace, log_title, log_timestamp)

Using enwiki pages with 100000+ log entries the queries take ~1s on warm data. Not terrible, though the number of Handler% calls is directly proportional to the number of log entries which isn't great for long term scalability.

So tentative +1 to this bug.

I've been trialing the following index on enwiki slaves for a few months:

KEY log_title_type_time (log_title(16), log_type, log_timestamp)

It sees quite a bit of use in general, and is used in favour of page_time for counting page deletes in the larger namespaces like 10 and 828. We should investigate how it compares to (log_namespace, log_title, log_type); the latter is probably better but might service fewer queries overall as well as encroach on page_time.
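
One way to start such a comparison is to look at the query plan each index produces. The sketch below does this with SQLite's EXPLAIN QUERY PLAN, purely as an illustration: SQLite's planner is not MariaDB's, the table is a simplified stand-in, the index name ns_title_type is hypothetical, and the prefix index on log_title(16) is left out because SQLite has no prefix-length syntax. Real conclusions would need EXPLAIN and Handler% counters on a production replica.

```python
import sqlite3

# Query shape discussed in this bug: filter on namespace, title and type,
# newest entries first.
QUERY = """
    SELECT log_id FROM logging
    WHERE log_namespace = ? AND log_title = ? AND log_type = ?
    ORDER BY log_timestamp DESC
    LIMIT 50
"""


def query_plan(index_ddl):
    # Build a fresh, empty in-memory table with exactly one secondary index,
    # then return SQLite's plan description for QUERY as a single string.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logging ("
        "log_id INTEGER PRIMARY KEY, log_type TEXT, log_namespace INTEGER, "
        "log_title TEXT, log_timestamp TEXT)"
    )
    conn.execute(index_ddl)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (0, "Example", "delete")
    ).fetchall()
    conn.close()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Existing index: the first two columns match the filter and the third
# provides the sort order; log_type is checked row by row.
page_time_plan = query_plan(
    "CREATE INDEX page_time ON logging "
    "(log_namespace, log_title, log_timestamp)"
)

# Proposed index (hypothetical name): all three filter columns are matched,
# but the index carries no timestamp ordering.
candidate_plan = query_plan(
    "CREATE INDEX ns_title_type ON logging "
    "(log_namespace, log_title, log_type)"
)

print(page_time_plan)
print(candidate_plan)
```

In this sketch page_time narrows the scan by namespace and title only, so the number of rows examined grows with the page's total log entries, while the three-column index narrows by type as well but needs a separate sort step for the timestamp ordering. That is the trade-off to measure on real data.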
