Last modified: 2008-05-13 13:39:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 14102 - API query for template usage involving multiple titles and namespace filter times out
API query for template usage involving multiple titles and namespace filter t...
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
API (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Roan Kattouw
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-05-13 02:42 UTC by carnildo
Modified: 2008-05-13 13:39 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description carnildo 2008-05-13 02:42:05 UTC
An API query on multiple titles for template usage with a namespace restriction gets an HTTP timeout.

Steps to reproduce:

Visit the URL http://en.wikipedia.org/w/api.php?action=query&format=xml&prop=templates&tlnamespace=10&titles=User:Carnildo/sandbox3|User:Carnildo/sandbox6

Removing either the "tlnamespace" term or either of the titles eliminates the error.

Expected result:

<?xml version="1.0" encoding="utf-8"?><api><query><pages><page pageid="9945938" ns="2" title="User:Carnildo/sandbox3" /><page pageid="16027416" ns="2" title="User:Carnildo/sandbox6"><templates><tl ns="10" title="Template:Administrators&#039; noticeboard navbox" /><tl ns="10" title="Template:Archiveline" /><tl ns="10" title="Template:Editabuselinks" /><tl ns="10" title="Template:Nowrap" /><tl ns="10" title="Template:Nowrap begin" /><tl ns="10" title="Template:Nowrap end" /><tl ns="10" title="Template:Purge" /><tl ns="10" title="Template:Shortcut" /><tl ns="10" title="Template:Tnavbar" /><tl ns="10" title="Template:•w" /><tl ns="10" title="Template:•wrap" /></templates></page></pages></query></api>

Actual result:

504 Gateway Timeout
Comment 1 Aaron Schulz 2008-05-13 04:00:02 UTC
Looks like this should be fast, not sure why it is slow.
Comment 2 Aaron Schulz 2008-05-13 06:10:41 UTC
Maybe this is another insane index choice, it may be scanning the (namespace, title, from) index and filesorting...
Comment 3 Aaron Schulz 2008-05-13 06:18:57 UTC
Not having this doing the query on TS
Comment 4 Roan Kattouw 2008-05-13 10:41:20 UTC
It's using the correct index, it just doesn't like it when you do:

SELECT stuff FROM templatelinks WHERE tl_title IN ('98', '107) AND tl_namespace=0 ORDER BY tl_from, tl_namespace, tl_title

It doesn't like the fact that you're sorting by tl_namespace while it's a constant in the WHERE clause. Dropping tl_namespace from the ORDER BY (it's pointless anyway) fixes that. The same happens for tl_title when there's only one title.

Fixed in r34720
Comment 5 Aaron Schulz 2008-05-13 13:13:36 UTC
(In reply to comment #4)
> It's using the correct index, it just doesn't like it when you do:
> 
> SELECT stuff FROM templatelinks WHERE tl_title IN ('98', '107) AND
> tl_namespace=0 ORDER BY tl_from, tl_namespace, tl_title
> 
> It doesn't like the fact that you're sorting by tl_namespace while it's a
> constant in the WHERE clause. Dropping tl_namespace from the ORDER BY (it's
> pointless anyway) fixes that. The same happens for tl_title when there's only
> one title.
> 
> Fixed in r34720
> 

If you say X = 'a' and sort by X,Y that shouldn't go that slow, unless there is a bug. Also, I did the two part query on TS myself, and it was fast (only like 9 results). If it was using the right index, it should be fast. The query took like an hour according to profiling, which strongly suggests wrong index usage. I hope the ORDER BY tweak avoids that perhaps.
Comment 6 Aaron Schulz 2008-05-13 13:15:00 UTC
"If you say X = 'a' and sort by X,Y that shouldn't go that slow". Of course, I meant to also say "in comparison to sorting by Y only".
Comment 7 Roan Kattouw 2008-05-13 13:30:44 UTC
(In reply to comment #5)
> If you say X = 'a' and sort by X,Y that shouldn't go that slow, unless there is
> a bug. Also, I did the two part query on TS myself, and it was fast (only like
> 9 results). If it was using the right index, it should be fast. The query took
> like an hour according to profiling, which strongly suggests wrong index usage.
> I hope the ORDER BY tweak avoids that perhaps.
> 

I know that. Anyway, MySQL behaves very stupidly and insists on filesorting when it encounters pl_namespace twice. According to my EXPLAINs, WHERE pl_namespace=0 ORDER BY pl_from, pl_namespace, pl_title filesorts, but omitting pl_namespace from the ORDER BY doesn't. The right index (pl_from, pl_namespace, pl_title) is automatically picked. If this query still has issues after the server sync, we probably need to force the index.
Comment 8 Aaron Schulz 2008-05-13 13:37:37 UTC
Even if all it was doing was using the pl_from part of the right index, doing a WHERE and a filesort, it should still be fast since it was only for those two pages, and they didn't seem to have any huge link lists. I'm still inclined that it was doing the wrong index in this case, as nothing else seems to explain the slowness.
Comment 9 Roan Kattouw 2008-05-13 13:39:44 UTC
I'll add a FORCE INDEX just to be sure.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links