Last modified: 2014-10-25 00:57:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T33197, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31197 - Implement revision filter by namespace for big wikis in "miser mode"
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All All
Importance: High normal with 12 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 29876
Reported: 2011-09-27 20:19 UTC by Romaine
Modified: 2014-10-25 00:57 UTC
CC List: 26 users

Description Romaine 2011-09-27 20:19:14 UTC
Since MediaWiki 1.18 it is no longer possible to select the namespace in which a user has made edits. We would certainly like that back!

Greetings - Romaine
Comment 1 Roan Kattouw 2011-09-27 20:57:52 UTC
This was done deliberately in r88025 by freakolowsky, and purports to have been requested by Domas.
Comment 2 Jack Phoenix 2011-09-27 21:05:28 UTC
+9001

The related commit in question is r88025, committed by freakolowsky on 14 May 2011: "hidden namespace select box if in wgMiserMode(requested by domas)"

How on Earth did the servers survive all these years with this feature being enabled -- gasp! -- even on the English Wikipedia? Apparently well enough. I used to use the namespace selection box on Special:Contributions very often and not having it hinders the usability of the whole special page very much, kinda like what r48735 did to Special:RecentChanges -- it was eventually reverted in r56334.
Comment 3 Roan Kattouw 2011-09-28 19:45:59 UTC
(In reply to comment #1)
> This was done deliberately in r88025 by freakolowsky, and purports to have been
> requested by Domas.

(In reply to comment #2)
> How on Earth did the servers survive all these years with this feature being
> enabled -- gasp! -- even on the English Wikipedia? Apparently well enough.
CC Domas so he can comment on this himself.
Comment 4 Nemo 2011-10-02 21:58:12 UTC
The special page gets really hard to use without this feature.
The rationale is very poor. 
In case we really need to save DB resources, is this less useful and more expensive than "Only show edits that are latest revisions", for instance?
Also, it doesn't make much sense to me to use $wgMiserMode for this: if the documentation is correct, MiserMode just delays functioning of some query special pages, making them less up to date but not totally unusable. Another configuration variable should be used.
Comment 5 Roan Kattouw 2011-10-02 22:19:18 UTC
(In reply to comment #4)
> Also, it doesn't make much sense to me to use $wgMiserMode for this: if the
> documentation is correct, MiserMode just delays functioning of some query
> special pages, making them less up to date but not totally unusable. Another
> configuration variable should be used.
The documentation is wrong, then. $wgMiserMode has been used as a generic "don't do expensive DB queries" setting for a long time.
Comment 6 p858snake 2011-10-02 22:34:32 UTC
(MAC)
(In reply to comment #4)
> Also, it doesn't make much sense to me to use $wgMiserMode for this: if the
> documentation is correct, MiserMode just delays functioning of some query
> special pages, making them less up to date but not totally unusable. Another
> configuration variable should be used.

Our documentation is slightly outdated for that. It's currently along the lines of "Disable database heavy services, so that they can be managed/controlled separately if desired (for services that permit it, for example: special page caching)."
Comment 7 Mark A. Hershberger 2011-10-04 20:59:35 UTC
I've asked Domas to provide a reason for this change.  If we don't hear from him by the end of this week, then I'll suggest that the change be backed out.
Comment 8 Sam Reed (reedy) 2011-10-05 02:02:41 UTC
Someone should probably get an EXPLAIN of the query against, say, enwiki for reference and log it here.
Comment 9 Domas Mituzas 2011-10-05 15:37:40 UTC
EXPLAIN won't do the problem justice.

The problem is that currently "show me 50 edits from namespace X" can read 50 database rows, or it can read all of the user contributions and return 0. 

It is not possible to index this without denormalizing the dataset (page_namespace has to sit together with all revisions). 

e.g. 10 edits for Rambot reads:


mysql> show status like '%handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     | 
| Handler_read_key      | 3     | 
| Handler_read_next     | 9     | 


now, 10 edits for Rambot, verifying page namespace (e.g. 10):

mysql>  select * from revision join page on page_id=rev_page where rev_user_text='Rambot' and page_namespace=10 limit 10;
Empty set (1 min 53.16 sec)

mysql> show status like '%handler_read%';
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      | 
| Handler_read_key      | 138423 | 
| Handler_read_next     | 138448 | 

Fixing this needs an additional revision index, and denormalization, which gets in the way of cross-namespace renames.

Or we can allow multiple-minute queries. As a rule, users with large histories get way more scripted contributions checks :-)
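
The denormalization trade-off mentioned in comment 9 can be illustrated with a small Python sketch. This is a toy model, not MediaWiki's schema: the data, the `by_user_ns` index and the `move_page` helper are all hypothetical names invented for the illustration. The namespace is copied next to every revision so that a (user, namespace) lookup only touches matching entries, at the price of rewriting those entries whenever a page moves to another namespace.

```python
from collections import defaultdict

# Toy data (hypothetical). The namespace lives with the page, not the revision,
# mirroring the revision/page join in the query above.
page_namespace = {1: 0, 2: 0, 3: 10}
revisions = [(1, "Rambot", 1), (2, "Rambot", 2), (3, "Rambot", 1), (4, "Other", 3)]

# Denormalized index: the page's namespace is copied next to each revision.
by_user_ns = defaultdict(list)
for rev_id, user, page_id in revisions:
    by_user_ns[(user, page_namespace[page_id])].append(rev_id)

def contributions(user, namespace, limit):
    """Return up to `limit` newest revision ids; only matching entries are read."""
    return sorted(by_user_ns.get((user, namespace), []), reverse=True)[:limit]

def move_page(page_id, new_namespace):
    """Move a page across namespaces; every copied entry must be rewritten."""
    old_namespace = page_namespace[page_id]
    page_namespace[page_id] = new_namespace
    touched = 0
    for rev_id, user, pid in revisions:
        if pid == page_id:
            by_user_ns[(user, old_namespace)].remove(rev_id)
            by_user_ns[(user, new_namespace)].append(rev_id)
            touched += 1
    return touched
```

An empty result such as `contributions("Rambot", 10, 10)` is now answered without walking the user's whole history, while `move_page(1, 10)` has to touch every revision of the moved page.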
Comment 10 Risker 2011-10-05 16:06:58 UTC
(In reply to comment #9)
I won't pretend to completely understand what has been illustrated here, but it would help to put it into simpler words. Does this mean that this particular feature draws on excess resources that has a deleterious effect on some other aspect of the project, or does it mean it's inelegant?  

This feature is very heavily used by editors and administrators on a daily basis (I cannot remember an editing day in the last 3 years when I have not used it), and thus it's not just a "nice" feature but one on which many of us rely.
Comment 12 Roan Kattouw 2011-10-05 16:33:02 UTC
(In reply to comment #10)
> (In reply to comment #9)
> I won't pretend to completely understand what has been illustrated here, but it
> would help to put it into simpler words. Does this mean that this particular
> feature draws on excess resources that has a deleterious effect on some other
> aspect of the project, or does it mean it's inelegant?  
> 
> This feature is very heavily used by editors and administrators on a daily
> basis (I cannot remember an editing day in the last 3 years when I have not
> used it), and thus it's not just a "nice" feature but one on which many of us
> rely.
It means that depending on how many edits the targeted user has, and how many of those were in the requested namespace, the query may take anywhere between 1 millisecond and 2 minutes. This is because the database can efficiently retrieve all edits by a certain user, but then has to go through them one by one to check the namespace against the requested namespace until it has either reached the number of qualifying edits requested (usually 50) or reached the end of the list of edits. The latter may take a long time for users with many edits. In the example Domas gave, he asked for 10 edits by Rambot in the Template namespace, and it took almost 2 minutes to examine every single edit by Rambot in the entire history of Wikipedia and conclude there were zero edits matching his criteria. And at our scale, queries taking more than a second are already considered slow.
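
To make the cost described in comment 12 concrete, here is a minimal Python sketch of the scan. It is a simulation under assumed data, not the actual query plan: the function name and the 138,000-edit user are made up, chosen to be close to the Handler_read counts in comment 9.

```python
def scan_contributions(user_edits, wanted_namespace, limit=50):
    """Walk a user's edits newest-first and filter on namespace row by row.

    Returns (matches, rows_examined).
    """
    matches = []
    rows_examined = 0
    for rev_id, namespace in user_edits:
        rows_examined += 1
        if namespace == wanted_namespace:
            matches.append(rev_id)
            if len(matches) == limit:
                break  # enough qualifying edits found, stop early
    return matches, rows_examined

# Hypothetical user: 138,000 edits, all of them in namespace 0.
edits = [(rev_id, 0) for rev_id in range(138_000, 0, -1)]

fast, fast_rows = scan_contributions(edits, wanted_namespace=0, limit=10)
slow, slow_rows = scan_contributions(edits, wanted_namespace=10, limit=10)
print(len(fast), fast_rows)  # 10 10
print(len(slow), slow_rows)  # 0 138000
```

The same request for 10 edits reads 10 rows in one case and every row in the other, which is why the running time cannot be predicted from the page size alone.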
Comment 13 Domas Mituzas 2011-10-05 16:49:35 UTC
Do note, my example was something I came up with in a few seconds; I could easily have found a way more interesting edge case ;-)

@"mybugs.mail" - heh, that assumes that the page returns any, but if people query, they want to query something that is deeply buried.
Comment 14 Helder 2011-10-05 16:54:36 UTC
(In reply to comment #13)
> @"mybugs.mail" - heh, that assumes that the page returns any, but if people
> query, they want to query something that is deeply buried.

Yep!

It would be good to provide some message for the user in that case.
Comment 15 Leinad 2011-10-05 18:19:03 UTC
If bots are the main problem, could you exclude them from such queries? And maybe allow query only by autoconfirmed users?

Many people are complaining that you removed this function: http://pl.wikipedia.org/wiki/Wikipedia:Kawiarenka/Kwestie_techniczne#MediaWiki_1.18_-_problemy - it was very useful.
Comment 16 bulwersator 2011-10-05 18:44:42 UTC
"And at our scale, queries taking more than a second are
already considered slow."

So the solution is to fix this function:
*look over last n edits
*kill query after 10 seconds
*cache results

It is a very important function, and removing it "because it was slow" is a bad idea; it is similar to banning all interwiki bots due to their large number of edits.
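
The three ideas in comment 16 (look at only the last n edits, stop after a time budget, cache results) can be combined in one rough Python sketch. Everything here is an assumption made for illustration: the function name, the in-process `_cache` dictionary and the default limits are hypothetical, and a real deployment would also have to decide when cached results expire.

```python
import time

_cache = {}  # hypothetical in-process cache keyed by (user, namespace, limit)

def bounded_scan(user, user_edits, wanted_namespace, limit=50,
                 max_rows=5000, max_seconds=10.0):
    """Return (matches, complete).

    `complete` is False when the row cap or the time budget stopped the scan
    before the user's whole history had been examined.
    """
    key = (user, wanted_namespace, limit)
    if key in _cache:
        return _cache[key]
    deadline = time.monotonic() + max_seconds
    matches, complete = [], True
    for rows_examined, (rev_id, namespace) in enumerate(user_edits, start=1):
        if namespace == wanted_namespace:
            matches.append(rev_id)
            if len(matches) == limit:
                break  # the requested number of edits was found
        if rows_examined >= max_rows or time.monotonic() > deadline:
            # Cut off; only complete if this happened to be the last row.
            complete = rows_examined == len(user_edits)
            break
    _cache[key] = (matches, complete)
    return _cache[key]
```

When `complete` is False, the page could tell the user that only part of the history was searched, instead of silently showing a short or empty list.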
Comment 17 Foroa 2011-10-05 19:38:46 UTC
I don't think I buy this. Only a minority of users use that feature; by default most users don't do any filtering. Mainly administrators use that filtering tool and, until now, response speed was acceptable most of the time. I use it many times a day, but rarely for the distant past, mainly to check bot operations. So I fail to understand why this suddenly became a problem. If the system collapses under the load, I have no problem with having a smaller throttle or exponential backoff inserted, because we are generally checking the last few hours on a regular basis anyway, and waiting a bit longer for the results still results in a time gain for us.
Comment 18 Nemo 2011-10-05 20:09:21 UTC
It's been explained why the feature is expensive, but not why it should be disabled, which is the point of the discussion.

There should be some criteria to disable features, otherwise they're disabled randomly. If there are server load problems, was this feature compared to other expensive features to evaluate cost vs. benefit and choose what to disable? Or does "queries taking more than a second are already considered slow" mean that all features which are able to generate queries longer than a second (or any other threshold) will be disabled?
If there are no server load problems, why should the feature be disabled?
Even if there are problems, can't they be resolved through other means and shouldn't their cost be compared to the benefit of the feature?

Last but not least, did someone prove that disabling the feature will actually reduce server load? Users need this (finding edits in specific namespaces), therefore if you don't allow this feature they'll just load the complete list of edits, 5000 at a time, and search the namespace within the pages.
In your example: Rambot, 140 000 edits, 28 pages of 5000; for the first page I got "Served by mw30 in 14.588 secs", which makes a total of almost 7 minutes. That's not the length of the queries and I don't know if it's important, but perhaps it should be checked.
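
The estimate in comment 18 can be reproduced with a few lines of Python. The only assumption beyond the numbers quoted in the comment is that every page takes as long to serve as the single one that was measured.

```python
import math

edits = 140_000            # Rambot's approximate edit count, from the comment
page_size = 5000           # edits loaded per contributions page
seconds_per_page = 14.588  # one measured page load, assumed typical

pages = math.ceil(edits / page_size)
total_seconds = pages * seconds_per_page
print(pages, round(total_seconds / 60, 1))  # 28 6.8
```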
Comment 19 Philippe Beaudette 2011-10-05 23:25:03 UTC
Just to reiterate what's been said here... I've been approached by a couple of admins from the English Wikipedia who are dismayed that this feature is gone; it's an important tool in their arsenal.  Anything we can do to restore this functionality would be greatly appreciated.  They've got a hard job and could use the help.

Thanks
Comment 20 Mark A. Hershberger 2011-10-05 23:48:33 UTC
raising priority since this complaint is heard repeatedly on enwiki.  This is a very visible issue.
Comment 21 Krinkle 2011-10-06 00:42:43 UTC
Note: The namespace filter for revisions (like in Special:Contributions) was not removed from the MediaWiki software. It was changed to be hidden/disabled for large projects that run in a so-called "Miser mode", which prevents certain queries to the database that are too slow for such a large project.

Retitling bug to request implementation of namespace filter in such a way that it can even be run on "miser mode" wikis. Depending on the cost/benefit this may not be possible in the short term.
Comment 22 Brad Jorsch 2011-10-06 03:02:13 UTC
(In reply to comment #9)
> Or we can allow multiple-minute queries. As a rule, users with large histories
> get way more scripted contributions checks :-)

Is the problem the fact that the multiple-minute query uses too many resources, or is it just that you think people will not like it if the page takes so long to load? I suspect the former, but if it's the latter it seems clear that people dislike not having the feature available even more.
Comment 23 John Mark Vandenberg 2011-10-06 03:34:23 UTC
It is only edge cases which take a lot of time (like rambot), and the longer the query takes, the more difficult it would be to obtain the results via another approach.

The vast majority of invocations are quick (a few seconds).  If we are really worried about preventing these worst-case queries, it should be disabled for users where
  subject user_editcount is 100,000 or more, and 
  invokers don't have a new permission like noquerylimit (similar to noratelimit)
    which would be given to administrators and maybe rollbackers

Any serious user won't be running these obscure queries unless they actually do want the results, and they are prepared to wait because the alternative is to step through 5000 edits at a time.
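
The gating rule proposed in comment 23 is simple enough to write down as a short Python sketch. Note that `noquerylimit` is the hypothetical new permission suggested in that comment, not an existing MediaWiki right, and the function and constant names are invented for the illustration.

```python
EDITCOUNT_LIMIT = 100_000  # threshold taken from the proposal

def namespace_filter_allowed(subject_editcount, invoker_rights):
    """Refuse the filter only when the subject has a very large edit count
    AND the invoker lacks the proposed (hypothetical) 'noquerylimit' right."""
    if subject_editcount < EDITCOUNT_LIMIT:
        return True
    return "noquerylimit" in invoker_rights
```

For example, `namespace_filter_allowed(140_000, set())` is False, while the same call with `{"noquerylimit"}` is True, so administrators could still run the expensive query knowingly.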
Comment 24 Erik Moeller 2011-10-06 03:54:25 UTC
Indeed, would setting a reasonable edit count limit on the target user help as a short term fix?
Comment 25 od_mishehu 2011-10-06 06:20:41 UTC
And if the problem is only what the user would like, then you could even set a default, and allow the user to change the number. However, most usage of this tool would be relatively quick, anyway - and it would help for so many purposes. For example:
1) Some users keep track of still-relevant discussions by their contributions in specific namespaces (I do, for example, keep close track of my recent User talk: edits)
2) At times, various user editing patterns need to be looked at. Since these editing patterns are sometimes namespace based, we need to be able to keep track of them.

Note that in both of these cases, you don't usually need to look too far down to find 50 contributions in the relevant namespace.
Comment 26 bulwersator 2011-10-06 07:14:39 UTC
And maybe it is slow. Maybe this feature is a significant load on Wikimedia servers (unlikely). But the alternative for the user in the Rambot example is to load the entire editing history (in chunks of 5000 edits) and search it for namespace keywords.

It is probably larger load on server (rendering multiple pages) and for user it requires more work - by multiple orders of magnitude.
Comment 27 bulwersator 2011-10-06 07:18:12 UTC
And a hard limit of checking only the last n contributions is not a proper solution; I suggest making this limit configurable and also allowing searches over the entire contribution history. noquerylimit is also a workable solution; wikis will be able to add it to all user groups.
Comment 28 Leinad 2011-10-06 09:14:34 UTC
(In reply to comment #20)
> raising priority since this complaint is heard repeatedly on enwiki.  This is a
> very visible issue.

Nice to hear that only enwiki is important for you :(

PS. I see here requests from representatives of various communities...
Comment 29 Matthias Becker 2011-10-06 12:25:19 UTC
If a user has several hundred edits a day but one needs to filter, for example,
only the few talk page edits that user made, it won't make sense to deliver only
the, say, three talk page edits within that user's last 200 edits. The filtering
is now nearly worthless. I don't think it makes sense to filter like this. It
should be restored to the former behaviour or it should be shut down totally.

Maybe we should save resources elsewhere, for example on features delivering cat
images on user pages and other useless features, and *not* on features which are
needed, especially in bigger language versions. But for smaller communities
with only a few sysops who need to browse the recent changes for several days
back, the new behaviour is a hit below the belt. Administering and fighting
vandalism just became harder.

I think the rationale was really poor. If there's an issue with the server it should be solved there and not by removing the feature.
Comment 30 Roan Kattouw 2011-10-06 12:39:54 UTC
(In reply to comment #29)
> I think the rationale was really poor. If there's an issue with the server it
> should be solved there and not by removing the feature.
It's been explained time and again that this isn't something that can *possibly* be fixed server-side, the feature is *inherently* slow.
Comment 31 Domas Mituzas 2011-10-06 12:47:54 UTC
meh, I guess we can enable it, and expect edge cases to be rare. if they are not, well, we can point people at this discussion or go back to 2005 and deploy query snipers ;-)
Comment 32 Matthias Becker 2011-10-06 13:08:49 UTC
Domas, no user wants servers to crash (well, some do, but that's another case) and if things can be optimized they should be. But IMHO the solution cannot be, for example, the toolserver; some guys in the German WP are at the moment trying to figure out how to put together a tool which makes use of the API. The toolserver isn't our most reliable system component and it isn't the fastest.
Comment 33 bulwersator 2011-10-06 13:34:32 UTC
"It's been explained time and again that this isn't something that can
*possibly* be fixed server-side, the feature is *inherently* slow."

In the first place, why is "it is/was slow" a valid reason to remove this? Is it using a major part of CPU/RAM/bandwidth/hard drive space/etc.?
Comment 34 Domas Mituzas 2011-10-06 13:37:09 UTC
Frankly, I first saw the commit as a new feature; I wasn't supposed to cause a regression ;)
I saw this as a new feature being added and warned that it may be overly expensive in a large-scale environment.

As it was used before and wasn't killed in an emergency, I guess we can enable it in 1.18.
Comment 35 Roan Kattouw 2011-10-06 13:50:43 UTC
(In reply to comment #34)
> As it was used before and wasn't killed in an emergency, I guess we can enable
> it in 1.18.
Done in r99102, r99104.
