Last modified: 2014-10-29 21:11:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54777, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52777 - user_properties table bloat
user_properties table bloat
Status: NEW
Product: Wikimedia
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
: performance
Depends on: 52542 52778 64752 64753 64754 64755 64756 64757
Blocks: 29782 36316
  Show dependency treegraph
 
Reported: 2013-08-13 04:18 UTC by Tim Starling
Modified: 2014-10-29 21:11 UTC (History)
18 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Tim Starling 2013-08-13 04:18:43 UTC
On enwiki, the user_properties table has about 60M rows, for only 20M users. This is incredible considering that user_properties is meant to only store non-default options, to reduce DB space. The index length is about 2.2 GB, and the data size is about 3.7GB.

By sampling, the number of user_properties rows per user can be estimated. The problem is very dependent on user_id, and is mostly confined to user_id values less than 10M, i.e. users created before mid-2009.

user_id         props/user
---------------------------
0               8.2615
1000000         5.8696
2000000         4.9534
3000000         4.8038
4000000         4.6013
5000000         4.3775
6000000         4.4137
7000000         5.3833
8000000         5.8919
9000000         6.4356
10000000        0.8789
11000000        1.1052
12000000        1.0005
13000000        0.9774
14000000        1.0987

Sampling 1000 users with user_id<10M, we find that the main culprits are:

searchNs-1 : 968 users
skin: 964 users
thumbsize: 912 users

75% of the skin rows have an empty string as their value, which causes Skin::newFromKey() to return the default skin, same as if the row was missing. The rest are mostly "monobook", presumably manually set via the UsabilityInitiative OptIn extension.

"searchNs-1" is a bug, it relates to searching the special namespace, which is not possible. It is "0" in all sampled rows.

"thumbsize" is "3" in all sampled rows, which is not the default, the default is "4" on all WMF wikis other than svwiki. In addition to bloat of the user_properties table, this causes fragmentation of the parser cache. There's no way 91% of users prior to 2009 manually set this value, it must have been set by a bug.

We should remove unnecessary or incorrectly inserted rows, and ensure that this does not happen again (e.g. as a consequence of the resolution of bug 36316).
Comment 1 MZMcBride 2013-08-13 04:40:22 UTC
Thank you for researching this.

(In reply to comment #0)
> Sampling 1000 users with user_id<10M, we find that the main culprits are:
> 
> searchNs-1 : 968 users
> skin: 964 users
> thumbsize: 912 users
> 
> 75% of the skin rows have an empty string as their value, which causes
> Skin::newFromKey() to return the default skin, same as if the row was
> missing. The rest are mostly "monobook", presumably manually set via the
> UsabilityInitiative OptIn extension.

I've filed bug 52778 to track the skin issue, as I think it's more complex than the other two issues mentioned and I think it should be resolved with a maintenance script to allow third-party MediaWiki users to clean up their own databases.

> "searchNs-1" is a bug, it relates to searching the special namespace, which
> is not possible. It is "0" in all sampled rows.
> 
> "thumbsize" is "3" in all sampled rows, which is not the default, the default
> is "4" on all WMF wikis other than svwiki. In addition to bloat of the
> user_properties table, this causes fragmentation of the parser cache. There's
> no way 91% of users prior to 2009 manually set this value, it must have been
> set by a bug.

I'm not sure correcting these database table row anomalies will require a maintenance script. The cleanup for "searchNs-1" and "thumbsize" can probably be done with a simple query across all.dblist. However, if a maintenance script is needed for either of these, separate bugs should be filed as dependencies of this bug.
Comment 2 Tim Starling 2013-08-13 04:47:40 UTC
(In reply to comment #1)
> I'm not sure correcting these database table row anomalies will require a
> maintenance script. The cleanup for "searchNs-1" and "thumbsize" can probably
> be done with a simple query across all.dblist. However, if a maintenance
> script is needed for either of these, separate bugs should be filed as 
> dependencies of this bug.

You can't really delete 9M rows in one query, it will cause replication lag. But the generic maintenance script runBatchedQuery.php can be used.
Comment 3 Nemo 2013-08-13 11:56:42 UTC
(In reply to comment #0)
> We should remove unnecessary or incorrectly inserted rows, and ensure that
> this
> does not happen again (e.g. as a consequence of the resolution of bug 36316).

How to avoid that it happens again? Just changing a default should not have this effect, right? What about changing a default only for new users via post-registration hook as done for some things now?

Thumb default was changed in 2010 with bug 21117. On en.wiki there were also some load tests before the actual, global switch.
Comment 4 db [inactive,noenotif] 2013-08-22 18:26:05 UTC
There is also a userOptions.php which will delete rows, when you change a preferences to a default state, but you have to run it for each old value seperate.
Comment 5 Umherirrender 2013-08-23 13:27:21 UTC
You can remove the showjumplinks preference from the database, because it was removed from core: Gerrit change #25751

Due to the watchlist token stored in the user_properties table for each user which has visited the watchlist exists one row.
Comment 6 Umherirrender 2013-10-26 07:01:12 UTC
You can remove the 'vector-collapsiblenav' preference from database, because it was removed from vector extension and not migrated to core: Gerrit change #83591
Comment 7 Chad H. 2013-12-05 01:35:53 UTC
ajaxsearch doesn't seem to be used anymore either, there's another 769131 rows that can be dropped.
Comment 8 Danny B. 2013-12-05 01:53:29 UTC
Not only wrong values, but also wrong keys are stored (need to find my data, adding this as a reminder).

Also - deleted or renamed gadgets are still stored as well (I am not sure, but I think I submitted it as a bug, but I clearly remember also being told by one of the devs that "it's not a bug, it's a feature").
Comment 9 Umherirrender 2013-12-06 13:59:24 UTC
You can remove 'disablesuggest' preference from the database, because it was removed from core with Gerrit change #99163

You may find more by looking at the fixed bugs which depends on bug 52807
Comment 10 Umherirrender 2013-12-10 18:38:43 UTC
You can remove 'nocache' preference from the database, was removed from core with Gerrit change #98276.
Comment 11 Max Semenik 2013-12-12 00:55:00 UTC
https://gerrit.wikimedia.org/r/100938 should improve the situation a bit.
Comment 12 Umherirrender 2013-12-13 17:56:54 UTC
Expand a existing maintenance script to clean up the database: Gerrit change #101233
Comment 13 Umherirrender 2014-01-16 20:13:29 UTC
You can remove 'searchlimit' preference from the database, was removed from core
with Gerrit change #105871.
Comment 14 Nemo 2014-05-02 14:13:51 UTC
Tim, it's still not clear to me what alternative you propose for bug 36316 not to be a problem. (We've been blocked one more year just to save some thousands DB rows per day?)

> You can remove the 'vector-collapsiblenav' preference from database, because
> it was removed

> ajaxsearch doesn't seem to be used anymore either, there's another 769131
> rows that can be dropped.

> You can remove 'disablesuggest' preference from the database, because it was
> removed from core

> You can remove 'searchlimit' preference from the database, was removed from
> core

How? Umherirrender, if there is a script already suitable for that, please file one shell request for each.
Comment 15 Umherirrender 2014-05-02 16:42:30 UTC
(In reply to Nemo from comment #14)
> How? Umherirrender, if there is a script already suitable for that[...]

See comment #2: 
(In reply to Tim Starling from comment #2)
> You can't really delete 9M rows in one query, it will cause replication lag.
> But the generic maintenance script runBatchedQuery.php can be used.


(In reply to Nemo from comment #14)
> [...] please file one shell request for each.

I will not do that. The list is not closed, so maybe there are more preferences to be removed.
Comment 16 Nemo 2014-05-02 17:05:37 UTC
> (In reply to Nemo from comment #14)
> > [...] please file one shell request for each.
> 
> I will not do that. The list is not closed, so maybe there are more
> preferences to be removed.

Hence I said one for each. :) Anyway, easier done than said.
Comment 17 Tim Starling 2014-07-09 22:15:55 UTC
(In reply to Nemo from comment #14)
> Tim, it's still not clear to me what alternative you propose for bug 36316
> not to be a problem. (We've been blocked one more year just to save some
> thousands DB rows per day?)

Have a default that depends on user registration date. Or just set watchdefault=1 by default for all users.
Comment 18 Kunal Mehta (Legoktm) 2014-07-09 22:17:04 UTC
(In reply to Tim Starling from comment #17)
> (In reply to Nemo from comment #14)
> > Tim, it's still not clear to me what alternative you propose for bug 36316
> > not to be a problem. (We've been blocked one more year just to save some
> > thousands DB rows per day?)
> 
> Have a default that depends on user registration date.

We should be able to do this easily for extensions with the new hook added in I1da936c786adb21e2c1802ef405bb904c9cf4918.
Comment 19 Nemo 2014-07-09 23:30:50 UTC
Thanks Reedy for the cleanup.

Tim said:
> Or just set watchdefault=1 by default for all users.

If you're willing to merge it, that's ok. But let's see how I1da936c786adb21e2c1802ef405bb904c9cf4918 goes.
Comment 20 Helder 2014-08-23 17:36:55 UTC
(In reply to Nemo from comment #19)
> If you're willing to merge it, that's ok. But let's see how
> I1da936c786adb21e2c1802ef405bb904c9cf4918 goes.
That patch was merged.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links