Last modified: 2014-10-29 21:11:53 UTC
On enwiki, the user_properties table has about 60M rows, for only 20M users. This is incredible considering that user_properties is meant to only store non-default options, to reduce DB space. The index length is about 2.2 GB, and the data size is about 3.7GB. By sampling, the number of user_properties rows per user can be estimated. The problem is very dependent on user_id, and is mostly confined to user_id values less than 10M, i.e. users created before mid-2009. user_id props/user --------------------------- 0 8.2615 1000000 5.8696 2000000 4.9534 3000000 4.8038 4000000 4.6013 5000000 4.3775 6000000 4.4137 7000000 5.3833 8000000 5.8919 9000000 6.4356 10000000 0.8789 11000000 1.1052 12000000 1.0005 13000000 0.9774 14000000 1.0987 Sampling 1000 users with user_id<10M, we find that the main culprits are: searchNs-1 : 968 users skin: 964 users thumbsize: 912 users 75% of the skin rows have an empty string as their value, which causes Skin::newFromKey() to return the default skin, same as if the row was missing. The rest are mostly "monobook", presumably manually set via the UsabilityInitiative OptIn extension. "searchNs-1" is a bug, it relates to searching the special namespace, which is not possible. It is "0" in all sampled rows. "thumbsize" is "3" in all sampled rows, which is not the default, the default is "4" on all WMF wikis other than svwiki. In addition to bloat of the user_properties table, this causes fragmentation of the parser cache. There's no way 91% of users prior to 2009 manually set this value, it must have been set by a bug. We should remove unnecessary or incorrectly inserted rows, and ensure that this does not happen again (e.g. as a consequence of the resolution of bug 36316).
Thank you for researching this. (In reply to comment #0) > Sampling 1000 users with user_id<10M, we find that the main culprits are: > > searchNs-1 : 968 users > skin: 964 users > thumbsize: 912 users > > 75% of the skin rows have an empty string as their value, which causes > Skin::newFromKey() to return the default skin, same as if the row was > missing. The rest are mostly "monobook", presumably manually set via the > UsabilityInitiative OptIn extension. I've filed bug 52778 to track the skin issue, as I think it's more complex than the other two issues mentioned and I think it should be resolved with a maintenance script to allow third-party MediaWiki users to clean up their own databases. > "searchNs-1" is a bug, it relates to searching the special namespace, which > is not possible. It is "0" in all sampled rows. > > "thumbsize" is "3" in all sampled rows, which is not the default, the default > is "4" on all WMF wikis other than svwiki. In addition to bloat of the > user_properties table, this causes fragmentation of the parser cache. There's > no way 91% of users prior to 2009 manually set this value, it must have been > set by a bug. I'm not sure correcting these database table row anomalies will require a maintenance script. The cleanup for "searchNs-1" and "thumbsize" can probably be done with a simple query across all.dblist. However, if a maintenance script is needed for either of these, separate bugs should be filed as dependencies of this bug.
(In reply to comment #1) > I'm not sure correcting these database table row anomalies will require a > maintenance script. The cleanup for "searchNs-1" and "thumbsize" can probably > be done with a simple query across all.dblist. However, if a maintenance > script is needed for either of these, separate bugs should be filed as > dependencies of this bug. You can't really delete 9M rows in one query, it will cause replication lag. But the generic maintenance script runBatchedQuery.php can be used.
(In reply to comment #0) > We should remove unnecessary or incorrectly inserted rows, and ensure that > this > does not happen again (e.g. as a consequence of the resolution of bug 36316). How to avoid that it happens again? Just changing a default should not have this effect, right? What about changing a default only for new users via post-registration hook as done for some things now? Thumb default was changed in 2010 with bug 21117. On en.wiki there were also some load tests before the actual, global switch.
There is also a userOptions.php which will delete rows, when you change a preferences to a default state, but you have to run it for each old value seperate.
You can remove the showjumplinks preference from the database, because it was removed from core: Gerrit change #25751 Due to the watchlist token stored in the user_properties table for each user which has visited the watchlist exists one row.
You can remove the 'vector-collapsiblenav' preference from database, because it was removed from vector extension and not migrated to core: Gerrit change #83591
ajaxsearch doesn't seem to be used anymore either, there's another 769131 rows that can be dropped.
Not only wrong values, but also wrong keys are stored (need to find my data, adding this as a reminder). Also - deleted or renamed gadgets are still stored as well (I am not sure, but I think I submitted it as a bug, but I clearly remember also being told by one of the devs that "it's not a bug, it's a feature").
You can remove 'disablesuggest' preference from the database, because it was removed from core with Gerrit change #99163 You may find more by looking at the fixed bugs which depends on bug 52807
You can remove 'nocache' preference from the database, was removed from core with Gerrit change #98276.
https://gerrit.wikimedia.org/r/100938 should improve the situation a bit.
Expand a existing maintenance script to clean up the database: Gerrit change #101233
You can remove 'searchlimit' preference from the database, was removed from core with Gerrit change #105871.
Tim, it's still not clear to me what alternative you propose for bug 36316 not to be a problem. (We've been blocked one more year just to save some thousands DB rows per day?) > You can remove the 'vector-collapsiblenav' preference from database, because > it was removed > ajaxsearch doesn't seem to be used anymore either, there's another 769131 > rows that can be dropped. > You can remove 'disablesuggest' preference from the database, because it was > removed from core > You can remove 'searchlimit' preference from the database, was removed from > core How? Umherirrender, if there is a script already suitable for that, please file one shell request for each.
(In reply to Nemo from comment #14) > How? Umherirrender, if there is a script already suitable for that[...] See comment #2: (In reply to Tim Starling from comment #2) > You can't really delete 9M rows in one query, it will cause replication lag. > But the generic maintenance script runBatchedQuery.php can be used. (In reply to Nemo from comment #14) > [...] please file one shell request for each. I will not do that. The list is not closed, so maybe there are more preferences to be removed.
> (In reply to Nemo from comment #14) > > [...] please file one shell request for each. > > I will not do that. The list is not closed, so maybe there are more > preferences to be removed. Hence I said one for each. :) Anyway, easier done than said.
(In reply to Nemo from comment #14) > Tim, it's still not clear to me what alternative you propose for bug 36316 > not to be a problem. (We've been blocked one more year just to save some > thousands DB rows per day?) Have a default that depends on user registration date. Or just set watchdefault=1 by default for all users.
(In reply to Tim Starling from comment #17) > (In reply to Nemo from comment #14) > > Tim, it's still not clear to me what alternative you propose for bug 36316 > > not to be a problem. (We've been blocked one more year just to save some > > thousands DB rows per day?) > > Have a default that depends on user registration date. We should be able to do this easily for extensions with the new hook added in I1da936c786adb21e2c1802ef405bb904c9cf4918.
Thanks Reedy for the cleanup. Tim said: > Or just set watchdefault=1 by default for all users. If you're willing to merge it, that's ok. But let's see how I1da936c786adb21e2c1802ef405bb904c9cf4918 goes.
(In reply to Nemo from comment #19) > If you're willing to merge it, that's ok. But let's see how > I1da936c786adb21e2c1802ef405bb904c9cf4918 goes. That patch was merged.