Last modified: 2014-02-13 23:52:55 UTC

Bug 18638 - Update fixUserRegistration.php to use newuserlog (where available, prior to r12207), and gaussian estimates for the fossils
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: 1.15.x
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/w/index.php?t...
Depends on:
Blocks: 22097
Reported: 2009-05-01 02:49 UTC by Splarka
Modified: 2014-02-13 23:52 UTC
CC List: 6 users

Attachments
Sampling of normalizable user first-contribution curve (9.96 KB, image/gif)
2009-05-02 08:30 UTC, Splarka

Description Splarka 2009-05-01 02:49:58 UTC
Users created before r12207 who made no edits until after r12207 are not assigned guessed data for user.user_registration. This is not critical but is often very confusing and sometimes wildly inaccurate. There are several months' worth of data on Wikimedia wikis in the new user log from the extension (see r10573) that could be used to populate it.

Also, for users prior to even the extension, a gaussian curve could be plotted from the data of available edits and log entries (all of which would be after the creation date) and normalized to a curve or wave of user creation date/ID.
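
One way to read this idea, sketched in Python. This is not a Gaussian fit; it swaps in a simpler monotone bound, on the assumption that user IDs were handed out in registration order. Since every edit or log entry postdates account creation, a user cannot have registered later than the earliest first action seen among users with the same or a higher ID. The function name and data shapes are hypothetical.

```python
from typing import List, Optional


def registration_upper_bounds(first_action: List[Optional[int]]) -> List[Optional[int]]:
    """Latest possible registration time per user, in ID order.

    first_action[i] is the Unix time of the first edit or log entry of the
    i-th user (sorted by user ID), or None if the user has neither.
    Assumes IDs follow registration order. A result of None means no
    bound is available for that user.
    """
    bounds: List[Optional[int]] = [None] * len(first_action)
    running_min: Optional[int] = None
    # Walk from the highest ID down, carrying the earliest first action seen.
    for i in range(len(first_action) - 1, -1, -1):
        ts = first_action[i]
        if ts is not None and (running_min is None or ts < running_min):
            running_min = ts
        bounds[i] = running_min
    return bounds
```

A user whose first action comes long after registration does not loosen the bound, because a later-numbered user with an earlier first action takes over.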

Awaiting WONTFIX!
Comment 1 Happy-melon 2009-05-01 10:51:57 UTC
Go on, then, let's see this gaussian curve of yours :D  Might as well work for your wontfix!!  

The other suggestion, however, is good: that extension provided accurate log data. A quick check on the toolserver suggests that there are at least 290,000 entries in the relevant period; a substantial fraction of these could be recovered in this fashion. It should probably be a separate script, though; there's no guarantee that wikis needing to populate the column would have had the extension installed, and no point in the script trying to use that data if it's not present.
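
A rough sketch of the log-based backfill, in Python rather than the maintenance script's PHP. It assumes the extension's entries can be read as (user ID, timestamp) pairs and that timestamps are 14-digit MediaWiki strings; the function name and data shapes are hypothetical. When no log entries exist, it simply fills nothing.

```python
from typing import Dict, Iterable, Optional, Tuple


def backfill_from_newuser_log(
    registration: Dict[int, Optional[str]],
    log_entries: Iterable[Tuple[int, str]],
) -> int:
    """Fill missing registration timestamps from new-user log entries.

    registration maps user ID -> timestamp string, or None if unknown
    (hypothetical shape). log_entries yields (user_id, timestamp) pairs.
    Timestamps are assumed to be YYYYMMDDHHMMSS strings, which sort
    correctly as plain strings. Returns the number of rows filled.
    """
    # Keep the earliest log entry per user in case of duplicates.
    earliest: Dict[int, str] = {}
    for user_id, ts in log_entries:
        if user_id not in earliest or ts < earliest[user_id]:
            earliest[user_id] = ts

    filled = 0
    for user_id, ts in earliest.items():
        # Only touch known users whose registration is still missing.
        if user_id in registration and registration[user_id] is None:
            registration[user_id] = ts
            filled += 1
    return filled
```

Existing non-NULL values are never overwritten, so running this before or after the guess-based script should not clobber anything.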
Comment 2 Splarka 2009-05-02 05:21:36 UTC
> Go on, then, let's see this gaussian curve of yours :D

Too slow of a query to do it for everyone without actually, yknow, DOING it, as in populating the data. But here is 5000 from en.wp. Note there isn't much curve to it, and it skips all users with double/nulls, but there is definitely a trend line:
http://test.wikipedia.org/wiki/File:Example_of_user_first_actions_for_en.wp_400000-405000.gif

Comment 3 Splarka 2009-05-02 08:30:47 UTC
Created attachment 6080
Sampling of normalizable user first-contribution curve

Here is a more distributed sampling: every thousandth user from 1k to 750k (1:1000).

Copied from http://test.wikipedia.org/wiki/File:Example_of_user_first_actions_for_en.wp_1-750000_(by_thousand).gif
Comment 4 Happy-melon 2009-05-02 10:16:57 UTC
Wow, that's a much better fit than I was expecting, TBH.  And the outliers tell their own story; particularly interesting the ones on the second graph that were registered in 2001-03, but not used until around 2008... More ammunition (as if it were needed) against deleting old accounts.

Still not entirely sure how you'd convert that data into registration timestamps, or are you going to assume that the curve approximately follows the registration time; that is, the average delay between registering and editing is zero? Seems a justifiable assumption, but I notice the curve gets a bit wobbly at the top; lots of double NULLs in the data...
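
A minimal sketch of one possible conversion, in Python. It assumes the zero-delay reading above, so each user's first-action time stands in for the registration time, and it fills the double-NULL users by linear interpolation between the nearest known neighbours in ID order. Linear interpolation is a choice made here for illustration, not something settled in this report; the function name is hypothetical.

```python
from typing import List, Optional


def interpolate_gaps(estimates: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries that lie between two known estimates.

    estimates[i] is the estimated registration time (Unix seconds) of the
    i-th user in ID order, or None for users with no edits or log entries.
    Gaps before the first or after the last known value are left as None,
    since there is nothing to interpolate against.
    """
    out = list(estimates)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        delta = out[right] - out[left]
        # Spread the time difference evenly over the missing IDs.
        for i in range(left + 1, right):
            out[i] = out[left] + delta * (i - left) / span
    return out
```

The wobbly top of the curve would end up as trailing None values here, which seems safer than extrapolating past the last user with data.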
Comment 5 Chad H. 2011-05-06 17:53:19 UTC
*** Bug 22097 has been marked as a duplicate of this bug. ***
