Last modified: 2014-06-18 03:24:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T29807, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 27807 - Restore missing CheckUser logs
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Depends on:
Blocks: SWMT
Reported: 2011-03-01 12:31 UTC by Dominic
Modified: 2014-06-18 03:24 UTC (History)
CC List: 14 users

Description Dominic 2011-03-01 12:31:14 UTC
Apologies in advance if this was already reported/responded to (I thought I had reported it long ago).

The CheckUser logs currently date back to December 2006, while CheckUser as a user right in its current logged form dates back to June 2005. Prior to December 2006, CheckUser broke entirely and the entire log up to that point went with it. This is true across all projects. Ideally, we should restore the missing log entries from that first year and a half of CheckUser. The record of these log entries still exists; they just need to be added to the log visible on the projects. I have the missing logs in a text file on my computer, but since that was sent to me by Tim, I assume it can be retrieved by developers somehow.
Comment 1 p858snake 2011-03-01 12:34:26 UTC
Not shelling yet; this probably needs a maintenance script or something written first.
Comment 2 Sam Reed (reedy) 2011-03-01 19:20:02 UTC
What's the format of the file?

I'm guessing it's going to be very simple: split the string on commas, and then just do a database insert

Depending of course on how Tim generated that before
Comment 3 Dominic 2011-03-01 20:27:27 UTC
I have a .log file (i.e., plain text in a text editor) with hundreds of lines like:

<li>23:35, June 13, 2006 Dmcdevit got IPs for Dmcdevit on enwiki</li>

The main complication may be that this is from the days of the single global log, so there are also entries like:

<li>20:46, 1 lip 2005 Taw got IPs for [user]</li>

Or it may be that however the local logs were originally created can also be applied to these log entries.
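
To make the two line shapes above concrete, here is a minimal PHP sketch that splits one such line into fields. The function name `parseLogLine` and the returned keys are hypothetical and are not taken from the CheckUser extension. The sketch assumes the timestamp ends in a four-digit ASCII year and only handles the "got IPs for" form shown in this comment.

```php
<?php
// Hypothetical helper for illustration; it is not part of the CheckUser extension.
function parseLogLine(string $line): ?array
{
    // Timestamp: shortest run that ends in a four-digit year.
    // User and target: shortest runs around the literal " got IPs for ".
    // The " on <wiki>" suffix is optional, since old global-log entries omit it.
    $rx = '!^<li>(?P<timestamp>.+? [0-9]{4}) (?P<user>.+?) got IPs for '
        . '(?P<target>.+?)(?: on (?P<wiki>\w+))?</li>$!u';

    if (preg_match($rx, trim($line), $m) !== 1) {
        return null; // unparseable line; the caller decides what to do with it
    }

    return [
        'timestamp' => $m['timestamp'],
        'user'      => $m['user'],
        'target'    => $m['target'],
        // A missing trailing group is simply absent from $m, so default to null.
        'wiki'      => ($m['wiki'] ?? '') !== '' ? $m['wiki'] : null,
    ];
}
```

Entries that come back with a null wiki are the ones from the single global log, and they would need a separate decision about where to restore them.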
Comment 4 Thehelpfulone 2012-05-01 13:46:37 UTC
Has this been completed or does it still need to be completed?
Comment 5 Trijnstel 2013-12-05 16:42:29 UTC
(In reply to comment #4)
> Has this been completed or does it still need to be completed?

It still needs to be done. I always wondered why these logs were missing.
Comment 6 Kunal Mehta (Legoktm) 2013-12-05 16:51:55 UTC
(In reply to comment #3)
> I have a .log file (i.e., plain text in a text editor) with hundreds of lines
> like:
> 
> <li>23:35, June 13, 2006 Dmcdevit got IPs for Dmcdevit on enwiki</li>
> 
> The main complication may be that this is from the days of the single global
> log, so there are also entries like:
> 
> <li>20:46, 1 lip 2005 Taw got IPs for [user]</li>
> 
> Or it may be that however the local logs were originally created can also be
> applied to these log entries.

Do all the log entries state which wiki the check was run on? Your first example does and the second doesn't.

I don't know the history behind the CU extension, so how did the global log work? Where should we restore the log entries to?
Comment 7 Trijnstel 2013-12-05 17:40:08 UTC
(In reply to comment #6)
> I don't know the history behind the CU extension, so how did the global log
> work? Where should we restore the log entries to?

You can find some information on these old bugs:
* https://bugzilla.wikimedia.org/show_bug.cgi?id=8710
* https://bugzilla.wikimedia.org/show_bug.cgi?id=13789
Comment 8 Kunal Mehta (Legoktm) 2013-12-05 17:59:47 UTC
(In reply to comment #7)
> You can find some information on these old bugs:
> * https://bugzilla.wikimedia.org/show_bug.cgi?id=8710
> * https://bugzilla.wikimedia.org/show_bug.cgi?id=13789

Thanks. Turns out the code has already been written: https://github.com/wikimedia/mediawiki-extensions-CheckUser/blob/master/importLog.php

A shell user will need to get the log file, and then run the import script.
Comment 9 Trijnstel 2013-12-20 21:11:33 UTC
CCd Tim Starling, see https://en.wikipedia.org/w/index.php?title=User_talk:Dominic&oldid=587005235#Old_CU_logs

Tim, do you remember this? Do you know how to obtain the missing logs?
Comment 10 Tim Starling 2014-01-09 06:04:24 UTC
The log was in /home/wikipedia/logs. That directory was repurposed for MW UDP logs with automatic rotation, it's possible that the files were lost by the automatic rotation script at around that time. I couldn't find any backup on the server. However, I happen to have the relevant files on my hard drive, for June 2005 to May 2007. Note that that range overlaps with the range that is said to be in the database already, so duplicates will have to be removed somehow.

I copied them up to /home/wikipedia/logs/norotate/checkuser
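
One possible way to drop the overlapping entries is sketched below in PHP. Everything here is an assumption for illustration: the array keys (`timestamp`, `user`, `type`, `target`, `wiki`) are hypothetical and do not necessarily match the database schema, and the comparison is done at minute granularity because the sample log lines in this bug only carry hours and minutes.

```php
<?php
// Hypothetical field names; the real log table columns may differ.
function entryKey(array $e): string
{
    // Compare at minute granularity (assumption: the text log has no seconds).
    // json_encode avoids ambiguity if a field contains a separator character.
    return json_encode([
        intdiv($e['timestamp'], 60),
        $e['user'],
        $e['type'],
        $e['target'],
        $e['wiki'],
    ]);
}

function filterNewEntries(array $existing, array $candidates): array
{
    $seen = [];
    foreach ($existing as $e) {
        $seen[entryKey($e)] = true;
    }

    $new = [];
    foreach ($candidates as $e) {
        $key = entryKey($e);
        if (isset($seen[$key])) {
            continue; // already in the database, or repeated within the file
        }
        $seen[$key] = true;
        $new[] = $e;
    }
    return $new;
}
```

The trade-off is that two genuine checks by the same user on the same target within the same minute would collapse into one entry, so a real import might prefer to report such collisions rather than drop them silently.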
Comment 11 Sam Reed (reedy) 2014-01-23 03:59:44 UTC
So, Legoktm and I were just looking at it. There are a few broken entries that can be easily fixed with common sense (newlines in the middle and such).

The date regex is far too naive to cater for all the localised date formats.

$rxTimestamp = '(?P<timestamp>\d+:\d+, \d+ \w+ \d+)';

We tried using '(?P<timestamp>.*?)'. It's a bit better, but combined with the optional comma after it, it then causes issues with dates that have early commas

[bad timestamp] <li>۲۱:۲۰, ۲۰ اکتبر ۲۰۰۶ Jon Harald Søby got edits XXX.XXX.XXX.XXX on fawiki</li>

And others such as 2006-10-25T20:29:01

	$regexes = array(
		'ipedits-xff' => "!^<li>$rxTimestamp,? $rxUser got edits for XFF $rxTarget on $rxWiki$rxReason</li>!",
		'ipedits'     => "!^<li>$rxTimestamp,? $rxUser got edits for" ." $rxTarget on $rxWiki$rxReason</li>!",
		'ipusers-xff' => "!^<li>$rxTimestamp,? $rxUser got users for XFF $rxTarget on $rxWiki$rxReason</li>!",
		'ipusers'     => "!^<li>$rxTimestamp,? $rxUser got users for" ." $rxTarget on $rxWiki$rxReason</li>!",
		'userips'     => "!^<li>$rxTimestamp,? $rxUser got IPs for".   " $rxTarget on $rxWiki$rxReason</li>!"
	);

The first comma seems to be optional between some formats, so was easily improved on.
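
As a rough illustration of a timestamp pattern that sits between the strict original and the fully lazy '(?P<timestamp>.*?)', the PHP sketch below uses Unicode properties so that non-ASCII digits and month names match. It is an assumption-laden sketch, not the pattern in importLog.php: the names `$rxLocalised`, `$rxIso` and `extractTimestamp` are hypothetical, and the localised branch assumes the timestamp starts with the time and ends with a four-digit year.

```php
<?php
// Hypothetical helper; only the timestamp prefix of a line is extracted here.
function extractTimestamp(string $line): ?string
{
    // \p{Nd} = decimal digit in any script, \p{L} = letter, \p{M} = combining mark.
    // Localised form: "HH:MM", optional comma, free text, then a four-digit year.
    $rxLocalised = '\p{Nd}{1,2}:\p{Nd}{2},? [\p{L}\p{M}\p{Nd}., ]+? \p{Nd}{4}';
    // ISO 8601 form such as 2006-10-25T20:29:01.
    $rxIso = '\p{Nd}{4}-\p{Nd}{2}-\p{Nd}{2}T\p{Nd}{2}:\p{Nd}{2}:\p{Nd}{2}';
    $rxTimestamp = "(?P<timestamp>$rxIso|$rxLocalised)";

    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_match returns false for invalid UTF-8, which is treated as "no match".
    if (preg_match("!^<li>$rxTimestamp,? !u", $line, $m) !== 1) {
        return null;
    }
    return $m['timestamp'];
}
```

Because the free-text part is lazy and must be followed by a year, commas inside the date no longer end the match early. Formats that put the year first, other than the ISO form, would still be missed.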

The code is also using strtotime(), which isn't so good for these localised formats: "Parse about any English textual datetime description into a Unix timestamp" - http://us1.php.net/strtotime


I'm guessing that the timestamp is in whatever format the person who did the action has set in their preferences. Awesome, no?
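
One way around the English-only parsing might be to normalise each timestamp before parsing it, as in the PHP sketch below. This is only a sketch under stated assumptions: the lookup tables `MONTH_NAMES` and `DIGITS` are hypothetical and deliberately tiny (they cover just the examples quoted in this bug), the function name `normaliseTimestamp` is made up, and the log times are assumed to be UTC, which may not hold if they follow user preferences.

```php
<?php
// Assumed, incomplete lookup tables; a real import would need every month
// name and digit script that actually occurs in the log files.
const MONTH_NAMES = [
    'lip'   => 'July',    // Polish abbreviation of "lipiec"
    'اکتبر' => 'October', // Persian
];
const DIGITS = [
    "\u{06F0}" => '0', "\u{06F1}" => '1', "\u{06F2}" => '2', "\u{06F3}" => '3',
    "\u{06F4}" => '4', "\u{06F5}" => '5', "\u{06F6}" => '6', "\u{06F7}" => '7',
    "\u{06F8}" => '8', "\u{06F9}" => '9', // Extended Arabic-Indic (Persian) digits
];

function normaliseTimestamp(string $raw): ?DateTimeImmutable
{
    // Step 1: map non-ASCII digits to ASCII.
    $ascii = strtr($raw, DIGITS);

    // Step 2: replace known month names with English ones, word by word.
    $english = preg_replace_callback('/\p{L}+/u', function (array $m): string {
        return MONTH_NAMES[$m[0]] ?? $m[0];
    }, $ascii);
    if ($english === null) {
        return null; // regex failure, for example invalid UTF-8 input
    }

    // Step 3: try explicit formats; "!" resets unparsed fields such as seconds.
    $utc = new DateTimeZone('UTC');
    foreach (['!H:i, j F Y', '!H:i, F j, Y', '!Y-m-d\TH:i:s'] as $format) {
        $dt = DateTimeImmutable::createFromFormat($format, $english, $utc);
        if ($dt !== false) {
            return $dt;
        }
    }
    return null;
}
```

Using explicit formats instead of strtotime() means an unrecognised timestamp fails loudly with null rather than being guessed at, which makes it easier to count and inspect the rows that still need manual handling.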

It seems that 10-20% of rows won't be processed without at least some changes to the code as it currently is
