
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T39626, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 37626 - CheckUser log indefinitely retains private information
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Component: CheckUser
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody
Reported: 2012-06-15 14:54 UTC by MZMcBride
Modified: 2014-01-03 16:10 UTC

Description MZMcBride 2012-06-15 14:54:13 UTC
Currently the log associated with the CheckUser extension stores its information indefinitely. This log contains private information in a number of ways. It may make sense to truncate this log after a certain amount of time.

This is somewhat related to bug 37573.
Comment 1 Marc A. Pelletier 2012-07-19 16:22:19 UTC
That seems reasonable, but that amount of time should be at least one year - preferably two.  It's important that it be possible to investigate complaints or impropriety in the use of CheckUser that might not have been suspected or noted previously.

Then again, it's clear that "indefinite" is much too long.  I don't think anything beyond five years can be justified.
Comment 2 Félix M. (elfix) 2012-07-19 17:13:42 UTC
Err... You seem to be forgetting that the log's primary use isn't dealing with complaints. One can look up every check performed on an IP or IP range; this is its main use. It helps a lot with long-term vandals. Two years is far too short in that respect.
Comment 3 Marc A. Pelletier 2012-07-19 18:27:22 UTC
The CheckUser wiki serves the need of retaining long-term data for persistent vandals; the logs really only should be used for auditing purposes.  (For one thing, it's not always clear whether an IP check following a name check truly is related, and the absence of the result and context make using logs for anything else but auditing fraught with dangers).
Comment 4 Philippe Beaudette 2012-07-19 18:56:00 UTC
Checkuser wiki helps once you've identified a long term vandal.  But without logs, you lose any info on what that vandal has done in the past, until it's manually moved to the Checkuser wiki.  So losing the logs would lose some of the historical data that's rich for predicting how they operate.

With that said, I think MZ's initial comment here is not off base.  Personally, I'd be fine with truncating the data somehow, but I'm not the primary user of that tool: checkusers and stewards are.  Particularly for smaller wikis, we should retain the logs for some time, but I don't think there's a need to retain them indefinitely.

I'm going to chat with LCA's lawyers and see where they fall on the question.
Comment 5 Trijnstel 2012-07-19 19:55:35 UTC
I agree with Félix M. and Philippe Beaudette here. The CheckUser log contains valuable information for us to check whether an IP range contained vandal accounts for a longer period of time. I'm against this change.
Comment 6 charitwo 2012-08-13 02:50:30 UTC
Also in agreement with Félix, Philippe, and Trijnstel. The ability to search past cases of abuse without truncation is invaluable.
Comment 7 MZMcBride 2012-08-13 22:43:14 UTC
(In reply to comment #4)
> With that said, I think MZ's initial comment here is not off base.  Personally,
> I'd be fine with truncating the data somehow, but I'm not the primary user of
> that tool: checkusers and stewards are.  Particularly for smaller wikis, we
> should retain the logs for some time, but I don't think there's a need to
> retain them indefinitely.

Truncation is one option. Anonymization of the IP address information is another option. I think that's what places such as Google do. Or just removing the IP checks from the log altogether after a certain period of time, right? And just keeping the checks of usernames? Though... maybe truncation is best. I'm not sure much good comes from keeping this data around indefinitely.

> I'm going to chat with LCA's lawyers and see where they fall on the question.

Any follow up on this?
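
The three options floated in this comment (truncating the log, anonymizing the IP information, or dropping only the IP checks while keeping username checks) can be compared with a minimal sketch. This is not CheckUser code and does not reflect how the extension stores its log: `LogEntry`, its fields, the mode names, and the /24 and /48 masks are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
import ipaddress


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log row: who ran a check, on what, and when."""
    timestamp: datetime
    checker: str
    target: str  # a username, an IP address, or a CIDR range


def is_ip_target(target: str) -> bool:
    """Return True if the target parses as an IP address or range."""
    try:
        ipaddress.ip_network(target, strict=False)
        return True
    except ValueError:
        return False


def mask_ip(target: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Coarsen an IP or range to a wider network (assumed mask sizes)."""
    net = ipaddress.ip_network(target, strict=False)
    prefix = v4_prefix if net.version == 4 else v6_prefix
    if net.prefixlen <= prefix:
        return str(net)  # already at least as coarse as the mask
    return str(net.supernet(new_prefix=prefix))


def apply_retention(entries, now, max_age, mode):
    """Apply one of three assumed policies to entries older than max_age.

    "truncate":       drop every expired entry
    "anonymize":      keep expired entries, but mask IP targets
    "drop_ip_checks": drop expired IP checks, keep expired username checks
    """
    if mode not in ("truncate", "anonymize", "drop_ip_checks"):
        raise ValueError(f"unknown mode: {mode}")
    kept = []
    for entry in entries:
        expired = now - entry.timestamp > max_age
        if not expired:
            kept.append(entry)
        elif mode == "anonymize":
            if is_ip_target(entry.target):
                entry = replace(entry, target=mask_ip(entry.target))
            kept.append(entry)
        elif mode == "drop_ip_checks" and not is_ip_target(entry.target):
            kept.append(entry)
    return kept


# Example data uses documentation-reserved addresses and made-up names.
log = [
    LogEntry(datetime(2010, 1, 1), "CheckerA", "192.0.2.57"),
    LogEntry(datetime(2010, 1, 1), "CheckerA", "ExampleUser"),
    LogEntry(datetime(2012, 6, 1), "CheckerB", "198.51.100.7"),
]
now = datetime(2012, 6, 15)
two_years = timedelta(days=730)

for mode in ("truncate", "anonymize", "drop_ip_checks"):
    print(mode, [e.target for e in apply_retention(log, now, two_years, mode)])
```

The sketch also shows the trade-off raised elsewhere in this thread: under "truncate" and "drop_ip_checks" the old IP check disappears entirely, so it can no longer be looked up by IP or range, while "anonymize" keeps a coarser range that is still searchable.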
Comment 8 Trijnstel 2012-08-13 23:10:15 UTC
(In reply to comment #7)
> (In reply to comment #4)
> > With that said, I think MZ's initial comment here is not off base.  Personally,
> > I'd be fine with truncating the data somehow, but I'm not the primary user of
> > that tool: checkusers and stewards are.  Particularly for smaller wikis, we
> > should retain the logs for some time, but I don't think there's a need to
> > retain them indefinitely.
> 
> Truncation is one option. Anonymization of the IP address information is
> another option. I think that's what places such as Google do. Or just removing
> the IP checks from the log altogether after a certain period of time, right?
> And just keeping the checks of usernames? Though... maybe truncation is best.
> I'm not sure much good comes from keeping this data around indefinitely.
> 
> > I'm going to chat with LCA's lawyers and see where they fall on the question.
> 
> Any follow up on this?

Again, strongly against this. We need to know the IPs - and there isn't much info left in the logs besides the IPs and accounts. If we lose these too we can't perform our checks well anymore. I really hope this isn't going to happen.
Comment 9 Rax 2012-08-20 21:40:20 UTC
I strongly agree with what Trijnstel and others wrote above: the long-term CU logs are simply instruments to preserve Wikipedia's quality and to protect users from vandals. Cutting this instrument will make the work more difficult.

Apart from this - hey - by definition there are only very few users with access to the logs. These users are elected by the community or an arbcom as trusted to deal respectfully and cautiously with the data they have access to - and they do so.

(excuse my broken english please)
Comment 10 Trijnstel 2013-12-05 17:11:19 UTC
There is clearly no consensus for this change (even Philippe agreed with that), and with no new comments in over a year I am closing this as "RESOLVED WONTFIX".
