Last modified: 2013-06-18 16:26:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T17129, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 15129 - CheckUser Api Module
CheckUser Api Module
Status: RESOLVED DUPLICATE of bug 31723
Product: MediaWiki extensions
Classification: Unclassified
CheckUser (Other open bugs)
unspecified
All All
: Normal enhancement with 6 votes (vote)
: ---
Assigned To: X!
: patch
Depends on:
Blocks: noncoreapi
  Show dependency treegraph
 
Reported: 2008-08-12 01:54 UTC by FT2
Modified: 2013-06-18 16:26 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Mockup showing how the extra formats might appear. (50.65 KB, image/png)
2008-08-12 01:54 UTC, FT2
Details
Patch to add CheckUser API module (26.59 KB, patch)
2009-01-14 21:53 UTC, X!
Details
Fix copyright (26.56 KB, patch)
2009-01-14 21:57 UTC, X!
Details
Patch updated, stylized (26.74 KB, patch)
2011-05-23 20:07 UTC, Sam Reed (reedy)
Details
Non raw sql version! (26.38 KB, patch)
2011-05-23 21:02 UTC, Sam Reed (reedy)
Details

Description FT2 2008-08-12 01:54:26 UTC
Created attachment 5169 [details]
Mockup showing how the extra formats might appear.

Checkuser currently has one output format, namely HTML "pretty layout". Unfortunately in many CU cases, a Checkuser needs to export this data to another application. A single checkuser run can contain up to 5000 results, which there is no easy way to sort or filter. Examples include:

* To export to a spreadsheet (eg Excel or Openoffice Calc) to allow sorting, filtering, and further processing on long reports
* To combine multiple CU run's (eg on possibly connected users or IPs) that normally cannot be shown on the Checkuser output together - eg to list 5 users' activities over the last while, in order of IP. (Some complex checks can need dozens of runs to acuqire all data, and that data cannot be combined within the confines of the extension or easily sorted and correlated.) 
* To combine with existing checkuser runs, to correlate with past reports and look for significant items.

There is no means to do this except "HTML scraping", nor any CheckUser API interface. These are tremendously useful CU functions, and formats more amenable to exporting would greatly help checkusers on all projects. 

Requested - that an "output option" is added to the CU extension, that allows a user to switch between normal (pretty HTML), sortable wikitable, and XML, output. (Tabular output can usually be copy/pasted directly into relevant applications if needed.)

Note that this isn't a change to the database usage, but purely an option to choose other useful output formats for the results. The idea would be that all three views are available on separate tabs, saving repeat runs from being needed.
Comment 1 Aaron Schulz 2008-09-07 20:45:32 UTC
I'd rather combined/filtered reports be added in the core, rather than using external software. Also, I don't want to encourage people to store CU data, since they have to remember to delete it as it would normally expire. Data is stored temporarily for privacy policy reasons, which should also carry over to individual users.
Comment 2 FT2 2008-09-07 23:08:22 UTC
Sorry to reopen, but then - how do we do these kinds of complex cases? I have had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool, it needs to meet the task being done.
Comment 3 Aaron Schulz 2008-09-08 01:00:28 UTC
(In reply to comment #2)
> Sorry to reopen, but then - how do we do these kinds of complex cases? I have
> had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool,
> it needs to meet the task being done.
> 

Features should be added to the internal software, rather than relying on excel.
Comment 4 Mike.lifeguard 2008-09-08 01:01:41 UTC
Could you suggest what those features might be? I cannot immediately think of another output format that would solve this issue.
Comment 5 Aaron Schulz 2008-09-08 01:04:40 UTC
To provide sorting/filtering, an "output format" is not enough. Whatever filtering is used with these files, which hasn't been specified but must exists, should be added to the checkuser tool itself.

Additionally, the 5000 limit can be increased with timeout limit tweaks if really needed.
Comment 6 Roan Kattouw 2008-09-08 15:15:57 UTC
Why not just write an API module that interfaces with Checkuser?
Comment 7 FT2 2008-09-08 17:28:57 UTC
Yes. 

Yes, yes, yes, yes, yes :)

The request for output formats is more because whatever is done in software, most checkusers will not be using API programs or unusual configurations. They need basic tabular and sortable format, because unlike editing, checkuser work is pure data manipulation (and lots of it) and mediawiki will never provide all that's needed. 

So a way to export data to a program or format that will, is needed.
Comment 8 FT2 2008-09-08 17:31:07 UTC
What about, above the data you get a few icons to click, that let you download or open the data in 2 or 3 common formats.
Comment 9 Mike.lifeguard 2008-09-08 17:51:50 UTC
Changing description etc per above; let's have an API module.
Comment 10 X! 2009-01-14 13:01:54 UTC
I've started to work on this. So far, I have the API base file set up, and now I have to calculate the checkuser results.
Comment 11 X! 2009-01-14 21:53:56 UTC
Created attachment 5676 [details]
Patch to add CheckUser API module

I have written some bare code enabling this functionality (in other words, it needs review).
Comment 12 X! 2009-01-14 21:57:40 UTC
Created attachment 5677 [details]
Fix copyright

Because Reedy wanted me to. :)
Comment 13 Bryan Tong Minh 2009-01-14 22:01:23 UTC
A few general comments on this:
* You are basically duplicating the CheckUser interface. This is bad because changes in one part of the CheckUser code will not affect other parts. Code duplication should be avoided when possible.
* You are constructing raw sql, whereas using the wrapper functions is strongly recommended
* Limits are hardcoded
* XFF is added as /xff. In the api you will simply want to use a boolean xff parameter
Comment 14 Roan Kattouw 2009-01-14 22:17:43 UTC
(In reply to comment #11)
> Created an attachment (id=5676) [details]
> Patch to add CheckUser API module
> 
> I have written some bare code enabling this functionality (in other words, it
> needs review).
> 

Here you are:
* By convention, class names of API modules start with 'API', so use e.g. ApiCheckUser as class name (and hence ApiCheckUser.php as filename)
* 'cantcheckuser' message: "don't" misspelled as "dont"
* Parameters:
** Use ApiBase::PARAM_TYPE => 'user' for the user parameter (accepts IPs too) so you won't have to do username normalization
** Use ApiBase::PARAM_TYPE => array('allowed', 'values', 'here') for the type parameter
** If the duration parameter is a timestamp (not clear to me), use ApiBase::PARAM_TYPE => 'timestamp'
** If it's a number of days, use ApiBase::PARAM_TYPE => 'integer' and set ApiBase::PARAM_MIN and ApiBase::PARAM_MAX to appropriate values
** Use ApiBase::PARAM_DFLT => '' for the reason parameter rather than doing if(!isset($params['reason'])) $reason = '';
* You're using regexes to check for IPv4 and IPv6 IP's while you shouldn't: the /xff thing should go (like Bryan said) and the rest will be handled for you when you change the user parameter to be of type 'user'


(In reply to comment #13)
> A few general comments on this:
> * You are basically duplicating the CheckUser interface. This is bad because
> changes in one part of the CheckUser code will not affect other parts. Code
> duplication should be avoided when possible.
Bryan's right. You should call existing functions in CheckUser, not copy them. If CheckUser doesn't have nice functions for your purposes, create them and/or adapt existing ones to be nice to you.

> * You are constructing raw sql, whereas using the wrapper functions is strongly
> recommended
Eek! Looks like that's part of the code copied from CheckUser, though.

> * Limits are hardcoded
Your module should accept a 'limit' parameter:
        'limit' => array (
                                 ApiBase :: PARAM_DFLT => 10,
                                 ApiBase :: PARAM_TYPE => 'limit',
                                 ApiBase :: PARAM_MIN => 1,
                                 ApiBase :: PARAM_MAX => ApiBase :: LIMIT_BIG1,
                                 ApiBase :: PARAM_MAX2 => ApiBase :: LIMIT_BIG2
                         )

> * XFF is added as /xff. In the api you will simply want to use a boolean xff
> parameter
> 
'xff' => false will do that. $params['xff'] will be a simple boolean then, so no need for isset($params['xff']) checks.
Comment 15 X! 2009-01-14 22:20:28 UTC
> (In reply to comment #13)
> > A few general comments on this:
> > * You are basically duplicating the CheckUser interface. This is bad because
> > changes in one part of the CheckUser code will not affect other parts. Code
> > duplication should be avoided when possible.
> Bryan's right. You should call existing functions in CheckUser, not copy them.
> If CheckUser doesn't have nice functions for your purposes, create them and/or
> adapt existing ones to be nice to you.
All the functions in the CheckUser class are protected functions, and I can't have the code output them. In addition, they output pure HTML (which is bad in its own right). 

The entire CheckUser code could use a rewrite.
Comment 16 Roan Kattouw 2009-01-14 22:33:20 UTC
(In reply to comment #15)
> > (In reply to comment #13)
> > > A few general comments on this:
> > > * You are basically duplicating the CheckUser interface. This is bad because
> > > changes in one part of the CheckUser code will not affect other parts. Code
> > > duplication should be avoided when possible.
> > Bryan's right. You should call existing functions in CheckUser, not copy them.
> > If CheckUser doesn't have nice functions for your purposes, create them and/or
> > adapt existing ones to be nice to you.
> All the functions in the CheckUser class are protected functions, and I can't
> have the code output them. In addition, they output pure HTML (which is bad in
> its own right). 
> 
> The entire CheckUser code could use a rewrite.
> 

No doubt it could. But unless it's extremely API-unfriendly, it shouldn't be too hard to split off the HTML-outputting functions into a backend part (that does the database stuff) and a frontend part (that does the HTML generation), and then just call the backend part from the API module (of course that means you should make the backend functions public). That's the tactic I used when writing most of the write API.
Comment 17 Sam Reed (reedy) 2010-01-31 00:06:12 UTC
Not quite sure this is still under Api, assigning to CentralAuth
Comment 18 X! 2010-07-11 16:59:43 UTC
For the curious, I've started work on this again. So far I've got most of the base functions, all of the classes laid out, and the API is successfully doing a User -> IP check.
Comment 19 p858snake 2011-04-30 00:09:57 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 20 Sam Reed (reedy) 2011-05-23 19:46:59 UTC
Is this patch/file any use?

What else needs doing?
Comment 21 Sam Reed (reedy) 2011-05-23 20:07:08 UTC
Created attachment 8572 [details]
Patch updated, stylized
Comment 22 Sam Reed (reedy) 2011-05-23 20:07:33 UTC
(In reply to comment #20)
> Is this patch/file any use?
> 
> What else needs doing?

Looks like it needs a lot of code issues (raw sql) etc fixing up...
Comment 23 Sam Reed (reedy) 2011-05-23 21:02:00 UTC
Created attachment 8573 [details]
Non raw sql version!
Comment 24 Sam Reed (reedy) 2011-05-23 23:12:23 UTC
Committed the API module in r88704
Comment 25 Sam Reed (reedy) 2011-05-23 23:14:25 UTC
Comment on attachment 8573 [details]
Non raw sql version!

Obsolete as it's supplied
Comment 26 Bryan Tong Minh 2011-05-24 08:38:32 UTC
Except for the second one, my comments in #13 were not addressed:
> A few general comments on this:
> * You are basically duplicating the CheckUser interface. This is bad because
> changes in one part of the CheckUser code will not affect other parts. Code
> duplication should be avoided when possible.
> * You are constructing raw sql, whereas using the wrapper functions is strongly
> recommended
> * Limits are hardcoded
> * XFF is added as /xff. In the api you will simply want to use a boolean xff
> parameter

Especially with a high sensitivity module such as CU, I really do want to have a common backend for the special page and the API module. With all the code duplication the chances that are high that security vulnerabilities and other bugs that are found in the API module and fixed are not propagated to the special page and vice versa. Considering the sensitivity of CheckUser I do not think that this is acceptable.
Comment 27 X! 2011-05-24 10:37:43 UTC
The patch is pretty outdated, and I don't even think it works with the current version. I'm planning on finishing this during the summer, when I finally have an abundance of time to work on it.
Comment 28 Brion Vibber 2011-05-24 17:48:21 UTC
Reverted the commit of the old patch in r88728; please clean it up and remove code duplication before recommitting.

(The module was disabled already in r88719.)
Comment 29 Bryan Tong Minh 2011-10-15 09:47:42 UTC
*** Bug 31723 has been marked as a duplicate of this bug. ***
Comment 30 John Du Hart 2011-10-19 09:46:55 UTC
Solved in Bug 31723

*** This bug has been marked as a duplicate of bug 31723 ***

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links