Last modified: 2013-06-18 16:26:53 UTC
Created attachment 5169 [details] Mockup showing how the extra formats might appear. Checkuser currently has one output format, namely HTML "pretty layout". Unfortunately in many CU cases, a Checkuser needs to export this data to another application. A single checkuser run can contain up to 5000 results, which there is no easy way to sort or filter. Examples include: * To export to a spreadsheet (eg Excel or Openoffice Calc) to allow sorting, filtering, and further processing on long reports * To combine multiple CU run's (eg on possibly connected users or IPs) that normally cannot be shown on the Checkuser output together - eg to list 5 users' activities over the last while, in order of IP. (Some complex checks can need dozens of runs to acuqire all data, and that data cannot be combined within the confines of the extension or easily sorted and correlated.) * To combine with existing checkuser runs, to correlate with past reports and look for significant items. There is no means to do this except "HTML scraping", nor any CheckUser API interface. These are tremendously useful CU functions, and formats more amenable to exporting would greatly help checkusers on all projects. Requested - that an "output option" is added to the CU extension, that allows a user to switch between normal (pretty HTML), sortable wikitable, and XML, output. (Tabular output can usually be copy/pasted directly into relevant applications if needed.) Note that this isn't a change to the database usage, but purely an option to choose other useful output formats for the results. The idea would be that all three views are available on separate tabs, saving repeat runs from being needed.
I'd rather combined/filtered reports be added in the core, rather than using external software. Also, I don't want to encourage people to store CU data, since they have to remember to delete it as it would normally expire. Data is stored temporarily for privacy policy reasons, which should also carry over to individual users.
Sorry to reopen, but then - how do we do these kinds of complex cases? I have had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool, it needs to meet the task being done.
(In reply to comment #2) > Sorry to reopen, but then - how do we do these kinds of complex cases? I have > had cases routinely requiring 50 - 200 checks to cover it. Checkuser is a tool, > it needs to meet the task being done. > Features should be added to the internal software, rather than relying on excel.
Could you suggest what those features might be? I cannot immediately think of another output format that would solve this issue.
To provide sorting/filtering, an "output format" is not enough. Whatever filtering is used with these files, which hasn't been specified but must exists, should be added to the checkuser tool itself. Additionally, the 5000 limit can be increased with timeout limit tweaks if really needed.
Why not just write an API module that interfaces with Checkuser?
Yes. Yes, yes, yes, yes, yes :) The request for output formats is more because whatever is done in software, most checkusers will not be using API programs or unusual configurations. They need basic tabular and sortable format, because unlike editing, checkuser work is pure data manipulation (and lots of it) and mediawiki will never provide all that's needed. So a way to export data to a program or format that will, is needed.
What about, above the data you get a few icons to click, that let you download or open the data in 2 or 3 common formats.
Changing description etc per above; let's have an API module.
I've started to work on this. So far, I have the API base file set up, and now I have to calculate the checkuser results.
Created attachment 5676 [details] Patch to add CheckUser API module I have written some bare code enabling this functionality (in other words, it needs review).
Created attachment 5677 [details] Fix copyright Because Reedy wanted me to. :)
A few general comments on this: * You are basically duplicating the CheckUser interface. This is bad because changes in one part of the CheckUser code will not affect other parts. Code duplication should be avoided when possible. * You are constructing raw sql, whereas using the wrapper functions is strongly recommended * Limits are hardcoded * XFF is added as /xff. In the api you will simply want to use a boolean xff parameter
(In reply to comment #11) > Created an attachment (id=5676) [details] > Patch to add CheckUser API module > > I have written some bare code enabling this functionality (in other words, it > needs review). > Here you are: * By convention, class names of API modules start with 'API', so use e.g. ApiCheckUser as class name (and hence ApiCheckUser.php as filename) * 'cantcheckuser' message: "don't" misspelled as "dont" * Parameters: ** Use ApiBase::PARAM_TYPE => 'user' for the user parameter (accepts IPs too) so you won't have to do username normalization ** Use ApiBase::PARAM_TYPE => array('allowed', 'values', 'here') for the type parameter ** If the duration parameter is a timestamp (not clear to me), use ApiBase::PARAM_TYPE => 'timestamp' ** If it's a number of days, use ApiBase::PARAM_TYPE => 'integer' and set ApiBase::PARAM_MIN and ApiBase::PARAM_MAX to appropriate values ** Use ApiBase::PARAM_DFLT => '' for the reason parameter rather than doing if(!isset($params['reason'])) $reason = ''; * You're using regexes to check for IPv4 and IPv6 IP's while you shouldn't: the /xff thing should go (like Bryan said) and the rest will be handled for you when you change the user parameter to be of type 'user' (In reply to comment #13) > A few general comments on this: > * You are basically duplicating the CheckUser interface. This is bad because > changes in one part of the CheckUser code will not affect other parts. Code > duplication should be avoided when possible. Bryan's right. You should call existing functions in CheckUser, not copy them. If CheckUser doesn't have nice functions for your purposes, create them and/or adapt existing ones to be nice to you. > * You are constructing raw sql, whereas using the wrapper functions is strongly > recommended Eek! Looks like that's part of the code copied from CheckUser, though. > * Limits are hardcoded Your module should accept a 'limit' parameter: 'limit' => array ( ApiBase :: PARAM_DFLT => 10, ApiBase :: PARAM_TYPE => 'limit', ApiBase :: PARAM_MIN => 1, ApiBase :: PARAM_MAX => ApiBase :: LIMIT_BIG1, ApiBase :: PARAM_MAX2 => ApiBase :: LIMIT_BIG2 ) > * XFF is added as /xff. In the api you will simply want to use a boolean xff > parameter > 'xff' => false will do that. $params['xff'] will be a simple boolean then, so no need for isset($params['xff']) checks.
> (In reply to comment #13) > > A few general comments on this: > > * You are basically duplicating the CheckUser interface. This is bad because > > changes in one part of the CheckUser code will not affect other parts. Code > > duplication should be avoided when possible. > Bryan's right. You should call existing functions in CheckUser, not copy them. > If CheckUser doesn't have nice functions for your purposes, create them and/or > adapt existing ones to be nice to you. All the functions in the CheckUser class are protected functions, and I can't have the code output them. In addition, they output pure HTML (which is bad in its own right). The entire CheckUser code could use a rewrite.
(In reply to comment #15) > > (In reply to comment #13) > > > A few general comments on this: > > > * You are basically duplicating the CheckUser interface. This is bad because > > > changes in one part of the CheckUser code will not affect other parts. Code > > > duplication should be avoided when possible. > > Bryan's right. You should call existing functions in CheckUser, not copy them. > > If CheckUser doesn't have nice functions for your purposes, create them and/or > > adapt existing ones to be nice to you. > All the functions in the CheckUser class are protected functions, and I can't > have the code output them. In addition, they output pure HTML (which is bad in > its own right). > > The entire CheckUser code could use a rewrite. > No doubt it could. But unless it's extremely API-unfriendly, it shouldn't be too hard to split off the HTML-outputting functions into a backend part (that does the database stuff) and a frontend part (that does the HTML generation), and then just call the backend part from the API module (of course that means you should make the backend functions public). That's the tactic I used when writing most of the write API.
Not quite sure this is still under Api, assigning to CentralAuth
For the curious, I've started work on this again. So far I've got most of the base functions, all of the classes laid out, and the API is successfully doing a User -> IP check.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Is this patch/file any use? What else needs doing?
Created attachment 8572 [details] Patch updated, stylized
(In reply to comment #20) > Is this patch/file any use? > > What else needs doing? Looks like it needs a lot of code issues (raw sql) etc fixing up...
Created attachment 8573 [details] Non raw sql version!
Committed the API module in r88704
Comment on attachment 8573 [details] Non raw sql version! Obsolete as it's supplied
Except for the second one, my comments in #13 were not addressed: > A few general comments on this: > * You are basically duplicating the CheckUser interface. This is bad because > changes in one part of the CheckUser code will not affect other parts. Code > duplication should be avoided when possible. > * You are constructing raw sql, whereas using the wrapper functions is strongly > recommended > * Limits are hardcoded > * XFF is added as /xff. In the api you will simply want to use a boolean xff > parameter Especially with a high sensitivity module such as CU, I really do want to have a common backend for the special page and the API module. With all the code duplication the chances that are high that security vulnerabilities and other bugs that are found in the API module and fixed are not propagated to the special page and vice versa. Considering the sensitivity of CheckUser I do not think that this is acceptable.
The patch is pretty outdated, and I don't even think it works with the current version. I'm planning on finishing this during the summer, when I finally have an abundance of time to work on it.
Reverted the commit of the old patch in r88728; please clean it up and remove code duplication before recommitting. (The module was disabled already in r88719.)
*** Bug 31723 has been marked as a duplicate of this bug. ***
Solved in Bug 31723 *** This bug has been marked as a duplicate of bug 31723 ***