Last modified: 2014-10-17 03:43:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You can still run searches in Bugzilla and access your list of votes, but bug reports in Bugzilla are obviously no longer up to date.
Bug 39936 - Limit on number of piped values in API should throw machine-readable warning
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: API (Other open bugs)
Version: unspecified
Hardware/OS: All / All
Importance: Normal (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: api-2.0
Reported: 2012-09-03 12:39 UTC by jeblad
Modified: 2014-10-17 03:43 UTC
CC: 12 users
See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description jeblad 2012-09-03 12:39:02 UTC
There is a limit on the number of piped values (arrays) in the API.
Comment 1 denny vrandecic 2012-09-06 09:53:58 UTC
This is about piped values in the request parameters. There is a limit on them (by default, 50 for normal users), which is fine. But if there are more items than the limit, the rest gets cut off silently and the action is still performed with the truncated parameter list. There should be an error message instead.
Comment 2 Umherirrender 2012-09-06 18:47:37 UTC
(In reply to comment #0)
> There is a limit on number of piped values (arrays) in the API.

You can try to get the apihighlimits user right, which raises the limit to 500.

(In reply to comment #1)
> This is about piped values in the request parameters. There is a limit on them
> (by default, 50 for normal users) which is fine. But if there are more than the
> limit number of items, the rest gets cut off silently and the action is still
> performed with the cut off parameter list. There should be an error message
> instead.

Check the warnings:

  <warnings>
    <query xml:space="preserve">Too many values supplied for parameter 'titles': the limit is 50</query>
  </warnings>
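Since the warning above carries no machine-readable code, a client can only detect truncation by matching the message text. A minimal Python sketch (the helper name and the parsed-JSON response shape are illustrative assumptions, not part of the API):

```python
def truncation_warning(response):
    """Return the truncation warning text from a parsed API response, or None.

    Because the API emits no machine-readable warning code, the only signal
    available to a client is a substring match on the human-readable message,
    which breaks if the wording or the interface language changes.
    """
    for module, body in response.get("warnings", {}).items():
        text = body.get("*", "")
        if "Too many values supplied" in text:
            return text
    return None
```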
Comment 3 Bergi 2012-09-24 19:12:29 UTC
I guess the warning is enough, which makes the "gets cut off silently" complaint invalid.

However, it would be nice if that limit could be queried somehow; currently you can only guess from your userrights.contains("apihighlimit") whether it is 50 or 500.
Comment 4 Sam Reed (reedy) 2012-09-24 19:46:53 UTC
(In reply to comment #3)
> I guess the warning is enough and marks the "gets cut off silently" as invalid.
> 
> However, it would be nice if that limit could be queried somehow, you only have
> to guess from your userrights.contains("apihighlimit") whether it is 50 or 500.

Not the best example...

https://en.wikipedia.org/w/api.php?action=paraminfo&modules=parse

<param name="prop" description="Which pieces of information to get&#10; text           - Gives the parsed text of the wikitext&#10; langlinks      - Gives the language links in the parsed wikitext&#10; categories     - Gives the categories in the parsed wikitext&#10; categorieshtml - Gives the HTML version of the categories&#10; languageshtml  - Gives the HTML version of the language links&#10; links          - Gives the internal links in the parsed wikitext&#10; templates      - Gives the templates in the parsed wikitext&#10; images         - Gives the images in the parsed wikitext&#10; externallinks  - Gives the external links in the parsed wikitext&#10; sections       - Gives the sections in the parsed wikitext&#10; revid          - Adds the revision ID of the parsed page&#10; displaytitle   - Adds the title of the parsed wikitext&#10; headitems      - Gives items to put in the &lt;head&gt; of the page&#10; headhtml       - Gives parsed &lt;head&gt; of the page&#10; iwlinks        - Gives interwiki links in the parsed wikitext&#10; wikitext       - Gives the original wikitext that was parsed&#10; properties     - Gives various properties defined in the parsed wikitext" default="text|langlinks|categories|links|templates|images|externallinks|sections|revid|displaytitle|iwlinks|properties" multi="" limit="500" lowlimit="50" highlimit="500">
            <type>
              <t>text</t>
              <t>langlinks</t>
              <t>languageshtml</t>
              <t>categories</t>
              <t>categorieshtml</t>
              <t>links</t>
              <t>templates</t>
              <t>images</t>
              <t>externallinks</t>
              <t>sections</t>
              <t>revid</t>
              <t>displaytitle</t>
              <t>headitems</t>
              <t>headhtml</t>
              <t>iwlinks</t>
              <t>wikitext</t>
              <t>properties</t>
            </type>
          </param>
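The `lowlimit` and `highlimit` attributes in the paraminfo output above do let a client compute its applicable limit, provided it also knows whether it holds the high-limits right. A small Python sketch (the function name is hypothetical; the attribute dict mirrors the XML above):

```python
def multi_value_limit(param_attrs, has_apihighlimits):
    """Pick the applicable multi-value limit from an action=paraminfo <param>.

    `param_attrs` is the element's attributes as a dict, e.g.
    {"name": "prop", "multi": "", "lowlimit": "50", "highlimit": "500"}.
    Falls back to 50 if the attribute is missing.
    """
    key = "highlimit" if has_apihighlimits else "lowlimit"
    return int(param_attrs.get(key, "50"))
```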
Comment 5 jeblad 2012-09-24 23:02:48 UTC
I'm not sure I would mark this as resolved, as I believe what the API does now is wrong. If some limitation exists that makes the API unable to fulfill its contract with the requester, then the API should fail. If the requester asks for an operation that implies a list of 50+ values and the API can only handle 50 of them, then the request should fail. Only if the requester includes some parameter that says it is OK with a partial result should the API attempt to continue with truncated lists. Something like "partial=continue" that defaults to "partial=fail".

In the case where we observed this, we had lists of aliases: someone could create lists of 50+ aliases, which would then be truncated when someone with lesser rights edited the same aliases. This is now changed so that anyone with a limit of 50 values can only change up to 50 at a time. This kind of works, but it creates weird errors during bulk upload.

There is at least one way we can get around this, but I tend to think that this is an issue that should be fixed in core.
Comment 6 jeblad 2012-09-24 23:05:16 UTC
Perhaps the most obvious argument isn't clear from the previous comment. We need an error report before data gets corrupted, not after the API has corrupted the data due to truncated lists.
Comment 7 Daniel Kinzler 2012-09-25 08:34:39 UTC
(In reply to comment #3)
> I guess the warning is enough and marks the "gets cut off silently" as invalid.
> 
> However, it would be nice if that limit could be queried somehow, you only have
> to guess from your userrights.contains("apihighlimit") whether it is 50 or 500.

Sadly, warnings don't use machine-readable codes, so warnings are only useful for humans, not for client bots.
Comment 8 jeblad 2012-09-25 08:51:33 UTC
I guess we could add a flag to mw.config if userrights.contains("apihighlimit") and then limit the requests accordingly. If the UI tries to build a request that will end up truncated it could intercept its own request and warn the user before sending it to the server.
Comment 9 Bergi 2012-09-25 17:57:38 UTC
OK, I agree that the non-machine-readable warning is a bug.

However, I don't think we need an extra parameter for the API to throw an error instead of truncating when there are too many values.

As we've seen, you can query the API (userinfo, paraminfo) for your current limit and split the request, if needed, into chunks of the correct size.
Even if you don't want to do that, you can count the results you got and continue requesting if you went over the limit - I did that once when getting paraminfo for *all* querymodules :-)
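The chunk-splitting strategy described above can be sketched in a few lines of Python (the chunk size would come from paraminfo or userinfo as discussed; the helper name is illustrative):

```python
def chunked(values, limit):
    """Split a multi-value parameter list into request-sized chunks."""
    for i in range(0, len(values), limit):
        yield values[i:i + limit]

# Each chunk would then be joined with "|" for the actual request,
# e.g. "titles=" + "|".join(chunk).
```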
Comment 10 Daniel Kinzler 2012-09-26 08:39:22 UTC
(In reply to comment #9)
> However I don't think we need an extra parameter for the api to throw an error
> instead of truncating when there are too many values.

Right: it should *always* throw an error.

Note that this issue is far more severe for queries that update data - in that case, it causes silent data loss. That's what caused john to file the initial report.
 
> As we've seen, you can query the api (userinfo, paraminfo) for your current
> limit and split the request if needed into chunks of the correct size.

For that to work, the client needs a way to even determine that limit. I don't think there's currently any reliable way to do that.

> Even if you don't want to do that, you can count the results you got and
> continue requesting if got over the limit - I did that once when getting
> paraminfo for *all* querymodules :-)

So... that would basically be a paging mechanism. Orthogonal to the existing paging mechanism. I think that can get pretty confusing...
Comment 11 Bergi 2012-09-26 13:22:28 UTC
(In reply to comment #10)
> Right: it should *always* throw an error.

A warning is enough imho.
 
> Not that this issue is far more severe for queries that update data - in that
> case, it causes silent data loss. That's what caused john to file the initial
> report.

Which query (in terms of http://www.mediawiki.org/wiki/API:Query) does update data?
Or do we really encounter too long piped-lists in any of the update modules (http://www.mediawiki.org/wiki/API:Changing_wiki_content)? That might be worth an error, yes.

> > Even if you don't want to do that, you can count the results you got and
> > continue requesting if got over the limit - I did that once when getting
> > paraminfo for *all* querymodules :-)
> 
> So... that would basically be a paging mechanism. Orthogonal to the existing
> paging mechanism. I think that can get pretty confusing...

Oh, we already have that three-dimensional paging mechanism for queries (generators, properties, and big page sets) :-) But yes, it *is* confusing and not easy to implement.
However, a kind of query-continue value in the result (instead of error/warning?) which contains the part of the list that was not handled yet (to continue with) would be very nice.
Comment 12 jeblad 2012-09-26 16:37:54 UTC
> > Even if you don't want to do that, you can count the results you got and
> > continue requesting if got over the limit - I did that once when getting
> > paraminfo for *all* querymodules :-)

This isn't about the query modules; it's about core functionality in the API, which has implications for other uses of it, such as the Wikidata project. In this project we set, get, and manipulate data, and if we use lists for some of the operations then they will be truncated and still passed on. That creates silent data loss.

In some cases we can code around this, but in other cases it's not possible. For the UI we can pass on the limits in mw.config, but for bots it gets very cumbersome.

It would be far better to throw an error if the request is over the limit and there is no way to survive a data loss. Perhaps this could be marked in getAllowedParams as "changes to this list are fatal".

'values' => array(
	ApiBase::PARAM_TYPE => 'string',
	ApiBase::PARAM_ISMULTI => true,
	ApiBase::PARAM_NOTRUNC => true,
),
Comment 13 Roan Kattouw 2012-10-02 02:24:56 UTC
(In reply to comment #12)
> It would be far better to throw an error if the request is out of limit and
> there is no way to survive a data loss. Perhaps this could be marked in the
> getAllowedParams as "changes to this list is fatal".
> 
> 'values' => array(
>     ApiBase::PARAM_TYPE => 'string',
>     ApiBase::PARAM_ISMULTI => true,
>     ApiBase::PARAM_NOTRUNC => true,
> ),
That seems reasonable to me. Returning less data than requested in action=query because something got cut off is fine (and that behavior should be kept), but if you have a module where truncation would be catastrophic, then telling the validator to error out rather than truncating is a good idea.
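In Python terms, the validation semantics proposed here (the `PARAM_NOTRUNC` flag is jeblad's suggestion, not an existing constant, and this function is only a sketch of the intended behavior) would look roughly like:

```python
def validate_multi(values, limit, no_trunc=False):
    """Sketch of the proposed validator behavior for multi-value parameters.

    Default: keep today's behavior (truncate, with a warning elsewhere).
    With no_trunc=True: fail hard instead of silently dropping values.
    """
    if len(values) <= limit:
        return values
    if no_trunc:
        raise ValueError(
            "too many values: %d supplied, limit is %d" % (len(values), limit)
        )
    return values[:limit]  # current behavior: silent truncation
```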
Comment 14 Yuri Astrakhan 2013-03-04 06:15:48 UTC
API 2.0 tracking - http://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap
