Last modified: 2014-11-10 07:58:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61678, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59678 - Implement badtoken detection and recovery
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: General
Version: core-(2.0)
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks: Wikisource pwb20
Reported: 2014-01-05 14:00 UTC by Maarten Dammers
Modified: 2014-11-10 07:58 UTC (History)

Description Maarten Dammers 2014-01-05 14:00:48 UTC
Every once in a while I get a badtoken exception. This is probably because I have multiple bots running on the same site at the same time (race condition).
* Bot A requests token -> 123
* Bot B requests token -> 123
* Bot A edits with token 123 -> ok
* Bot B edits with token 123 -> poof

We could of course implement very difficult synchronization, but this doesn't happen very often, so it's probably better to handle it like a collision on Ethernet:
* Detect the badtoken
* Back off for a random number of seconds
* Get a new token
* Do the edit
Max tries should be respected so the bot can't get into an infinite retry loop.
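
The steps above can be sketched roughly as follows. This is only an illustration, not Pywikibot code: `BadTokenError`, `edit_with_retry`, and the `get_token` / `do_edit` callables are hypothetical names, and the default limits are arbitrary assumptions.

```python
import random
import time


class BadTokenError(Exception):
    """Hypothetical stand-in for an API error whose code is 'badtoken'."""


def edit_with_retry(get_token, do_edit, max_tries=5, max_backoff=10.0,
                    sleep=time.sleep):
    """Attempt an edit, backing off and fetching a new token after badtoken.

    get_token and do_edit are caller-supplied callables (assumptions, not
    real Pywikibot functions). The last BadTokenError is re-raised once
    max_tries attempts have failed, so the loop can never run forever.
    """
    for attempt in range(1, max_tries + 1):
        token = get_token()  # a new token is requested for every attempt
        try:
            return do_edit(token)
        except BadTokenError:
            if attempt == max_tries:
                raise  # max tries reached: give up instead of looping
            # Back off for a random number of seconds, like an Ethernet
            # station after a collision.
            sleep(random.uniform(0, max_backoff))
```

The `sleep` parameter exists only so the delay can be replaced in tests; a bot would normally leave it at `time.sleep`.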
Comment 1 Maarten Dammers 2014-01-05 14:25:07 UTC
Example:

badtoken
* '''Sorry! We could not process your edit due to a loss of session data.'''
Please try again.
If it still does not work, try [[Special:UserLogout|logging out]] and logging back in.
* There seems to be a problem with your login session;
this action has been canceled as a precaution against session hijacking.
Go back to the previous page, reload that page and then try again.

{u'messages': {u'1': {u'type': u'error', u'name': u'sessionfailure'},
               u'0': {u'type': u'error', u'name': u'session_fail_preview'},
               u'html': {u'*': u'<ul>\n<li> <b>Sorry! We could not process your edit due to a loss of session data.</b>\n</li>\n</ul>\n<p>Please try again.\nIf it still does not work, try <a href="/wiki/Special:UserLogout" title="Special:UserLogout">logging out</a> and logging back in.\n</p>\n<ul>\n<li> There seems to be a problem with your login session;\n</li>\n</ul>\n<p>this action has been canceled as a precaution against session hijacking.\nGo back to the previous page, reload that page and then try again.\n</p>'}}}
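
A minimal sketch of how a result shaped like the one above could be recognised as a session failure. The helper name `is_session_failure` is hypothetical, and treating these two message names as a reliable signal is an assumption based only on this one example.

```python
# Message names seen in the example above; whether this set is complete
# is an assumption.
SESSION_FAILURE_MESSAGES = {"sessionfailure", "session_fail_preview"}


def is_session_failure(error):
    """Return True if the error dict carries a session-failure message."""
    messages = error.get("messages", {})
    return any(
        isinstance(entry, dict)
        and entry.get("name") in SESSION_FAILURE_MESSAGES
        for entry in messages.values()
    )
```

Entries without a `name` key, such as the `html` entry, are simply ignored.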
Comment 2 Merlijn van Deen (test) 2014-01-05 14:27:08 UTC
OK, so this is slightly more complicated than it seems.

There are two obvious methods:
 - handle the BadToken error in data/api.py. We can just self.sleep() and then get a new edit token
 - handle the BadToken error in data/page.py, in editpage()

Both options have their problems.

data/api.py:
 good: we can also handle other types of token problems
 bad: edit tokens also serve to detect edit conflicts, and we cannot handle those at the data/api.py level...

data/page.py:
 good: the logic for getting tokens & handling edit conflicts is already here!
 bad: the retry logic is in the data/api.py layer, and it doesn't cover other token issues
Comment 3 Maarten Dammers 2014-08-17 11:43:23 UTC
Wikidata is very unstable today so I keep running into:

  File "C:\pywikibot\coredev\pywikibot\data\api.py", line 458, in submit
    raise APIError(code, info, **result["error"])
pywikibot.data.api.APIError: badtoken: <strong>Sorry! We could not process your edit due to a loss of session data.</strong>
Please try again.
If it still does not work, try [[Special:UserLogout|logging out]] and logging back in.
<class 'pywikibot.data.api.APIError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Marking this as a bug. The bot shouldn't crash on this.
Comment 4 John Mark Vandenberg 2014-08-19 07:21:39 UTC
We have a changeset pending to overhaul token management in site.py
https://gerrit.wikimedia.org/r/#/c/139372/

It adds caching of tokens so, with badtoken now appearing more regularly, the cache needs better management of how long these tokens are useful for.
Comment 5 Amir Ladsgroup 2014-08-19 08:01:23 UTC
> data/api.py:
>  good: we can also handle other types of token problems
>  bad: edit tokens also serve to detect edit conflicts, and we cannot handle
> those at the data/api.py level...

Maybe I'm wrong, but edit conflicts aren't detected by tokens; they are detected by basetimestamp in MediaWiki [1], and if an edit conflict happens it raises an editconflict error, not a badtoken error. See the error table.
[1]: https://www.mediawiki.org/wiki/API:Edit

If we want to avoid undetected edit conflicts, the only thing we need to do is add basetimestamp to action=edit API calls.
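
To illustrate the point, here is a small sketch of an action=edit parameter set that carries basetimestamp, plus a dispatcher showing that the two error codes call for different reactions. The function names and the returned labels are hypothetical; only the API parameter names and the `badtoken` / `editconflict` codes come from the discussion and the API:Edit page.

```python
def build_edit_params(title, text, token, base_timestamp):
    """Build action=edit parameters including basetimestamp.

    base_timestamp is the timestamp of the revision the new text is
    based on; MediaWiki compares it with the current revision to
    detect edit conflicts.
    """
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "basetimestamp": base_timestamp,
        "token": token,
    }


def next_step(error_code):
    """Map an API error code to a reaction (labels are illustrative)."""
    if error_code == "badtoken":
        return "refresh-token-and-retry"
    if error_code == "editconflict":
        return "reload-page-and-merge"
    return "raise"
```

Because the two failures arrive under different codes, a token retry at the request layer would not hide a real edit conflict.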
Comment 6 Merlijn van Deen (test) 2014-08-19 08:10:51 UTC
(In reply to John Mark Vandenberg from comment #4)
> We have a changeset pending to overhaul token management in site.py
> https://gerrit.wikimedia.org/r/#/c/139372/
> 
> It adds caching of tokens so, with badtoken now appearing more regularly,
> the cache needs better management of how long these tokens are useful for.

The entire problem is /caching/ the tokens. They are not valid for a fixed time, they are valid for /one edit/. Basically it's a race condition, so there are two options:

1) the 'nice' way: implement locking. Requires some sort of interprocess communication.

2) the 'hacky' way: reduce the prevalence of the condition (by reducing the time between getting a token and using it), and retrying -- effectively using the remote MW instance as lock.
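
For option 1, one simple form of interprocess locking is a lock file created atomically. The following is a rough Python 3 sketch under that assumption; `edit_lock` and its parameters are hypothetical, and it does not deal with stale lock files left behind by a crashed bot.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def edit_lock(path, timeout=30.0, poll=0.1):
    """Hold an exclusive lock file at path while the with-block runs."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process can create it at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire %s" % path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)  # release the lock for the next process
```

A bot would wrap the "get token, then edit" pair in `with edit_lock(...):` so that no other bot on the same machine can slip in between the two requests. Option 2 needs none of this, which is part of why it is attractive.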

(In reply to Amir Ladsgroup from comment #5)
> Maybe I'm wrong, but edit conflicts aren't detected by tokens; they are
> detected by basetimestamp in MediaWiki [1], and if an edit conflict happens
> it raises an editconflict error, not a badtoken error. See the error table.
> [1]: https://www.mediawiki.org/wiki/API:Edit

Yes, you are right. So we can just implement this at the data/api.py level.
Comment 7 Ricordisamoa 2014-08-19 09:03:10 UTC
(In reply to Merlijn van Deen from comment #6)
> The entire problem is /caching/ the tokens. They are not valid for a fixed
> time, they are valid for /one edit/.

https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2014-August/000063.html

«All tokens may be cached as long as the session is valid; none are
dependent on factors such as the page being edited or the user being
targeted.»

And some of them are always the same (e.g. editToken & protectToken). They will be merged with the change announced above.

However, since we want to be able to work with multiple accounts on the same wiki, we need better caching.
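
As a rough sketch of what "better caching" could mean here: tokens are kept per (site, user) pair and dropped when the session turns out to be invalid. `TokenCache` and the `fetch` callable are hypothetical and are not taken from the pending changesets.

```python
class TokenCache:
    """Cache tokens per (site, user, kind) so accounts never share a token."""

    def __init__(self, fetch):
        # fetch is a caller-supplied callable (site, user, kind) -> token;
        # it stands in for whatever actually queries the wiki.
        self._fetch = fetch
        self._tokens = {}

    def get(self, site, user, kind="edit"):
        """Return the cached token, fetching it on first use."""
        key = (site, user, kind)
        if key not in self._tokens:
            self._tokens[key] = self._fetch(site, user, kind)
        return self._tokens[key]

    def invalidate(self, site, user):
        """Forget every token of one account, e.g. after a badtoken error."""
        for key in [k for k in self._tokens if k[:2] == (site, user)]:
            del self._tokens[key]
```

After a badtoken error the caller would invoke `invalidate(site, user)` and call `get` again, which forces a fresh request for that account only.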
Comment 8 John Mark Vandenberg 2014-09-15 09:55:05 UTC
There is another patch going through review, which will help organise the framework for this.

https://gerrit.wikimedia.org/r/#/c/159394/
Comment 9 John Mark Vandenberg 2014-10-31 01:14:22 UTC
Here is a related MW changeset to time-limit the tokens:
https://gerrit.wikimedia.org/r/#/c/156336/
Comment 10 555 2014-11-01 22:24:31 UTC
Adding bug 35925 to properly track this issue (it is causing problems in a Wikisource-specific gadget). More on the issue: <https://fr.wikisource.org/w/index.php?oldid=4780982#Match_.26_Split>. Further info related to the gadget: <https://en.wikisource.org/wiki/Help:Match_and_split>
Comment 11 John Mark Vandenberg 2014-11-10 07:58:57 UTC
Today is the first time I remember it appearing in Travis builds:
https://travis-ci.org/wikimedia/pywikibot-core/jobs/40487338
