Last modified: 2014-11-14 18:40:52 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1470/
Reported by: xqt
Created on: 2012-06-21 13:44:27
Subject: Rewrite Performance (multiple API request)

Original description:

There are multiple user info queries which slow down the performance:

    c:\Pywikipedia\rw>pwb.py basic.py user:xqt/Test -simulate -v
    Pywikipediabot r10326 2012-06-08 12:08:53Z
    Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)]
    Retrieving 1 pages from wikipedia:de.
    Starting 1 threads...
    API action query: userinfo
    Found 1 wikipedia:de processes running, including this one.

    >>> Benutzer:Xqt/Test <<<
    - Test
    + Test Test
    Comment: Bot: Ändere ...
    Do you want to accept these changes? ([y]es, [N]o) y
    API action query: userinfo
    API action query: userinfo
    Cosmetic changes for wikipedia-de enabled.
    API action query: siteinfo|userinfo
    API action query: userinfo
    API action edit:
    SIMULATION: edit action blocked.
    Page [[Benutzer:Xqt/Test]] saved without any changes.
    Page [[Benutzer:Xqt/Test]] saved
    Dropped throttle(s).
    Waiting for threads to finish...
    All threads finished.
    Dropped throttle(s).
    c:\Pywikipedia\rw>
These are multiple API requests, and I guess a lot of them could be cached by a site instance or on disk. This and other code parts decrease the performance of pwb 2.0 by 30% (or increase the processing time by 50%), measured with touch.py -start:! -pt:0
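
The per-site caching suggested above could look roughly like the following. This is only a sketch under stated assumptions: `CachingSite`, its `ttl` parameter and `fake_fetch` are hypothetical names, not pywikibot's actual classes, and the real API request is replaced by a stub that counts how often it is called.

```python
import time


class CachingSite:
    """Hypothetical site object that keeps userinfo in memory (not pywikibot API)."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch        # callable that performs the real API request
        self._ttl = ttl            # seconds a cached answer is considered fresh
        self._clock = clock        # injectable clock, handy for testing
        self._userinfo = None
        self._fetched_at = None

    def userinfo(self, force=False):
        """Return cached userinfo, refetching only when stale or forced."""
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at > self._ttl)
        if force or expired:
            self._userinfo = self._fetch()
            self._fetched_at = now
        return self._userinfo


calls = []


def fake_fetch():
    """Stub standing in for an 'API action query: userinfo' round trip."""
    calls.append(1)
    return {"name": "Xqt", "groups": ["user"]}


site = CachingSite(fake_fetch)
for _ in range(5):
    site.userinfo()          # only the first call reaches fake_fetch
```

With this shape, the five lookups in the loop cost a single request; `site.userinfo(force=True)` would be the escape hatch after a login or logout, when the cached answer is known to be outdated.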
- **assigned_to**: russblau --> nobody
- **summary**: Multiple user info request --> Rewrite Performance (multiple API request)
I'm not sure how the code looked before about April 2014, so my comments are unrelated to how the code looked when this bug was raised in 2012.

Since at least 2014, userinfo is added to every query, and the response is used to determine whether the server has a different username than pywikibot expects. This occurs in usual usage for two reasons:

1. The bot starts logged out, but because cookies are sent, the server may reply with a username, in which case the server considers the bot logged in. Pywikibot then changes the login status of the APISite accordingly.
2. The server invalidates the bot's session, or maybe even its credentials, e.g. when we had a forced password reset.

So there are many API requests and responses with a small chunk of extra data. This could be removed or reduced, with a lot of pain and little gain.

There are also many times where the code base sends the exact same userinfo+siteinfo request several times, because the login code is a mess. However, these are cached locally on disk, which is still a performance problem, as it requires disk I/O for a tiny chunk of data that the code has already parsed and discarded. I fixed a few of these reload scenarios back in July/August, but it is not fun fiddling with the login/relogin sequence.

IMO we should wait until we've released a stable version of 2.0, and then redesign the user/login system, removing the two-user system that is heavily embedded in the current codebase. That will probably require a breaking change for sysop-bots, but bot-bots should be unaffected.
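
To illustrate the username check described in this comment, here is a minimal sketch of how a userinfo chunk piggy-backed on a response could drive the login status. `LoginState` and `update_from_response` are hypothetical names, not pywikibot's actual code, and the response layout (a `query.userinfo` object with `name`, plus an `anon` key for logged-out sessions) is an assumption about the API's JSON output.

```python
class LoginState:
    """Hypothetical stand-in for the login status kept on a site object."""

    def __init__(self, expected_user):
        self.expected_user = expected_user
        self.logged_in = False     # the bot starts logged out

    def update_from_response(self, response):
        """Sync the login status with the userinfo found in an API response."""
        # Assumed layout: {"query": {"userinfo": {"name": ..., "anon": ""}}}
        userinfo = response.get("query", {}).get("userinfo", {})
        if "anon" in userinfo:
            server_user = None     # server treats this session as logged out
        else:
            server_user = userinfo.get("name")
        # Logged in only if the server reports the username we expect.
        self.logged_in = server_user == self.expected_user
        return self.logged_in


state = LoginState("Xqt")

# Case 1: cookies were sent, so the server already knows the user.
cookie_reply = {"query": {"userinfo": {"id": 1, "name": "Xqt"}}}
after_cookies = state.update_from_response(cookie_reply)

# Case 2: the server invalidated the session and answers as anonymous.
anon_reply = {"query": {"userinfo": {"id": 0, "name": "192.0.2.1", "anon": ""}}}
after_invalidation = state.update_from_response(anon_reply)
```

Because the check only reads data that is already in the response, it adds no request of its own; the cost discussed above is the extra userinfo chunk carried on every query.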