Last modified: 2014-01-29 17:34:38 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1546/
Reported by: dixond
Created on: 2012-11-28 13:00:50
Subject: Page._getVersionHistory returns only a part of a history
Assigned to: xqt
Original description:
There is a bug in Page._getVersionHistory. It doesn't load the whole history if it is large. The problem is here (wikipedia.py):

    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True

I believe it should be as follows:

    if not getAll and len(result['query']['pages'].values()[0]['revisions']) >= revCount:
        thisHistoryDone = True

Version.py:
Pywikipedia trunk/pywikipedia/ (r10745, 2012/11/20, 13:03:05)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
- **priority**: 5 --> 8
- **priority**: 8 --> 5
Are you sure that you have set getAll=True while invoking that method?
- **assigned_to**: nobody --> xqt
Yes, of course. It is quite obvious that the following code prevents the remaining revisions from being loaded, because it sets thisHistoryDone to True:

    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True

Am I missing anything?
First of all, _getVersionHistory() is an internal method and you shouldn't use it directly. Use getVersionHistory() instead. The condition is quite right. Try the following statements:

    import pywikibot as pwb
    p = pwb.Page('de', 'user talk:xqt')
    h = p.getVersionHistory(getAll=True)
    len(h)

which gives 4250 entries (so far). Changing the condition will return only 500 entries.
Changing the condition still returns 4250 entries for me (have you missed the "not getAll and " part in my code?). But if I use fullVersionHistory instead of getVersionHistory, it returns only 192 entries for me. I.e. try the following code:

    import wikipedia as pywikibot
    p = pywikibot.Page('de', 'user talk:xqt')
    h = p.fullVersionHistory(getAll=True)
    print len(h)
Any updates? Are you able to reproduce this issue?
Change 105619 had a related patch set uploaded by Mpaa: (bug 55160) Page._getVersionHistory returns only a part of a history https://gerrit.wikimedia.org/r/105619
(In reply to comment #9)
> Change 105619 had a related patch set uploaded by Mpaa:
> (bug 55160) Page._getVersionHistory returns only a part of a history
>
> https://gerrit.wikimedia.org/r/105619

h = p.getVersionHistory(getAll=True) returns the full history.
h = p.fullVersionHistory(getAll=True) returns 192 entries (now more ...).

The reason is that the result may contain fewer than 'revCount' revisions even when 'query-continue' is returned, due to:

    {u'result': {u'*': u'This result was truncated because it would otherwise be larger than the limit of 12582912 bytes'}}

So checking only that len() < revCount is not enough to declare thisHistoryDone = True.
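To illustrate the point about truncated results, here is a minimal, self-contained sketch. It is not the code from change 105619. The helper names (make_fake_api, fetch_naive, fetch_following_continue), the fake in-memory API, and the 'rvstartid' key inside 'query-continue' are assumptions made for illustration. It contrasts stopping on a short batch with stopping only when no 'query-continue' is present.

```python
def make_fake_api(total, limit, truncated_size):
    """Hypothetical stand-in for the API: serves `total` revisions, newest
    first, `limit` per request, except that the first response is cut short
    to `truncated_size` entries (as if the byte limit had been hit)."""
    all_revs = [{'revid': i} for i in range(total, 0, -1)]

    def fetch_batch(start):
        idx = 0 if start is None else total - start
        size = truncated_size if start is None else limit
        batch = all_revs[idx:idx + size]
        result = {'query': {'pages': {'1': {'revisions': batch}}}}
        nxt = idx + len(batch)
        if nxt < total:
            # More revisions remain, so a continuation marker is returned.
            result['query-continue'] = {
                'revisions': {'rvstartid': all_revs[nxt]['revid']}}
        return result

    return fetch_batch


def fetch_naive(fetch_batch, rev_count):
    """Stops as soon as a batch is shorter than rev_count."""
    revisions, start = [], None
    while True:
        result = fetch_batch(start)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if len(batch) < rev_count or 'query-continue' not in result:
            return revisions
        start = result['query-continue']['revisions']['rvstartid']


def fetch_following_continue(fetch_batch):
    """Stops only when the response carries no 'query-continue'."""
    revisions, start = [], None
    while True:
        result = fetch_batch(start)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if 'query-continue' not in result:
            return revisions
        start = result['query-continue']['revisions']['rvstartid']


naive_count = len(fetch_naive(make_fake_api(7, 3, 2), rev_count=3))
full_count = len(fetch_following_continue(make_fake_api(7, 3, 2)))
```

With 7 fake revisions, a limit of 3 and a first batch truncated to 2, the length-based loop gives up after the first batch, while the continuation-based loop collects all 7.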
Change 105619 merged by jenkins-bot: (bug 55160) Page._getVersionHistory returns only a part of a history https://gerrit.wikimedia.org/r/105619