Last modified: 2014-11-14 09:15:04 UTC
When an interwiki link has a space between the namespace and the page name, the bot crashes:

pwb.py interwiki -async -family:wiktionary -cleanup -continue
...
Retrieving pages from wiktionary:fr.
WARNING: loadpageinfo: Query on [[fr:Categorie: Abreviations en italien]] returned data on 'Categorie:Abreviations en italien'
Dump cs (wiktionary) written.
Traceback (most recent call last):
  File "D:\Py\rewrite\pwb.py", line 178, in <module>
    run_python_file(fn, argv, argvu)
  File "D:\Py\rewrite\pwb.py", line 75, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "D:\Py\rewrite\scripts\interwiki.py", line 2646, in <module>
    main()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2621, in main
    bot.run()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2365, in run
    self.queryStep()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File "D:\Py\rewrite\scripts\interwiki.py", line 1305, in batchLoaded
    if not page.exists():
  File "D:\Py\rewrite\pywikibot\page.py", line 564, in exists
    return self.site.page_exists(self)
  File "D:\Py\rewrite\pywikibot\site.py", line 2288, in page_exists
    return page._pageid > 0
AttributeError: 'Page' object has no attribute '_pageid'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Because the dump file cannot be changed (https://bugzilla.wikimedia.org/show_bug.cgi?id=72943 ), I modified this page: https://cs.wiktionary.org/w/index.php?title=Kategorie:Italské_zkratky&diff=531814&oldid=522026 so anyone who wants to reproduce the crash must edit another page.
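Judging from the warning and the traceback, the page info the API returns under the normalized title is discarded, so page._pageid is never assigned and the later page.exists() call fails when it reads that attribute. The sketch below reproduces that failure mode in isolation. FakePage, naive_sametitle, load_page_info and page_exists are hypothetical stand-ins written for illustration, not pywikibot's real classes, and the exact-string title comparison is an assumption about how the two titles end up being judged different.

```python
# Hypothetical stand-ins for illustration only; not pywikibot's real code.


class FakePage:
    """Minimal stand-in for pywikibot.Page."""

    def __init__(self, title):
        self.requested_title = title


def naive_sametitle(a, b):
    # Assumption: an exact string comparison, so 'Catégorie: X'
    # and 'Catégorie:X' are treated as different titles.
    return a == b


def load_page_info(page, api_pages):
    """Copy the page id from API results whose title matches the request."""
    for item in api_pages:
        if not naive_sametitle(item["title"], page.requested_title):
            print("WARNING: Query on %r returned data on %r"
                  % (page.requested_title, item["title"]))
            continue  # the data is skipped, so _pageid is never assigned
        page._pageid = item.get("pageid", 0)


def page_exists(page):
    # Same expression as the failing line in the traceback, with no guard.
    return page._pageid > 0


page = FakePage("Catégorie: Pantonyme")
# The API reports the normalized title, without the space after the colon.
load_page_info(page, [{"title": "Catégorie:Pantonyme", "pageid": 1}])  # made-up id

try:
    page_exists(page)
    error = None
except AttributeError as exc:
    error = exc
print(error)
```

With a requested title that matches the returned one exactly, load_page_info sets _pageid and page_exists returns normally.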
The hint here is "Query on [[fr:Categorie: Abreviations en italien]] returned data on 'Categorie:Abreviations en italien'". That is only a warning in _update_page, and it is caused by Site.sametitle.

I set up a test case:
https://en.wikipedia.org/wiki/User:John_Vandenberg/test is:
  fooo [[fr:Catégorie: Pantonyme]]
https://pt.wikipedia.org/wiki/Usu%C3%A1rio:John_Vandenberg/test is:
  fooo [[en:User:John Vandenberg/test]]

Then:
$ python pwb.py interwiki -page:"Usuário:John_Vandenberg/test" -family:wikipedia -lang:pt
NOTE: Number of pages queued is 0, trying to add 50 more.
Retrieving 1 pages from wikipedia:pt.
[[pt:Usuário(a):John Vandenberg/test]]: [[pt:Usuário(a):John Vandenberg/test]] gives new interwiki [[en:User:John Vandenberg/test]]
Retrieving 1 pages from wikipedia:en.
WARNING: [[pt:Usuário(a):John Vandenberg/test]] is in namespace 2, but [[fr:Catégorie: Abréviations en italien]] is in namespace 14. Follow it anyway? ([y]es, [n]o, [a]dd an alternative, [g]ive up) y
[[pt:Usuário(a):John Vandenberg/test]]: [[en:User:John Vandenberg/test]] gives new interwiki [[fr:Catégorie: Abréviations en italien]]
Retrieving 1 pages from wikipedia:fr.
WARNING: preloadpages: Query returned unexpected title'Catégorie:Abréviations en italien'
WARNING: loadpageinfo: Query on [[fr:Catégorie: Abréviations en italien]] returned data on 'Catégorie:Abréviations en italien'
Dump pt (wikipedia) appended.
Traceback (most recent call last):
  File "pwb.py", line 178, in <module>
    run_python_file(fn, argv, argvu)
  File "pwb.py", line 75, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "scripts/interwiki.py", line 2646, in <module>
    main()
  File "scripts/interwiki.py", line 2621, in main
    bot.run()
  File "scripts/interwiki.py", line 2365, in run
    self.queryStep()
  File "scripts/interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File "scripts/interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File "scripts/interwiki.py", line 1305, in batchLoaded
    if not page.exists():
  File ".../pywikibot/page.py", line 564, in exists
    return self.site.page_exists(self)
  File ".../pywikibot/site.py", line 2306, in page_exists
    return page._pageid > 0
AttributeError: 'Page' object has no attribute '_pageid'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

This patch fixes the problem for me (so I'll review it now): https://gerrit.wikimedia.org/r/#/c/151809/
https://gerrit.wikimedia.org/r/172108/ is also needed, because of a recently introduced bug.
The API langlinks data retains the space:
https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=User:John%20Vandenberg/test

{
  "query": {
    "pages": {
      "40071800": {
        "pageid": 40071800,
        "ns": 2,
        "title": "User:John Vandenberg/test",
        "langlinks": [
          {
            "lang": "fr",
            "*": "Cat\u00e9gorie: Pantonyme"
          }
        ]
      }
    }
  }
}

update_page in api.py uses pywikibot.Link.langlinkUnsafe to create a Link object, and that doesn't remove the space:

>>> s = pywikibot.Site()
>>> l = pywikibot.Link.langlinkUnsafe('fr', 'Catégorie: Pantonyme', source=s)
>>> l.title
' Pantonyme'
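The fr wiki itself ignores the space after the namespace prefix (the query on 'Catégorie: Abréviations en italien' came back as 'Catégorie:Abréviations en italien'), so normalizing the langlink title before comparing would make the two titles match. The sketch below parses the langlinks response quoted above and applies that normalization. normalize_title, same_title and langlink_titles are hypothetical helpers, not pywikibot's Link parsing; KNOWN_NAMESPACES is an assumed, incomplete hard-coded list standing in for the site's real namespace table; and first-letter capitalization is not handled.

```python
"""Hypothetical sketch: normalize langlink titles before comparing them."""
import json
import re

# Assumption: an incomplete, hard-coded list of namespace names on fr wikis.
KNOWN_NAMESPACES = {"Catégorie", "Utilisateur"}

# The langlinks response quoted above, trimmed to the fields used here.
API_RESPONSE = r'''{"query": {"pages": {"40071800": {
    "pageid": 40071800, "ns": 2, "title": "User:John Vandenberg/test",
    "langlinks": [{"lang": "fr", "*": "Cat\u00e9gorie: Pantonyme"}]}}}}'''


def normalize_title(raw):
    """Collapse underscores/whitespace, then trim around a known namespace."""
    text = re.sub(r"[\s_]+", " ", raw).strip()
    prefix, sep, rest = text.partition(":")
    if sep and prefix.strip() in KNOWN_NAMESPACES:
        return prefix.strip() + ":" + rest.strip()
    # Not a known namespace prefix: keep the colon and spacing as they are.
    return text


def same_title(a, b):
    """Compare two titles after normalization (a sametitle-like check)."""
    return normalize_title(a) == normalize_title(b)


def langlink_titles(response_text):
    """Yield (lang, raw title) pairs from a prop=langlinks JSON response."""
    data = json.loads(response_text)
    for page in data["query"]["pages"].values():
        for link in page.get("langlinks", []):
            yield link["lang"], link["*"]


for lang, raw in langlink_titles(API_RESPONSE):
    print(lang, repr(raw), "->", repr(normalize_title(raw)))
```

Restricting the trimming to known namespace names matters: a main-namespace title such as "Foo: bar" keeps its space, because "Foo" is not a namespace.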
See also https://bugzilla.wikimedia.org/show_bug.cgi?id=73415