Last modified: 2014-11-14 09:15:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T75124, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 73124 - additional space in langlinks data causes crash
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: interwiki.py
Version: core-(2.0)
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Pywikipedia bugs
Depends on:
Blocks: pwb20
Reported: 2014-11-07 09:00 UTC by JAn Dudík
Modified: 2014-11-14 09:15 UTC (History)
CC List: 2 users


Description JAn Dudík 2014-11-07 09:00:15 UTC
When an interwiki link contains a space between the namespace and the page name, the bot crashes:

pwb.py interwiki -async -family:wiktionary -cleanup -continue

...
Retrieving pages from wiktionary:fr.
WARNING: loadpageinfo: Query on [[fr:Categorie: Abreviations en italien]] returned data on 'Categorie:Abreviations en italien'
Dump cs (wiktionary) written.
Traceback (most recent call last):
  File "D:\Py\rewrite\pwb.py", line 178, in <module>
    run_python_file(fn, argv, argvu)
  File "D:\Py\rewrite\pwb.py", line 75, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "D:\Py\rewrite\scripts\interwiki.py", line 2646, in <module>
    main()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2621, in main
    bot.run()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2365, in run
    self.queryStep()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File "D:\Py\rewrite\scripts\interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File "D:\Py\rewrite\scripts\interwiki.py", line 1305, in batchLoaded
    if not page.exists():
  File "D:\Py\rewrite\pywikibot\page.py", line 564, in exists
    return self.site.page_exists(self)
  File "D:\Py\rewrite\pywikibot\site.py", line 2288, in page_exists
    return page._pageid > 0
AttributeError: 'Page' object has no attribute '_pageid'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort


Because the dump file cannot be changed (https://bugzilla.wikimedia.org/show_bug.cgi?id=72943), I modified this page:
https://cs.wiktionary.org/w/index.php?title=Kategorie:Italské_zkratky&diff=531814&oldid=522026

So anyone who wants to reproduce this must edit another page.
Comment 1 John Mark Vandenberg 2014-11-09 14:30:06 UTC
The hint here is "Query on [[fr:Categorie: Abreviations en italien]] returned data on 'Categorie:Abreviations en italien'". That is only a warning in _update_page, and it is caused by Site.sametitle.

I set up a test case:

https://en.wikipedia.org/wiki/User:John_Vandenberg/test is:

fooo

[[fr:Catégorie: Pantonyme]]

https://pt.wikipedia.org/wiki/Usu%C3%A1rio:John_Vandenberg/test is:

fooo

[[en:User:John Vandenberg/test]]

Then:

$ python pwb.py interwiki -page:"Usuário:John_Vandenberg/test" -family:wikipedia -lang:pt
NOTE: Number of pages queued is 0, trying to add 50 more.
Retrieving 1 pages from wikipedia:pt.
[[pt:Usuário(a):John Vandenberg/test]]: [[pt:Usuário(a):John Vandenberg/test]] gives new interwiki [[en:User:John Vandenberg/test]]
Retrieving 1 pages from wikipedia:en.
WARNING: [[pt:Usuário(a):John Vandenberg/test]] is in namespace 2, but [[fr:Catégorie: Abréviations en italien]] is in namespace 14. Follow it anyway? ([y]es, [n]o, [a]dd an alternative, [g]ive up) y
[[pt:Usuário(a):John Vandenberg/test]]: [[en:User:John Vandenberg/test]] gives new interwiki [[fr:Catégorie: Abréviations en italien]]
Retrieving 1 pages from wikipedia:fr.
WARNING: preloadpages: Query returned unexpected title'Catégorie:Abréviations en italien'
WARNING: loadpageinfo: Query on [[fr:Catégorie: Abréviations en italien]] returned data on 'Catégorie:Abréviations en italien'
Dump pt (wikipedia) appended.
Traceback (most recent call last):
  File "pwb.py", line 178, in <module>
    run_python_file(fn, argv, argvu)
  File "pwb.py", line 75, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "scripts/interwiki.py", line 2646, in <module>
    main()
  File "scripts/interwiki.py", line 2621, in main
    bot.run()
  File "scripts/interwiki.py", line 2365, in run
    self.queryStep()
  File "scripts/interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File "scripts/interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File "scripts/interwiki.py", line 1305, in batchLoaded
    if not page.exists():
  File ".../pywikibot/page.py", line 564, in exists
    return self.site.page_exists(self)
  File ".../pywikibot/site.py", line 2306, in page_exists
    return page._pageid > 0
AttributeError: 'Page' object has no attribute '_pageid'
<type 'exceptions.AttributeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort
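
The warnings and the traceback are consistent with the toy model below. It is only a sketch of the suspected mechanism, not pywikibot code: ToyPage, toy_load and the page id 12345 are hypothetical, and the assumption that page data is skipped when the returned title does not match the requested one exactly is mine, not something confirmed in this report.

```python
class ToyPage:
    """Hypothetical stand-in for a page object; this is not pywikibot's Page."""

    def __init__(self, title):
        self.title = title  # note: _pageid is deliberately not set here

    def exists(self):
        # Same expression as the failing line in the traceback.
        return self._pageid > 0


def toy_load(page, returned_title, pageid):
    """Copy query data onto the page only if the titles match exactly.

    Assumption: on a mismatch the data is skipped after a warning.
    """
    if page.title != returned_title:
        print("WARNING: query on %r returned data on %r"
              % (page.title, returned_title))
        return  # data skipped, so _pageid stays unset
    page._pageid = pageid


# Requested title keeps the space; the returned title does not.
page = ToyPage("Catégorie: Pantonyme")
toy_load(page, "Catégorie:Pantonyme", 12345)  # 12345 is an arbitrary placeholder

try:
    page.exists()
    crashed = False
except AttributeError:
    crashed = True

print("exists() raised AttributeError:", crashed)
```

In this model the crash disappears as soon as the requested title is normalised before the comparison, which is why the space in the langlinks data matters.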

Using this patch fixes the problem for me (so, I'll review it now)

https://gerrit.wikimedia.org/r/#/c/151809/

And https://gerrit.wikimedia.org/r/172108/ is also needed because of a recently created bug.
Comment 2 John Mark Vandenberg 2014-11-09 16:55:33 UTC
API langlinks data retains the space.

https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=User:John%20Vandenberg/test

{
    "query": {
        "pages": {
            "40071800": {
                "pageid": 40071800,
                "ns": 2,
                "title": "User:John Vandenberg/test",
                "langlinks": [
                    {
                        "lang": "fr",
                        "*": "Cat\u00e9gorie: Pantonyme"
                    }
                ]
            }
        }
    }
}
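
A minimal sketch of reading that response, to make the retained space visible. The helper name langlink_titles is hypothetical; the response text is the output quoted above, and only the standard json module is used.

```python
import json

# Response body copied from the API output quoted above.
RESPONSE = r'''
{
    "query": {
        "pages": {
            "40071800": {
                "pageid": 40071800,
                "ns": 2,
                "title": "User:John Vandenberg/test",
                "langlinks": [
                    {
                        "lang": "fr",
                        "*": "Cat\u00e9gorie: Pantonyme"
                    }
                ]
            }
        }
    }
}
'''


def langlink_titles(response_text):
    """Return (lang, title) pairs from a prop=langlinks query response."""
    data = json.loads(response_text)
    pairs = []
    for page in data["query"]["pages"].values():
        for link in page.get("langlinks", []):
            # The link title is stored under the "*" key.
            pairs.append((link["lang"], link["*"]))
    return pairs


pairs = langlink_titles(RESPONSE)
for lang, title in pairs:
    # repr() makes the space after the colon easy to spot.
    print(lang, repr(title))
```

The title comes back exactly as written in the wikitext, so any normalisation has to happen on the client side.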

api.py update_page uses pywikibot.Link.langlinkUnsafe to create a Link object, and that doesn't strip the space, so the title keeps its leading space:

>>> s = pywikibot.Site()
>>> l = pywikibot.Link.langlinkUnsafe('fr', 'Catégorie: Pantonyme', source=s)
>>> l.title
' Pantonyme'
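
For illustration, one way to trim the parts when splitting such a title. This is a sketch only, not the fix from the Gerrit changes mentioned earlier: split_langlink_title and the namespaces argument are hypothetical, and real code would ask the target site for its namespace names.

```python
def split_langlink_title(raw_title, namespaces):
    """Split 'Namespace: Name' into (namespace, name), trimming whitespace.

    `namespaces` is a hypothetical set of known namespace names.
    If the prefix is not a known namespace, the whole string is the name.
    """
    prefix, sep, rest = raw_title.partition(":")
    if sep and prefix.strip() in namespaces:
        return prefix.strip(), rest.strip()
    return "", raw_title.strip()


ns, name = split_langlink_title("Catégorie: Pantonyme", {"Catégorie"})
print(repr(ns), repr(name))
```

With the space stripped, the name no longer differs from the title the API returns for the page.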
Comment 3 JAn Dudík 2014-11-14 09:15:04 UTC
See also
https://bugzilla.wikimedia.org/show_bug.cgi?id=73415
