Last modified: 2014-10-20 15:26:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T57204, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 55204 - [[en:Blah|Blah]] isn't an interwiki
Status: NEW
Product: Pywikibot
Classification: Unclassified
Component: interwiki.py
Version: unspecified
Hardware: All All
Importance: Normal major
Target Milestone: ---
Assigned To: Amir Ladsgroup
Keywords: testme
Depends on:
Blocks:
Reported: 2013-10-05 04:43 UTC by Kunal Mehta (Legoktm)
Modified: 2014-10-20 15:26 UTC (History)

Description Kunal Mehta (Legoktm) 2013-10-05 04:43:25 UTC
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1427/
Reported by: yfdyh000
Created on: 2012-03-30 18:49:19
Subject: Link identify errors
Original description:
The script mistakenly identifies links as interwiki links; see:
http://en.wikipedia.org/wiki/User_talk:YFdyh-bot
http://en.wikipedia.org/w/index.php?title=Template%3ANon-free_video_sample&diff=483600197&oldid=476118424

At the time, I ran the command:
2012-03-24 04:32:09 r10024 (wikipedia.py) Python 2.7.2 interwiki.py "-warnfile:logs\warning-wikipedia-en.log" "-lang:en" "-cleanup" "-autonomous" "-async"
Comment 1 Kunal Mehta (Legoktm) 2013-10-05 04:43:27 UTC
The problem here is that someone used [[en:blah]] to link somewhere instead of [[:en:blah]]. This one could have been spotted, because [[en:blah|fdafdsa]] is never an interwiki link.
Comment 2 Ricordisamoa 2014-08-05 03:56:14 UTC
textlib.getLanguageLinks() has always caught pipe characters, and https://www.mediawiki.org/wiki/Special:Code/pywikipedia/43 made clear that the first part should have been ignored.

Adding '\|' to the regex would solve the problem for interwiki.py, but I'm afraid this would break other scripts. So maybe it is safer to add an optional argument to avoid catching piped links?

I am tempted to raise the Severity...
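
The optional-argument idea above could be sketched roughly as follows. This is only an illustration: `find_language_links` and `include_piped` are hypothetical names, and the pattern is a simplified stand-in, not the regex that textlib actually uses. The default keeps piped links so that other callers would see no change in behaviour.

```python
import re


def find_language_links(text, prefixes, include_piped=True):
    """Return (prefix, title) pairs for [[prefix:title]] links in text.

    Hypothetical helper. With include_piped=False, links that carry
    a label after a pipe, such as [[en:Blah|Blah]], are skipped.
    """
    alternation = "|".join(re.escape(p) for p in prefixes)
    pattern = re.compile(
        r"\[\[\s*(?P<prefix>%s)\s*:(?P<title>[^\]|]*)(?P<pipe>\|[^\]]*)?\]\]"
        % alternation
    )
    results = []
    for match in pattern.finditer(text):
        # A non-empty "pipe" group means the link is a piped link.
        if match.group("pipe") is not None and not include_piped:
            continue
        results.append((match.group("prefix"), match.group("title").strip()))
    return results
```

A caller like interwiki.py would then pass `include_piped=False`, while other scripts keep the default. Links with a leading colon, such as [[:fr:Exemple]], are not matched in either mode.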
Comment 3 John Mark Vandenberg 2014-10-20 05:21:27 UTC
In addition to the '|' being an indicator of non-interwiki-ness, the interwiki map may also be used to help solve this specific case, as en: and w: are never interwiki links and they are marked as localinterwiki="" in the interwikimap.

Fabian has updated Site.interwiki() and Link.parse(), so this bug in interwiki.py might be automatically solved, but it would be good to inspect/debug interwiki.py to confirm this.
Comment 4 Fabian 2014-10-20 11:05:08 UTC
Link.parse() doesn't know whether it's a piped link. A link like [[en:Blah|Blah]] is treated like [[en:Blah]] or [[:en:Blah]]: all link to Blah on en (whatever that is linked to in the interwiki map). In fact, Link doesn't store whether it's an interwiki link and just returns the Site (which might be a different site).

Site.interwiki() also doesn't take "localinterwiki" into account, because it makes no difference: based on the URL, it should return the Site itself anyway.

It sounds to me that, before that, it shouldn't recognize [[X:Y|Z]] as a link to be parsed anyway.
Comment 5 Fabian 2014-10-20 15:02:19 UTC
Okay, with a little help from John, I think I know what the problem is, but I'm not sure how best to tackle it.

Basically, if the first interwiki prefix is not preceded by a colon, it appears in the sidebar only if it doesn't link to its own site. So [[de:en:Foobar]] appears in the sidebar on the English Wikipedia, but [[en:Foobar]] does not.
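
As a minimal sketch of the sidebar rule just described, assuming the set of known interwiki prefixes and the wiki's own code are passed in by the caller: `appears_in_sidebar` is a hypothetical helper, not part of pywikibot, and it does not consult the real interwiki map.

```python
def appears_in_sidebar(link, own_code, prefixes):
    """Apply the rule above to a wikitext link such as '[[de:en:Foobar]]'.

    Hypothetical helper: own_code is the code of the wiki the text is
    on, and prefixes is the set of prefixes treated as interwiki.
    """
    inner = link.strip()
    if inner.startswith("[[") and inner.endswith("]]"):
        inner = inner[2:-2]
    # Any label after a pipe is ignored here; whether piped links
    # should count at all is a separate question in this thread.
    target = inner.split("|", 1)[0].strip()
    if target.startswith(":"):
        return False  # leading colon: an ordinary inline link
    first, sep, _rest = target.partition(":")
    first = first.strip().lower()
    if not sep or first not in prefixes:
        return False  # no known interwiki prefix in first position
    # Only the first prefix matters, and it must not be the own site.
    return first != own_code
```

For example, on the English Wikipedia (`own_code="en"`) this returns True for [[de:en:Foobar]] and False for [[en:Foobar]].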

Now I'm toying with the idea of adding an "is_special" method to Link which says whether it's just a link or whether it's "special". So [[:File:Foobar.png]] would not be special, but [[File:Foobar.png]] would. The same goes for [[:Category:Foobar]] and [[Category:Foobar]], and for [[:interwiki:Foobar]] and [[interwiki:Foobar]], with the exception that it's not a special link if the first interwiki prefix points to its own site. I don't think we need to interpret "localinterwiki" for that; it could be added later, independently, as a shortcut in Site.interwiki(), as I'm not sure when this was added (and if this is determined automatically, it doesn't really matter for this bug).
Comment 6 John Mark Vandenberg 2014-10-20 15:26:29 UTC
(In reply to Fabian from comment #5)
> .. to add a "is_special"

There are a few different types of 'special' - it would be good to have different names for each of them.

is_langlink (i.e. sidebar interlanguage links; git grep langlink shows Link already has some functionality about these)
is_transclude (for [[File:Foobar.png]])
is_category

is_metadata = is_langlink or is_category

is_special = is_langlink or is_category or is_transclude

Another type that comes to mind is [[/example/]], which is not special in the same sense as the above, but it does cause many problems (e.g. during page moves).
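
The predicates listed above might compose along these lines. This is a sketch only: `LinkKind` and its fields are hypothetical and are not pywikibot's Link class; the link is assumed to have been parsed already, and the [[/example/]] subpage case is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkKind:
    """Hypothetical, already-parsed view of a wikitext link."""

    leading_colon: bool
    namespace: str = ""         # e.g. "File" or "Category", "" if none
    interwiki_prefix: str = ""  # first interwiki prefix, "" if none
    own_code: str = "en"        # code of the wiki the link appears on

    @property
    def is_langlink(self):
        # Sidebar link: no leading colon, and the first prefix
        # does not point back to the own site.
        return (not self.leading_colon
                and bool(self.interwiki_prefix)
                and self.interwiki_prefix != self.own_code)

    @property
    def is_transclude(self):
        # [[File:Foobar.png]] embeds the file; [[:File:...]] only links.
        return (not self.leading_colon
                and not self.interwiki_prefix
                and self.namespace == "File")

    @property
    def is_category(self):
        return (not self.leading_colon
                and not self.interwiki_prefix
                and self.namespace == "Category")

    @property
    def is_metadata(self):
        return self.is_langlink or self.is_category

    @property
    def is_special(self):
        return self.is_langlink or self.is_category or self.is_transclude
```

Keeping each kind as its own property lets callers ask the narrow question they care about, with `is_metadata` and `is_special` defined purely as combinations of the others.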
