Last modified: 2014-11-17 09:21:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T2011, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 11 - Red interwiki links -- check for page existence across wikis
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Lowest enhancement with 28 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: crosswiki
Duplicates: 222
Depends on: 20646 15609
Blocks: 10237 2934 22001
Reported: 2004-08-10 17:19 UTC by xmlizer
Modified: 2014-11-17 09:21 UTC (History)
Description xmlizer 2004-08-10 17:19:08 UTC
It is important to have the same page-existence information for links that
point outside the current Wikimedia instance as we have for links within it.

As far as I know, they are on the same database, so it is *not* technically infeasible.

It is especially necessary for Wiktionary.
Comment 1 Antoine "hashar" Musso (WMF) 2004-08-14 17:49:13 UTC
We currently have no way to know whether an article exists on another wiki. The
easiest choice is not to show any link.

Moving as an enhancement request.
Doesn't block #17
Comment 2 Brion Vibber 2004-08-27 01:15:29 UTC
*** Bug 222 has been marked as a duplicate of this bug. ***
Comment 3 SJ 2004-08-27 01:22:46 UTC
Changing summary from "show interwiki and link to wikibooks and wiktionary in
different color if they do not exist", to highlight the general nature of the
problem.  I think the most common use of such an interwiki check would be to
help correct broken links to other language-versions of a project, and broken
links to/from meta.

For the specific case of the wiktionary links shown on a "this article does not
exist" page, we could keep a list of {title, project} pairs for all extant
wikiprojects in a given language, and only show a "you might want to check
related articles on other projects:" message when titles *do* exist on other
projects.
Comment 4 Brian Jason Drake 2005-11-10 07:46:40 UTC
See bug 3917.
Comment 5 Brian Jason Drake 2005-11-10 07:47:48 UTC
Also see bug 2463 - if the same article exists on 
another wiki, but is in a different language, perhaps 
we should automatically translate and display it.
Comment 6 Rob Church 2005-11-10 12:11:52 UTC
How do you propose to automatically translate the stuff? And how do you know
which article is the same one? Article titles aren't English in non-English
wikis, after all - so how do we set about determining which wikis have our
article? And if we could, what would happen if two or more wikis had the
same one? How could the software tell which to translate?
Comment 7 Chris Wood 2005-11-10 20:39:53 UTC
I think the proposal is just to show if there happens to be an article with the
same name in another wiki.
Comment 8 Jamie Hari 2005-11-11 01:52:40 UTC
The proposal as I understand it, pertains to the following code:

[[w:Example_Article]]

Which would be red if Example_Article does NOT exist on Wikipedia 
OR
would be blue if Example_Article DOES exist on Wikipedia.

Simply for interwiki linking, no translation at all.

This would cause HUGE cross-site SQL querying and suck up untold amounts of 
bandwidth from the sender and receiver. Although this would be FANTASTIC for my 
dual-database setup, I think this may be a pipe-dream.

I could possibly see it paired with some sort of caching system which queries 
once a week and stores the link-state (red or blue) locally until next updated. 
A lot of work, but could encourage a flurry of edits across several sites.

What do you guys think?
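
The weekly caching idea above could be sketched roughly as follows. This is only an illustration, not MediaWiki code: the class name `LinkStateCache`, the `lookup` callback (standing in for whatever remote existence check is used) and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple

WEEK_SECONDS = 7 * 24 * 60 * 60  # the "once a week" refresh interval from the comment


class LinkStateCache:
    """Stores red/blue link state locally and re-queries only after a TTL expires."""

    def __init__(self,
                 lookup: Callable[[str, str], bool],
                 ttl: float = WEEK_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._lookup = lookup  # hypothetical remote check: (prefix, title) -> exists?
        self._ttl = ttl
        self._clock = clock    # injectable so the expiry logic can be tested
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def exists(self, prefix: str, title: str) -> bool:
        key = (prefix, title)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # entry is still fresh, no cross-site query
        state = self._lookup(prefix, title)
        self._entries[key] = (state, now)
        return state
```

With this shape, each distinct link costs at most one remote query per TTL period, which is the trade-off the comment describes: stale link colours for up to a week in exchange for bounded traffic.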
Comment 9 Chris Wood 2005-11-11 02:51:47 UTC
Once a week is a lot better than nothing. But as you said, a lot of work.
Comment 10 Jamie Hari 2005-11-11 04:51:12 UTC
Upon further thought, even a caching system with weekly/monthly queries would be 
a heavy load and would lead to security questions like the ability for one db's 
rights to query another (if not all on the same database). An alternate thought 
I have is a negative-option link table in the database, which treats all links 
as nonexistent red-linked pages until someone clicks one, at which time it 
queries the other database and, if necessary, updates the local cache of 'existing 
pages'.

Just tossing out ideas here... Again, my ultimate thought here is 'pipe-dream'. 
One other scenario could see some sort of function-call passing a boolean result 
back in the url of the resulting page on the 2nd database. (Which is uglier than 
Steve Buscemi...)

Anyone have any other thoughts or should we kill this?
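
The "negative-option" idea in this comment might look something like the sketch below. The class and method names are hypothetical, and `lookup` stands in for the query against the other database; nothing here reflects an actual MediaWiki table.

```python
from typing import Callable, Set, Tuple


class NegativeOptionLinks:
    """Treats every interwiki link as red until a click proves the target exists."""

    def __init__(self, lookup: Callable[[str, str], bool]) -> None:
        self._lookup = lookup  # hypothetical query against the other wiki
        self._known_existing: Set[Tuple[str, str]] = set()  # local cache of 'existing pages'

    def colour(self, prefix: str, title: str) -> str:
        # Rendering never triggers a remote query; it only reads the local cache.
        return "blue" if (prefix, title) in self._known_existing else "red"

    def on_click(self, prefix: str, title: str) -> bool:
        # The remote check happens only when a reader follows the link.
        exists = self._lookup(prefix, title)
        key = (prefix, title)
        if exists:
            self._known_existing.add(key)
        else:
            self._known_existing.discard(key)
        return exists
```

The cost of this design is that existing pages show as red until someone first follows the link.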
Comment 11 ssd 2006-01-07 01:28:11 UTC
Another option would be to create an inter-wiki protocol (perhaps http based to
make it simple) that allows one wiki to query another to ask if the page exists,
and then cache it.   This poll could be done (as you suggested) only when a user
follows the link.  Doing this at the http level would remove the need to break
security and query the other database, at the expense of a small penalty (double
page load -- one for the interwiki query, one for the user's web browser).
Comment 12 Jamie Hari 2007-09-18 22:34:50 UTC
How about a ?action=pageexists function which outputs a raw 'true' or 'false' which could be retrieved via an HTTP GET.
Similar in fashion to the way that Special:Statistics does it:

http://en.wikipedia.org/w/index.php?title=Special:Statistics&action=raw

Should be light on both websites, could be cached for a period and invalidated thereafter.
Even lighter if instead of true/false it were 1/0. ;) 
Every byte counts!

Of course, this would be 'off' by default on both sides.
The 'client' wiki would have to turn on $wgDoCrossWikiChecks=TRUE (to do the checking)
The 'server' wiki would have to enable $wgAllowCrossWikiChecks=TRUE (to provide the raw ?action=pageexists output)

If both aren't enabled, it won't work. (Handled gracefully, of course...)
This allows the greatest flexibility and control.


NB: I am changing the summary to reflect the fact that this isn't just for across Wikimedia projects, but rather across any two MediaWiki installations, that support this function.
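
As a rough sketch of the two-sided opt-in proposed in this comment: the flag names mirror the suggested $wgDoCrossWikiChecks and $wgAllowCrossWikiChecks settings and the 1/0 output of the suggested ?action=pageexists, none of which exist in MediaWiki. The function names are made up for the illustration.

```python
from typing import Optional, Set


def serve_pageexists(allow_cross_wiki_checks: bool,
                     pages: Set[str],
                     title: str) -> Optional[str]:
    """'Server' wiki: raw '1' or '0', or None when the feature is switched off."""
    if not allow_cross_wiki_checks:
        return None
    return "1" if title in pages else "0"


def check_remote(do_cross_wiki_checks: bool,
                 response: Optional[str]) -> Optional[bool]:
    """'Client' wiki: True/False when the check worked, None when state is unknown."""
    if not do_cross_wiki_checks or response not in ("1", "0"):
        return None  # handled gracefully: render an ordinary interwiki link
    return response == "1"
```

Returning None rather than False when either side has the feature disabled keeps "unknown" distinct from "missing", so a link is never shown as red merely because the check was unavailable.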
Comment 13 Rob Church 2007-09-18 23:04:54 UTC
Previous discussions have favoured some sort of API-based check, although in cases where the foreign database is directly readable (such as in the Wikimedia, Wikia, etc. cases), fetching the information straight out of that is preferable.
Comment 14 SJ 2007-12-06 07:07:14 UTC
It would be quite useful to have an api on the calling wiki's side that says "please update this link with data about the target" if that is possible -- that could even allow for checking on existence or status of a page on an arbitrary site (say, linking to a bugzilla bug, and getting a different display of the linktext based on the bug's status... based on proper use of a similar API on the target site).
Comment 15 Robert Leverington 2007-12-23 14:01:28 UTC
There appears to be a two-way solution to this problem, since some interwiki links will be local and others remote. To adequately accommodate both types, a mixture of HTTP access and SQL access would have to be involved. To decide which one will be used, the easiest solution would be to add two extra columns to the interwiki table, one for the database the wiki is on and one for that wiki's database table prefix (if applicable); both would be optional.

When an interwiki link is made to a wiki with a database listed, an SQL query is made to the target's database that decides whether the link is red or blue; this could be cached (see explanation below).

If it is a remote wiki, then a call to the API could be made (another database column would be required for the path to the API), e.g. http://en.wikipedia.org/w/api.php?action=query&titles=pagename - if it returns <page missing="" /> then the page is missing, otherwise it is not. Currently MediaWiki returns 200 status codes for nonexistent pages, so just going to the page would not be reliable. Caching would also be essential with this method.
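
A minimal sketch of the API check described above, assuming the JSON output format (format=json) rather than the XML shown in the comment; in JSON, a missing page carries a "missing" key in its entry under query.pages. The helper names are illustrative, and the network fetch itself is left out.

```python
import json
from urllib.parse import urlencode


def build_query_url(api_url: str, title: str) -> str:
    """Build the action=query URL for one title (api_url would come from the interwiki row)."""
    params = {"action": "query", "titles": title, "format": "json"}
    return api_url + "?" + urlencode(params)


def page_exists(response_text: str) -> bool:
    """Interpret an action=query JSON response: a page is missing if flagged 'missing'."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    # An empty result is treated as "does not exist" in this sketch.
    return bool(pages) and all("missing" not in page for page in pages.values())
```

The response text would be fetched over HTTP from the URL returned by build_query_url; that step is omitted so the parsing logic can be read on its own.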

Caching would involve an extra column in the pagelinks table indicating whether a page is a red link or not - this could be periodically updated by a maintenance script or when the page is purged (but not when edited, as this could generate too much traffic).

Comments etc are appreciated and I will consider working on this in the new year if a flaw in my solution is not found.
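
The choice between SQL and HTTP access outlined in this comment could be sketched as below. The fields db_name, table_prefix and api_url are the hypothetical extra interwiki columns the comment proposes, not columns of the actual interwiki table, and choose_method is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterwikiRow:
    prefix: str
    url: str
    db_name: Optional[str] = None       # proposed column: database the wiki is on
    table_prefix: Optional[str] = None  # proposed column: that wiki's table prefix
    api_url: Optional[str] = None       # proposed column: path to the wiki's API


def choose_method(row: InterwikiRow) -> str:
    """Pick how to check existence for links using this interwiki prefix."""
    if row.db_name:
        return "sql"   # local wiki: query the target's database directly
    if row.api_url:
        return "api"   # remote wiki: ask its API over HTTP
    return "none"      # no metadata: leave the link unchecked
```

Because all three extra columns are optional, existing interwiki rows would fall through to "none" and keep their current behaviour.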
Comment 16 Matt Johnston 2008-09-26 07:01:26 UTC
I'm working on this bug at the moment, using a (hopefully) extensible system for both remote (API) and local (DB) sites. Not sure on an ETA, although I'll be merging in changes once sections get done.
Comment 17 Sumana Harihareswara 2011-07-22 13:47:45 UTC
Matt, is your code available for us to look at? Or perhaps you've put this project aside?  Per bug 20646 this may depend on the interwiki table.
Comment 18 Matt Johnston 2011-08-28 22:45:27 UTC
I have an old implementation of most of this at http://svn.wikimedia.org/svnroot/mediawiki/branches/remotesite/ - I'm happy to bring it up to HEAD and fill in the missing bits if there's still interest, and people think this is the right approach.
(cross-posted to bug 20646 as this would address both bugs)
Comment 19 Jan Kucera (Kozuch) 2011-12-30 15:50:27 UTC
Because of the votes, raising importance/priority according to the following scheme:
15+ votes - highest
5-15 votes - high
Community must have a voice within development.

Regards, Kozuch
http://en.wikipedia.org/wiki/User:Kozuch
Comment 20 John Mark Vandenberg 2013-09-20 01:41:50 UTC
In case anyone else runs into this, ...

While {{#ifexist:file:...}} doesn't work for files hosted on Wikimedia Commons,

it is possible to use {{#ifexist:media:...}} on WMF projects, and it does accurately determine whether the media exists on Wikimedia Commons.  There is at least one bug: bug 32031 about combining #ifexist media: with file redirects.

I haven't tested this with InstantCommons.
Comment 21 Minh Nguyễn 2013-09-20 09:47:28 UTC
Yes, it does work on third-party wikis. On the OpenStreetMap Wiki, {{#ifexist: Media:Wikivoyage-logo.svg | yes | no }} returns “yes”.
Comment 22 Al-Scandar Solstag 2014-04-10 17:09:22 UTC
So, first this is set to "highest" from some reasonable scheme, then some bot reduces it to "low" without any explanation, and now it's again arbitrarily set to "lowest". What about taking input seriously?

This bug represents an important barrier for collaboration and coordination between wikimedia projects. Thank you.
Comment 23 Quim Gil 2014-04-10 18:03:00 UTC
Hi Al-Scandar, sorry for not having clarified the action. Note to self: comment always when changing the prioritization of a report.

Bug status, priority, and target milestone fields summarize and reflect reality and do not cause it. This report has been open since 2004, and currently nobody seems to be working or planning to work on it. The "Lowest" just reflects that.

Bug 20646 - Store more target site metadata in interwiki table (which is blocking this report) seems to be in a similar situation, inactive.

On the other hand, it looks like the VisualEditor team is working on the related Bug 37902 - Implement rendering of redlinks and stubs (in a post-processor?) 

If the Platform team or someone else wants to include this request in their plan, then they can set priority accordingly.
Comment 24 TeleComNasSprVen 2014-05-10 01:10:57 UTC
I'm going to have to suggest another configuration setting (even though MediaWiki is already bloated enough with those already). We need a way for wikis to opt out if they do not need nor want this change.

I'm not sure if this has been suggested, but should we just keep this feature confined to the same wiki farm? Or ask the target wiki if the source wiki wants to check the existence of the target wiki's article? I'd imagine possible abuse like {{#ifexist:google:foo}} when we would not feasibly check if a google page exists or not.
Comment 25 John Mark Vandenberg 2014-05-10 03:42:31 UTC
I suspect this feature request will be solved/solvable when Wikidata integrates all of the projects, esp. Wiktionary (bug number?)

Then instead of [[w:Blah]] magically being red based on weekly updates, wikis would use a template like {{ifexistson|enwiki}} to call  entity:getSitelink( 'enwiki' )  in order to determine whether there is an enwiki page for the local page.

More interesting possibilities are possible once we also have Lua access to any item. (bug 47930).
Then the page [[wikt:ru:Foo]] can do magic relating to [[w:en:Blah]] using calls like {{ifexistson|enwiki|Q527633}}
Comment 26 James Forrester 2014-05-10 05:32:14 UTC
(In reply to TeleComNasSprVen from comment #24)
> I'm going to have to suggest another configuration setting (even though
> MediaWiki is already bloated enough with those already). We need a way for
> wikis to opt out if they do not need nor want this change.
> 
> I'm not sure if this has been suggested, but should we just keep this
> feature confined to the same wiki farm? Or ask the target wiki if the source
> wiki wants to check the existence of the target wiki's article? I'd imagine
> possible abuse like {{#ifexist:google:foo}} when we would not feasibly check
> if a google page exists or not.

I think just doing it for the current wiki farm is a sane approach; later we might do a wider system, but it would need us to significantly change the information held in the interwiki map.
Comment 27 James Forrester 2014-05-10 05:40:39 UTC
(In reply to John Mark Vandenberg from comment #25)
> I suspect this feature request will be solved/solvable when Wikidata
> integrates all of the projects, esp. Wiktionary (bug number?)

No.

This is about skins and appearance, not structural data relating items. (Also, Wikidata isn't remotely the right way to go about this.)

The system we will build as part of the switch from the PHP parser to Parsoid for generating the read HTML will be able to achieve this as an extension of the existing system built for VisualEditor in fixing bug 37901, a precursor to bug 37902.

VisualEditor requests the existence status of each of the links on the page and sets them to be red or otherwise based on this status; the same styling can be calculated server-side and returned as an API call (without client-side Javascript), which means that this can work for all users, and extending the status checking to other MediaWiki instances in the same farm (or even further afield) is a relatively simple extension of this principle.
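
To illustrate the batch lookup described here, the sketch below groups link titles into requests and maps the results to link styling. It is not VisualEditor or Parsoid code: the batch size of 50 titles per request is an assumption about the API limit for ordinary clients, and the function names are made up. The "new" CSS class is the one MediaWiki uses for red links.

```python
from typing import Dict, Iterator, List, Set
from urllib.parse import urlencode

BATCH_SIZE = 50  # assumed per-request title limit for ordinary API clients


def title_batches(titles: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split a page's link targets into chunks small enough for one API request."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def batch_query(titles: List[str]) -> str:
    """Query string asking about several titles at once (pipe-separated)."""
    return urlencode({"action": "query", "titles": "|".join(titles), "format": "json"})


def link_classes(titles: List[str], existing: Set[str]) -> Dict[str, str]:
    """Map each title to its CSS class: 'new' marks a red link, '' a normal one."""
    return {title: ("" if title in existing else "new") for title in titles}
```

Batching is what keeps the load manageable: a page with a few hundred links needs only a handful of requests, and each result can then be cached.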
Comment 28 TeleComNasSprVen 2014-05-18 10:06:24 UTC
(In reply to James Forrester from comment #27)
> (In reply to John Mark Vandenberg from comment #25)
> 
> VisualEditor requests the existence status of each of the links on the page
> and sets them to be red or otherwise based on this status; the same styling
> can be calculated server-side and returned as an API call (without
> client-side Javascript), which means that this can work for all users, and
> extending the status checking to other MediaWiki instances in the same farm
> (or even further afield) is a relatively simple extension of this principle.

Can the checks be feasibly done without placing too much load and performance worry on the servers? As someone noted above, even if some of the work was offloaded to cache such querying would already put a strain on the servers.
Comment 29 James Forrester 2014-05-22 20:07:45 UTC
(In reply to TeleComNasSprVen from comment #28)
> (In reply to James Forrester from comment #27)
> > (In reply to John Mark Vandenberg from comment #25)
> > 
> > VisualEditor requests the existence status of each of the links on the page
> > and sets them to be red or otherwise based on this status; the same styling
> > can be calculated server-side and returned as an API call (without
> > client-side Javascript), which means that this can work for all users, and
> > extending the status checking to other MediaWiki instances in the same farm
> > (or even further afield) is a relatively simple extension of this principle.
> 
> Can the checks be feasibly done without placing too much load and
> performance worry on the servers? As someone noted above, even if some of
> the work was offloaded to cache such querying would already put a strain on
> the servers.

Sure; caching the state of the pages is already inside the API cluster's bailiwick, and this would just be a (large) client load on that. It's almost certainly feasible, albeit we may need to bump up the API cluster a little.
