Last modified: 2012-09-22 19:10:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T31497, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 29497 - Parser doesn't support protocol relative external links in single-bracketed syntax


Summary:	Parser doesn't support protocol relative external links in single-bracketed s...

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	1.20.x
Hardware:	All All

Importance:	Normal normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Duplicates:	31284 (view as bug list)
Depends on:
Blocks:	20342

Reported:	2011-06-20 09:53 UTC by Niklas Laxström
Modified:	2012-09-22 19:10 UTC (History)
CC List:	9 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Niklas Laxström 2011-06-20 09:53:30 UTC

Parser doesn't support protocol relative external links of type [//example.com]. Adding a new bug because 20342 is too vague on the specific issue it tries to address.

Comment 1 Brion Vibber 2011-06-20 17:32:41 UTC

Updated summary to clarify that this is about inline links in wiki text documents.

Bug 20342 is primarily about URLs generated by/for user interface components and whatnot, where currently we would tend to select either 'http' or 'https' forms but are looking for ways to avoid splitting caches so the same HTML output can be stored and used for both.

For wikitext-formatted messages, it _might_ be useful to be able to pass in such URLs directly into things using '[$1 blah]'.

For documents in general, that's a bit more fragile; when we use protocol-relative links we're saying "we know FOR SURE that both this site and the other site we're talking about are available on both http and https, and that the correct thing to do is to send you to the same protocol on the other site".

IMO that's a bit flaky -- folks are probably more likely to accidentally put in a link that doesn't actually work in one mode or the other without testing it correctly -- but it might be a necessary evil.
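As an illustration of "send you to the same protocol on the other site", here is a minimal Python sketch of how a protocol-relative link resolves differently depending on the page it is viewed from. It uses the standard library's `urllib.parse.urljoin`; the page URLs and the `resolve` helper are made up for the example and are not part of MediaWiki.

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve href against the page it appears on, as a browser would."""
    return urljoin(page_url, href)


# The same protocol-relative href, seen from an http and an https page view.
http_view = resolve("http://en.wikipedia.org/wiki/Foo", "//example.com/bar")
https_view = resolve("https://en.wikipedia.org/wiki/Foo", "//example.com/bar")

print(http_view)   # http://example.com/bar
print(https_view)  # https://example.com/bar
```

If example.com were reachable over only one of the two protocols, one of these resolved links would fail, which is the fragility described above.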

Comment 2 Niklas Laxström 2011-06-20 18:33:22 UTC

Many bits of the software pass external links to interface messages, which are then parsed.

Comment 3 Roan Kattouw 2011-06-21 18:32:01 UTC

There's also stuff like {{fullurl:}} which returns absolute URLs including a protocol prefix, pointing to stuff that's at the local wiki. Now 1) it should be possible to configure this to spit out protocol-relative URLs (is it already?) and 2) the parser should be able to handle them when using the [] with fullurl combo as in:

[{{fullurl:{{FULLPAGENAME}}|action=edit}} Edit this page]

Comment 4 Roan Kattouw 2011-07-07 18:27:20 UTC

Fixed in r91663.

Comment 5 Bergi 2011-09-28 14:52:27 UTC

(In reply to comment #1)
> For documents in general, that's a bit more fragile; when we use
> protocol-relative links we're saying "we know FOR SURE that both this site and
> the other site we're talking about are available on both http and https, and
> that the correct thing to do is to send you to the same protocol on the other
> site".
There are sites we know for sure: our own. In the situation as it is, you will have to use [//wikipedia.org //wikipedia.org] instead of just //wikipedia.org. Example: http://test2.wikipedia.org/wiki/Special:ExpandTemplates?input=%7B%7BSERVER%7D%7D%0A%0A%5B%7B%7BSERVER%7D%7D%5D%0A%0A%5B%7B%7BSERVER%7D%7D%20%7B%7BSERVER%7D%7D%5D%0A%0A%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7D%7D%0A%0A%5B%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7D%7D%5D%0A%0A%5B%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7D%7D%20%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7D%7D%5D%0A%0A%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7Caction%3Dedit%7D%7D%0A%0A%5B%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7Caction%3Dedit%7D%7D%5D%0A%0A%5B%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7Caction%3Dedit%7D%7D%20%7B%7Bfullurl%3A%7B%7BPAGENAME%7D%7D%7Caction%3Dedit%7D%7D%5D
This is worse, and explicitly the fullurl:-thing will break a lot. So I think, at least for our own domain(s) we have to enable un-bracketed links.

> IMO that's a bit flaky -- folks are probably more likely to accidentally put in
> a link that doesn't actually work in one mode or the other without testing it
> correctly -- but it might be a necessary evil.

And I'd say it is necessary. Allowing only for specific domains (settable in config?) would make it more complex than it must be. wgUrlProtocols (the js variable) would need to provide the domains for which protocol and which link syntax will work. Urghh. I think it is much cleaner to allow every site, even if accidents may happen.

Comment 6 Roan Kattouw 2011-09-28 14:55:03 UTC

(In reply to comment #5)
> (In reply to comment #1)
> > For documents in general, that's a bit more fragile; when we use
> > protocol-relative links we're saying "we know FOR SURE that both this site and
> > the other site we're talking about are available on both http and https, and
> > that the correct thing to do is to send you to the same protocol on the other
> > site".
> There are sites we know for sure: our own. In the situation as it is, you will
> have to use [//wikipedia.org //wikipedia.org] instead of just //wikipedia.org.
[snip]
> This is worse, and explicitly the fullurl:-thing will break a lot. So I think,
> at least for our own domain(s) we have to enable un-bracketed links.
> 
Yes, using {{fullurl:}} to produce a clean link doesn't work any more. This is known and deliberate.

> > IMO that's a bit flaky -- folks are probably more likely to accidentally put in
> > a link that doesn't actually work in one mode or the other without testing it
> > correctly -- but it might be a necessary evil.
> 
> And I'd say it is necessary. Allowing only for specific domains (settable in
> config?) would make it more complex than it must be. wgUrlProtocols (the js
> variable) would need to provide the domains for which protocol and which link
> syntax will work. Urghh. I think it is much cleaner to allow every site, even
> if accidents may happen.
We do allow every site, where are you getting this idea that we're not?

Comment 7 Bergi 2011-09-28 15:19:00 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #1)
> > In the situation as it is, you will
> > have to use [//wikipedia.org //wikipedia.org] instead of just //wikipedia.org.
> > This is worse, and explicitly the fullurl:-thing will break a lot. So I think,
> > at least for our own domain(s) we have to enable un-bracketed links.
> > 
> Yes, using {{fullurl:}} to produce a clean link doesn't work any more. This is
> known and deliberate.

Deliberate? I don't think this is good practice. Apart from breaking existing links, it will make linking more user-unfriendly. Who would use [{{fullurl:xyz|abc}} http(s)://xyz?abc]? As a fullurl-only link doesn't work any more, users will copy-paste a protocol-absolute link that is parsed correctly (better: as intended). Is that user-friendly?

> > Allowing only for specific domains (settable in
> > config?) would make it more complex than it must be. wgUrlProtocols (the js
> > variable) would need to provide the domains for which protocol and which link
> > syntax will work. Urghh. I think it is much cleaner to allow every site, even
> > if accidents may happen.
> We do allow every site, where are you getting this idea that we're not?

Yes, but not both link formats. I said that enabling the bracketless format for just a configurable set of sites (the proposed knowing-for-sure domains) wouldn't be better, why don't we allow just everything?

Comment 8 Roan Kattouw 2011-09-30 10:40:39 UTC

(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > (In reply to comment #1)
> > > In the situation as it is, you will
> > > have to use [//wikipedia.org //wikipedia.org] instead of just //wikipedia.org.
> > > This is worse, and explicitly the fullurl:-thing will break a lot. So I think,
> > > at least for our own domain(s) we have to enable un-bracketed links.
> > > 
> > Yes, using {{fullurl:}} to produce a clean link doesn't work any more. This is
> > known and deliberate.
> 
> Deliberate? I don't think this is good practice. Apart from breaking existing
> links, it will make linking more user-unfriendly. Who would use
> [{{fullurl:xyz|abc}} http(s)://xyz?abc]? As a fullurl-only link doesn't work
> any more, users will copy-paste a protocol-absolute link that is parsed
> correctly (better: as intended). Is that user-friendly?
Yeah, you're right that it's confusing. I figured that it would also be confusing if //anyWordThatStartsWithTwoSlashes would be linkified automatically, so I disabled that behavior deliberately. But I didn't consider the fullurl use case you brought up.

> Yes, but not both link formats. I said that enabling the bracketless format for
> just a configurable set of sites (the proposed knowing-for-sure domains)
> wouldn't be better, why don't we allow just everything?
I think you may be misinterpreting Brion's words, and I think those words themselves were unclear to begin with. I never suggested limiting protocol-relative URLs to select domains, although it may be a way out of the we-don't-want-every-word-beginning-with-slash-slash-to-be-linked problem.
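The distinction discussed here (bracketed links accept a leading `//`, free links do not) can be sketched in Python as follows. The regular expressions are hypothetical simplifications written for this illustration; they are not the patterns MediaWiki's parser actually uses, and `find_links` is an invented helper.

```python
import re
from typing import List

# Hypothetical, simplified patterns for illustration only.
SCHEMES = r"(?:https?://|ftp://)"
# Bracketed links: target may start with a scheme OR with a bare "//".
BRACKETED = re.compile(r"\[((?:" + SCHEMES + r"|//)[^\s\]]+)(?: ([^\]]+))?\]")
# Free links: an explicit scheme is required, so "//word" stays plain text.
FREE = re.compile(r"\b" + SCHEMES + r"[^\s<>\[\]]+")


def find_links(text: str) -> List[str]:
    """Return link targets found in text under the two rules above."""
    targets = [m.group(1) for m in BRACKETED.finditer(text)]
    # Blank out bracketed spans so their contents are not matched again.
    remainder = BRACKETED.sub(" ", text)
    targets += [m.group(0) for m in FREE.finditer(remainder)]
    return targets


print(find_links("see [//example.com Example] and //notalink and http://example.org/x"))
print(find_links("x = 1 // a code comment on a talk page"))
```

Under these assumed rules, a stray `//` in running text (such as a code comment) is never turned into a link, while `[//example.com Example]` is.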

Comment 9 Brion Vibber 2011-09-30 17:56:22 UTC

An instance where you're generating a complete URL to embed as (potentially clickable) full text in email or a web page probably should be in canonical form.

Comment 10 Roan Kattouw 2011-10-01 08:35:41 UTC

(In reply to comment #9)
> An instance where you're generating a complete URL to embed as (potentially
> clickable) full text in email or a web page probably should be in canonical
> form.
Yes. To expand on that: we now have {{canonicalurl:}} that always outputs a fully-qualified HTTP URL, even when saved or viewed using HTTPS. Earlier this week, Sam and I went through [[MediaWiki:Enotif body]] on all wikis that had overridden it and changed all instances of {{fullurl:}} and {{SERVER}}{{localurl:foo}} to {{canonicalurl:}} because e-mail clients also don't automatically link protocol-relative URLs in text.
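A rough Python sketch of the idea behind using a canonical form for e-mail text: a protocol-relative URL has no page context to inherit a scheme from, so a fixed scheme is prepended. The `to_canonical` helper and its default scheme are assumptions for this example, not the implementation of {{canonicalurl:}}.

```python
def to_canonical(url: str, scheme: str = "http") -> str:
    """Prepend a fixed scheme to a protocol-relative URL; leave others untouched."""
    if url.startswith("//"):
        return f"{scheme}:{url}"
    return url


relative = "//en.wikisource.org/w/index.php?title=Wikisource:Sandbox"
print(to_canonical(relative))
print(to_canonical("https://en.wikisource.org/wiki/Main_Page"))
```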

Comment 11 Roan Kattouw 2011-10-01 10:23:35 UTC

*** Bug 31284 has been marked as a duplicate of this bug. ***

Comment 12 billinghurst 2011-11-01 12:18:14 UTC

Is there the ability to update {{canonicalurl}} to generate an absolute url that is protocol relative to the login type?  Alternatively if that is problematic, can there be (yet) another parser function that undertakes the task to generate the url relative to the user?  Having to do those sorts of hacks ''ad infinitum'' is surely just courting disaster against a simple ability especially as that {{canonicalurl}} will get used by the lazy, or someone will write templates to get around the issue. Thanks.

Comment 13 Roan Kattouw 2011-11-01 12:22:59 UTC

(In reply to comment #12)
> Is there the ability to update {{canonicalurl}} to generate an absolute url
> that is protocol relative to the login type?  Alternatively if that is
> problematic, can there be (yet) another parser function that undertakes the
> task to generate the url relative to the user?  Having to do those sorts of
> hacks ''ad infinitum'' is surely just courting disaster against a simple
> ability especially as that {{canonicalurl}} will get used by the lazy, or
> someone will write templates to get around the issue. Thanks.

What do you mean, exactly? If you want a protocol-relative URL, use {{fullurl:}}. This seems to be what you mean with "protocol relative to the login type". There isn't a parser function that outputs http:// URLs for people viewing over http and https:// URLs for people viewing over HTTPS because that would mean we'd have to split the parser cache, but this is exactly what protocol-relative URLs are for.

What do you mean with a "URL relative to the user"? Does that mean generating fully-qualified URLs, e.g. for e-mails that use https if the user logs in over https and http otherwise? Would this be based on the not-yet-existing "Always use HTTPS when I'm logged in" preference?
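The cache-splitting concern in the first paragraph can be sketched with a toy cache in Python. Everything here (the `render` and `cached_entries` helpers, the example.org host, the key layout) is invented for illustration and does not reflect how MediaWiki's parser cache is actually keyed.

```python
from typing import Dict, Iterable, Optional, Tuple


def render(title: str, scheme: Optional[str] = None) -> str:
    """Render a link; with no scheme the href is protocol-relative."""
    prefix = f"{scheme}:" if scheme else ""
    return f'<a href="{prefix}//example.org/wiki/{title}">{title}</a>'


def cached_entries(titles: Iterable[str], viewers: Iterable[str],
                   split_by_scheme: bool) -> Dict[Tuple[str, ...], str]:
    """Simulate page views and return the resulting cache contents."""
    viewers = list(viewers)
    cache: Dict[Tuple[str, ...], str] = {}
    for title in titles:
        for scheme in viewers:
            # Scheme-specific output forces the scheme into the cache key.
            key = (title, scheme) if split_by_scheme else (title,)
            if key not in cache:
                cache[key] = render(title, scheme if split_by_scheme else None)
    return cache


shared = cached_entries(["A", "B"], ["http", "https"], split_by_scheme=False)
split = cached_entries(["A", "B"], ["http", "https"], split_by_scheme=True)
print(len(shared), len(split))  # 2 4
```

With protocol-relative output one cached rendering serves both kinds of viewer; with scheme-specific output the number of entries doubles.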

Comment 14 billinghurst 2011-11-01 12:44:10 UTC

(In reply to comment #13)
> (In reply to comment #12)
> > Is there the ability to update {{canonicalurl}} to generate an absolute url
> > that is protocol relative to the login type?  Alternatively if that is
> > problematic, can there be (yet) another parser function that undertakes the
> > task to generate the url relative to the user?  Having to do those sorts of
> > hacks ''ad infinitum'' is surely just courting disaster against a simple
> > ability especially as that {{canonicalurl}} will get used by the lazy, or
> > someone will write templates to get around the issue. Thanks.
> 
> What do you mean, exactly? If you want a protocol-relative URL, use
> {{fullurl:}}. This seems to be what you mean with "protocol relative to the
> login type". There isn't a parser function that outputs http:// URLs for people
> viewing over http and https:// URLs for people viewing over HTTPS because that
> would mean we'd have to split the parser cache, but this is exactly what
> protocol-relative URLs are for.
> 
> What do you mean with a "URL relative to the user"? Does that mean generating
> fully-qualified URLs, e.g. for e-mails that use https if the user logs in over
> https and http otherwise? Would this be based on the not-yet-existing "Always
> use HTTPS when I'm logged in" preference?

I meant what you explained in the first paragraph.

Sometimes it is easiest and more appropriate to display a url. To display a full protocol relative url is problematic, so for
https://en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464

I cannot code
* {{fullurl:Wikisource:Sandbox|oldid=3501464}} as it doesn't give a protocol
* {{canonicalurl:Wikisource:Sandbox|oldid=3501464}} as it takes me out of the secure protocol

I have to code it as
* [{{fullurl:Wikisource:Sandbox|oldid=3501464}} {{fullurl:Wikisource:Sandbox|oldid=3501464}}] ;
* [//en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464 //en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464]

It would be fantastic if I could code either
* {{canonicalurl:Wikisource:Sandbox|oldid=3501464}}; or
* {{protocolrelativeurl:Wikisource:Sandbox|oldid=3501464}}
and they could exhibit the protocol relative urls
http://en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464
https://en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464
depending on how I login.

If it relates to the second paragraph, then maybe I don't comprehend what happens with http:// after the change. I just see that I keep getting forced into http:// in so many places and don't see an easy (lazy) solution.

Comment 15 Roan Kattouw 2011-11-01 12:51:51 UTC

(In reply to comment #14)
> I have to code it as
> * [{{fullurl:Wikisource:Sandbox|oldid=3501464}}
> {{fullurl:Wikisource:Sandbox|oldid=3501464}}] ;
> * [//en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464
> //en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464]
> 
Yes, unfortunately you do have to. Protocol-relative URLs aren't linked magically because I thought that would be too error-prone, see comment 8.

> It would be fantastic if I could code either
> * {{canonicalurl:Wikisource:Sandbox|oldid=3501464}}; or
> * {{protocolrelativeurl:Wikisource:Sandbox|oldid=3501464}}
> and they could exhibit the protocol relative urls
> http://en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464
> https://en.wikisource.org/w/index.php?title=Wikisource:Sandbox&oldid=3501464
> depending on how I login.
> 
That would be fantastic, yes. It would also be fantastically cache-breaking :(

> If it relates to the second paragraph, then maybe I don't comprehend what
> happens with http:// after the change. I just see that I keep getting forced
> into http:// in so many places and don't see an easy (lazy) solution.
canonicalurl will continue to output http:// URLs, unless and until HTTPS actually becomes the preferred protocol (but my understanding is we don't have the infrastructure for HTTPS-by-default right now). At some point soonish, we will want to introduce a preference with which users can indicate that they want to log in over HTTPS only. Enabling this preference will have the following consequences:
* Login attempts over HTTP for that username will be refused and the user will be redirected to the HTTPS login form
* An insecure (i.e. shared between HTTPS and HTTP) cookie will be set upon login that indicates that the user prefers HTTPS. This cookie should persist for a long time, even after logout or session expiry. If a user makes an HTTP request and this cookie is present, they will immediately be redirected to HTTPS, even if they're not logged in any more
* Of course login cookies created through an HTTPS login would always be so-called secure cookies (meaning HTTP doesn't have access to them) regardless of this preference

The one that's of particular interest for this bug is:
* URLs in e-mails sent to users with the HTTPS preference enabled would point to https:// instead of http://
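The proposed preference could behave roughly as in the Python sketch below. This is only an illustration of the bullet points above under stated assumptions: the cookie name `prefersHttps`, its value, and both helper functions are hypothetical, since the preference did not exist when this comment was written.

```python
from typing import Mapping, Optional

# Hypothetical cookie name; readable over both HTTP and HTTPS (not "secure").
PREFER_HTTPS_COOKIE = "prefersHttps"


def redirect_target(scheme: str, host: str, path: str,
                    cookies: Mapping[str, str]) -> Optional[str]:
    """Return an HTTPS URL to redirect to, or None if no redirect is needed."""
    if scheme == "http" and cookies.get(PREFER_HTTPS_COOKIE) == "1":
        return f"https://{host}{path}"
    return None


def email_url(host: str, path: str, prefers_https: bool) -> str:
    """Build a fully-qualified URL for e-mail text based on the preference."""
    scheme = "https" if prefers_https else "http"
    return f"{scheme}://{host}{path}"


print(redirect_target("http", "en.wikisource.org", "/wiki/Foo", {"prefersHttps": "1"}))
print(email_url("en.wikisource.org", "/wiki/Foo", prefers_https=False))
```

Because the e-mail URL is built per recipient rather than stored in shared rendered output, it does not run into the parser-cache problem that scheme-specific links in page content would.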

Comment 16 Marcin Cieślak 2012-09-19 22:27:39 UTC

*** Bug 40369 has been marked as a duplicate of this bug. ***

Comment 17 Niklas Laxström 2012-09-22 17:45:32 UTC

[//example.com] works now. If something else needs doing, it's a topic for another bug.

Comment 18 Helder 2012-09-22 17:52:44 UTC

The "free link syntax" mentioned in the bug title does not work: "//example.com" generates plain text. See comment 5.

Comment 19 Krinkle 2012-09-22 19:10:09 UTC

(In reply to comment #18)
> The "free link syntax" mentioned in the bug title does not work: "//example.com"
> generates plain text. See comment 5.

Yes, but:


(In reply to comment #6)
> Yes, using {{fullurl:}} to produce a clean link doesn't work any more. This is
> known and deliberate.
> 

Linkifying a plain //foo anywhere would break pages. I'm sure there are countless mentions of it out there where it is not intended as a link but simply as two slashes (e.g. on talk pages about a programming-related subject).

Aside from breaking things, it is also arguably bad user-facing behavior. I'm not sure whether the average user is ready to look at //example.org and know that it is a link.
