Last modified: 2013-08-04 09:01:38 UTC
When $wgServer has a scheme, i.e., it is not protocol-relative, which happens to be the default, wfExpandUrl( ..., PROTO_HTTPS ) will give an HTTP link. This causes ResourceLoader content to be loaded over an insecure connection. It also causes redirect loops when redirecting to HTTPS using wfExpandUrl.
I believe the answer to this is "don't set $wgServer to an insecure HTTP URL then". AFAIK the HTTPS code doesn't support $wgServer values with http as the scheme, although the redirect loop is a nasty failure mode. Assigning to Chris because he wrote the HTTPS redirection code.
Moreover, doesn't setting $wgServer to an HTTP URL result in HTTP URLs appearing in articles, even when viewing them over HTTPS?
(In reply to comment #2) > Moreover, doesn't setting $wgServer to an HTTP URL result in HTTP URLs > appearing in articles, even when viewing them over HTTPS? This I have not tested, but note that this bug will only occur if wfExpandUrl() is called, meaning if the link is relative (like most link URLs are AFAIK), then it won't be a problem.
(In reply to comment #3) > (In reply to comment #2) > > Moreover, doesn't setting $wgServer to an HTTP URL result in HTTP URLs > > appearing in articles, even when viewing them over HTTPS? > > This I have not tested, but note that this bug will only occur if wfExpandUrl() > is called, meaning if the link is relative (like most link URLs are AFAIK), > then it won't be a problem. Sure. But try this (fairly common) code snippet: [{{fullurl:{{FULLPAGENAME}}|action=edit§ion=new}} Add a new section to this page]
(In reply to comment #2) > Moreover, doesn't setting $wgServer to an HTTP URL result in HTTP URLs > appearing in articles, even when viewing them over HTTPS? Yes, almost all links to load.php and api.php urls are full urls that start with http://.
I guess I'm the only one in favor of killing $wgServer altogether? I think a situation in which you have two variables—$wgHTTPServer and $wgHTTPSServer—in which you explicitly define the secure and non-secure canonical URL versions would be better from a wiki administration perspective and a coding perspective. The protocol-relative magic feels very fragile to me. I'm honestly not quite sure how disruptive trying to kill $wgServer would be, though.
(In reply to comment #6) > I guess I'm the only one in favor of killing $wgServer altogether? > > I think a situation in which you have two variables—$wgHTTPServer and > $wgHTTPSServer—in which you explicitly define the secure and non-secure > canonical URL versions would be better from a wiki administration perspective > and a coding perspective. The protocol-relative magic feels very fragile to me. > There is no need to have two variables. Earlier last year (before we started using relative protocols) this is exactly what we had. By default we set $wgServer = 'http://example.org'; and on the secure servers to https. The reason we went for protocol-relative instead is because we now serve pages from the main servers, and we want to share the HTML cache. Having two variables solves nothing but introduces a lot of problems. Because that means you'll have to split the cache in a version where the HTML contains http:// everywhere and one with https://, since that is pointless and a problem already solved by the web, we use protocol-relative urls instead starting with //. The problems we have aren't solved by introducing that, and introducing that pattern is already possible without those 2 variables: > $wgServer = isHttps() ? 'http://example.org' : 'https://example.org'; We don't need the other value anyway when in the other request type. Except that we do, which is why we have a canonical protocol for API responses.
(In reply to comment #6) > I guess I'm the only one in favor of killing $wgServer altogether? > > I think a situation in which you have two variables—$wgHTTPServer and > $wgHTTPSServer—in which you explicitly define the secure and non-secure > canonical URL versions would be better from a wiki administration perspective > and a coding perspective. The protocol-relative magic feels very fragile to me. > Well, you're not the only one in favor of having better configurability for what the HTTP and what the HTTPS URLs should be. Protocol-relative URLs are a useful tool to increase cache effectiveness if the http and the https URLs happen to be identical except for the 's', like on the post-secure.wm.o WMF cluster and presumably in most sane 3rd-party setups that support both HTTP and HTTPS. The way the code currently works that's also the *only* way you'll be able to avoid random cache pollution; if we wanted to support the HTTP and HTTPS URLs being different, work would have to be done to split the parser cache based on secure-ness if the HTTP and HTTPS URLs differ. However, we need to retain the ability to not split the cache and use protocol-relative URLs instead, otherwise WMF's parser cache would double in size. > I'm honestly not quite sure how disruptive trying to kill $wgServer would be, > though. Very. The reason things are the way they are now is precisely because when I implemented this last year, I tried to make my changes minimally disruptive. And they were quite disruptive still, it took me weeks and involved an extensive tour of the skeletons in our closed :) . Messing around with this more would probably take quite a lot of work to clean up all the various assumptions surrounding $wgServer, wfExpandUrl(), Title::getFullURL(), etc. The way I see it, the only gain to be had here would be that we would support setups where HTTP and HTTPS have different URL patterns (like WMF's setup in the secure.wm.o days). I don't know how common that is; I think most people would want them to be the same when they set up SSL. The only reason that we used a different domain name for HTTPS was because we were running a farm of wikis and hadn't built the required infrastructure to do SSL for all of the domains, with all the load balancing and caching stuff involved. The way we "solved" that was by having one domain name that pointed to one, non-load-balanced machine which bypassed the HTTP caches :D (this is why secure was slow), obviously that wasn't scalable.
(In reply to comment #0) > When $wgServer has a scheme, i.e., it is not protocol-relative, which happens > to be the default, wfExpandUrl( ..., PROTO_HTTPS ) will give an HTTP link. This > causes ResourceLoader content to be loaded over an insecure connection. It also > causes redirect loops when redirecting to HTTPS using wfExpandUrl. If your server supports HTTPS and redirects HTTP to HTTPS, why set $wgServer to an HTTP protocol? I'd say set it to HTTPS, or protocol-relative if you want to support both. This doesn't make sense.. wfExpandUrl only expands the url, never changes it. If it contains http:// that is left alone, because users tend to set http for a reason, indicating it doesn't support https. Otherwise there is no reason to set it to http. (In reply to comment #5) > (In reply to comment #2) > > Moreover, doesn't setting $wgServer to an HTTP URL result in HTTP URLs > > appearing in articles, even when viewing them over HTTPS? > > Yes, almost all links to load.php and api.php urls are full urls that start > with http://. That is, if $wgServer is set to a http://server If wgServer is set to a protocol-relative url, then it will (obviously) not do that.
(In reply to comment #7) > There is no need to have two variables. Except if you want to have different URLs for HTTP and HTTPS, I think that was MZ was after. > The problems we have aren't solved by introducing that, and introducing that > pattern is already possible without those 2 variables: > > > $wgServer = isHttps() ? 'http://example.org' : 'https://example.org'; > > We don't need the other value anyway when in the other request type. > That doesn't actually work, it causes pollution of the parser cache, see my previous comment. > Except that we do, which is why we have a canonical protocol for API responses. To expand on this: we have $wgCanonicalServer, which is a fully-qualified (i.e. with protocol) version of $wgServer that is used when building URLs that will be used outside of the context of an HTTP(S) request, or where using protocol-relative URLs won't work or doesn't make sense for some other reason. The typical examples of this is are URLs in notification e-mails and in the recentchanges IRC feeds. In the API, we resolve protocol-relative URLs using the protocol of the current request, and we split the API Squid cache by protocol for this reason; $wgCanonicalServer only comes into play when dealing with things like the externallinks table (see below). $wgCanonicalServer is used in two different ways: * when building URLs to wiki pages, or expanding URLs like /wiki/Foo , the URL is prefixed with $wgCanonicalServer instead of $wgServer * when we encounter a protocol-relative URL that came from somewhere else (e.g. externallinks table in the API), we use $wgCanonicalServer to decide which protocol to put on it As you might have guessed, $wgCanonicalServer is currently set to "http:$wgServer" on WMF wikis. I deliberately designed it so it could be switched to https later, that's why it's separate from $wgInternalServer (used for Squid purges, will still need to be http). As for actual plans to switch the canonical protocol to HTTPS, that's a question for Ryan. In the meantime, Chris's cookie-based redirection feature should reduce the annoyance of HTTP links in e-mail notifications, because people that receive these e-mails are usually logged in when they receive them, so they'll automatically be redirected to HTTPS.
The primary problem right now is that there exists a case where giving PROTO_HTTPS to wfExpandUrl() does not give an HTTPS uri. This is a security vulnerability, as well as the root for possible bugs. So the main question is: why does wfExpandUrl do this, and what will happen if we change it?
(In reply to comment #11) > The primary problem right now is that there exists a case where giving > PROTO_HTTPS to wfExpandUrl() does not give an HTTPS uri. This is a security > vulnerability, as well as the root for possible bugs. So the main question is: > why does wfExpandUrl do this, and what will happen if we change it? How is that a security vulnerability? wfExpandUrl is by no means a security measure, and PROTO_HTTPS is nowhere documented nor intended to generate https urls. The argument to wfExpandUrl means the "preferred protocol if there isn't any", not the "protocol that will be used in the url". To put it bluntly, if something relies on it returning on https, then it is plain wrong. If that is the topic of this bug, then afaik we can close this as an invalid bug. Please describe (or show) the code that is passing PROTO_HTTPS to wfExpandUrl.
(In reply to comment #10) > (In reply to comment #7) >> There is no need to have two variables. > > Except if you want to have different URLs for HTTP and HTTPS, I think that was > MZ was after. That's one use-case, yes. I think secure.example.com is still fairly common, but I'm personally not too concerned with this. My general feeling is that whenever $wgServer is mentioned (in discussions or in code), it's always a guessing game with its value is. HTTP, HTTPS, protocol-relative? And this guessing game has a cost. An alternative approach is to be explicit in the variable name itself, so that there's less ambiguity. And there seems to be more and more of a reliance (in code) on the principle that http/https will be equivalent, but again, I'm not too concerned with this. > As you might have guessed, $wgCanonicalServer is currently set to > "http:$wgServer" on WMF wikis. I deliberately designed it so it could be > switched to https later, that's why it's separate from $wgInternalServer (used > for Squid purges, will still need to be http). Thank you for explaining this.
(In reply to comment #13) > My general feeling is that whenever $wgServer is mentioned (in discussions or > in code), it's always a guessing game with its value is. HTTP, HTTPS, > protocol-relative? And this guessing game has a cost. An alternative approach > is to be explicit in the variable name itself, so that there's less ambiguity. > I don't see the problem here. $wgServer is a configuration variable to set how MediaWiki should represent self-reference urls. If only HTTP or HTTPS is supported (most common) it is set to one of those. Having a $wgHTTPSServer variable doesn't make sense. What would you set it to if a server only supports HTTP, I guess you'll have to set it to http:// then because the server doesn't have https. And what if both are supported (like on Wikipedia), which does it use in the parser? Neither, because those urls can (and should be) without protocol so that they are relatively interpreted by the browser, because the visitor can be on either http or https. $wgServer indicates what the server supports. http, https or both. And in most cases $wgServer can/should be used as-is (without trying to expand it). And if a server preference is needed, then it can be ran through wfExpandUrl (e.g. for wgSecureLogin it will expand to HTTPS if the server supports that. Obviously if the server doesn't support HTTPS it shouldn't try to link there as it would be a broken link pointing to a server error. And if you don't support HTTPS, well, then it doesn't make sense to enable wgSecureLogin. Either way enabling wgSecureLogin when not supported is harmless afaik, it simply maintains http as it would other wise). Before we get into a very generic discussion, what is the actual bug here? What is the problem? How to reproduce this redirect problem?
There seems to be two related issues. Here are reproduction steps: 1) http resources when viewing a page with https: * using master from git * configure $wgServer to start with 'http:'. I'm using "http://localhost" * view any page over ssl: https://localhost/wiki/index.php/Main_Page Result: Most of the css and js resources are loaded from http urls (Chrome refuses to load them, IE gives warnings, etc) 2) Infinite redirect: * using master from git * configure $wgServer to start with 'http:' * configure $wgSecureLogin = true; * On any page, click the "Log in" link. The <a> points to an http:// url Result: Because Special:UserLogin redirects to https if it detects it is being viewed over http, you get an infinite redirect, because the Location header points to an http:// url.
(In reply to comment #7) > The problems we have aren't solved by introducing that, and introducing that > pattern is already possible without those 2 variables: > > > $wgServer = isHttps() ? 'http://example.org' : 'https://example.org'; This is the route that I think is ugly, but I'm starting to think might be the least evil of the options. I used this: if ( $wgSecureLogin === true && substr( $wgServer, 0, 7 ) === 'http://' ) { $wgServer = substr( $wgServer, 5 ); }
(In reply to comment #12) > How is that a security vulnerability? wfExpandUrl is by no means a security > measure, and PROTO_HTTPS is nowhere documented nor intended to generate https > urls. The argument to wfExpandUrl means the "preferred protocol if there isn't > any", not the "protocol that will be used in the url". > > To put it bluntly, if something relies on it returning on https, then it is > plain wrong. If that is the topic of this bug, then afaik we can close this as > an invalid bug. > > Please describe (or show) the code that is passing PROTO_HTTPS to wfExpandUrl. You may be right that something relying on wfExpandUrl to return HTTPS is wrong, but there are still many places in the core that do this exactly. Like Chris said above, loading default MediaWiki from Git and then going over to HTTPS causes all resources to be loaded over an insecure connection. Not only is this a usability problem since Chrome will refuse to load those resources, but it is a security vulnerability because you now have load.php and api.php requests going over an insecure connection, opening up MediaWiki to JS injections and leakage of sensitive information. Just because wfExpandUrl isn't a security measure does not mean it cannot cause security problems.
(In reply to comment #17) > (In reply to comment #12) > > How is that a security vulnerability? wfExpandUrl is by no means a security > > measure, and PROTO_HTTPS is nowhere documented nor intended to generate https > > urls. The argument to wfExpandUrl means the "preferred protocol if there isn't > > any", not the "protocol that will be used in the url". > > > > To put it bluntly, if something relies on it returning on https, then it is > > plain wrong. If that is the topic of this bug, then afaik we can close this as > > an invalid bug. > > > > Please describe (or show) the code that is passing PROTO_HTTPS to wfExpandUrl. > > You may be right that something relying on wfExpandUrl to return HTTPS is > wrong, but there are still many places in the core that do this exactly. Like > Chris said above, loading default MediaWiki from Git and then going over to > HTTPS causes all resources to be loaded over an insecure connection. Not only > is this a usability problem since Chrome will refuse to load those resources, > but it is a security vulnerability because you now have load.php and api.php > requests going over an insecure connection, opening up MediaWiki to JS > injections and leakage of sensitive information. Just because wfExpandUrl isn't > a security measure does not mean it cannot cause security problems. This is imho not a bug. It is expected (and imho acceptable) behaviour. For the same reason that we include the configured hostname, we include the configured protocol. Though I agree the following comparison does not relate to SSL, the fact that your issue above does is imho an unrelated coincidence and result of misconfiguration on the administrators part. Imagine I set up a wiki with wgServer='http://example.org' and then I set up http://localhost/ pointing to the same server. Oh noes, when I access over localhost everything points to example.org. That's not a mediawiki bug, but a configuration issue! Either redirect one to the other, set them up as 2 wikis (e.g. a if/else or switch construct by hostname and configure wgServer and/or wgDBname based on that) or (recommended) make sure the configuration is properly set up for either scenario. Give me one good reason why anything should work on https://wiki.example.org/ if the wiki is configured and "told" by the administrator that it is served from http://wiki.example.org. Fix your configuration?
(In reply to comment #15) > There seems to be two related issues. Here are reproduction steps: > > 1) http resources when viewing a page with https: > * using master from git > * configure $wgServer to start with 'http:'. I'm using "http://localhost" > * view any page over ssl: https://localhost/wiki/index.php/Main_Page > > Result: Most of the css and js resources are loaded from http urls (Chrome > refuses to load them, IE gives warnings, etc) > > 2) Infinite redirect: > * using master from git > * configure $wgServer to start with 'http:' > * configure $wgSecureLogin = true; > * On any page, click the "Log in" link. The <a> points to an http:// url > > Result: Because Special:UserLogin redirects to https if it detects it is being > viewed over http, you get an infinite redirect, because the Location header > points to an http:// url. 2 is a bug, yes. I wonder how that happens. Where does MediaWiki produce a redirect to HTTPS if server includes http://? That should be impossible.
(In reply to comment #19) > 2 is a bug, yes. I wonder how that happens. Where does MediaWiki produce a > redirect to HTTPS if server includes http://? That should be impossible. The change in https://gerrit.wikimedia.org/r/#/c/25530/1/includes/specials/SpecialUserlogin.php on line 152 is what does the redirect. It redirects to the output of wfExpandUrl with PROTO_HTTPS passed in. Since wfExpandUrl returns an http link, the page keeps redirecting. (In reply to comment #18) > This is imho not a bug. It is expected (and imho acceptable) behaviour. After reading up on the history of $wgServer, I tend to agree that setting $wgServer with http:// should mean that you don't have ssl available, so combining that with $wgSecureLogin = true is a conflict. I also think the codebase shouldn't have to identify and make a special case when the conflict occurs, and can assume configs are setup consistently. But, I can't think of anywhere else that we have 2 configuration parameters that can conflict to break the site, although I could definitely be wrong about that. I would prefer that when 2 configs are in conflict, and they have anything to do with security, we either put up a warning message so the admin knows that they messed up, or we assume the admin really meant to use the more secure one and we try to fix up the conflict. Is there consensus/precedence on which is preferred?
(In reply to comment #20) > (In reply to comment #19) > > 2 is a bug, yes. I wonder how that happens. Where does MediaWiki produce a > > redirect to HTTPS if server includes http://? That should be impossible. > > The change in > https://gerrit.wikimedia.org/r/#/c/25530/1/includes/specials/SpecialUserlogin.php > on line 152 is what does the redirect. It redirects to the output of > wfExpandUrl with PROTO_HTTPS passed in. Since wfExpandUrl returns an http link, > the page keeps redirecting. > Interesting, yeah, I can see how that fails. I was thinking it would redirect in a loop between http and https, but that didn't make sense. I suppose we could check the protocol in $wgServer in Setup.php and disable $wgSecureLogin accordingly. Either that, or throw an exception. I think we normally do silent failures for configs, which would be fine. Though I could also understand an exception, as it is consistent (e.g. not like you change config, site looks fine, and then later it starts throwing exceptions).
There should also really be something in the installer asking whether HTTPS is supported, that way $wgServer can be made protocol-relative by default if the administrator knows beforehand.
https://gerrit.wikimedia.org/r/27136
I should have spoken up sooner on Krinkle's suggestion about disabling $wgSecureLogin. It feels wrong to me to opt for the less secure option if there's a conflict. But like I put in the comments on Gerrit change #27136, if this is the best solution we can do it. I'd much rather have the exceptions be insecure than have people afraid to enable $wgSecureLogin in general.
IMHO, the best way to do it would be for $wgServer to not be a URI, but instead just be a host name. Because when somebody enables $wgSecureLogin it's not that far a leap to assume they want HTTPS.