Last modified: 2011-03-09 21:14:51 UTC
I run Apache behind Squid accellerators which sets a header of HTTP_X_FORWARDED_FOR to represent the remote client IP address (instead of Apache's REMOTE_ADDR header). MediaWiki's test for this is incorrect (fix below). I tried using $wgUseSquid = true; in LocalSettings.php which did not solve the problem for me. In the file includes/ProxyTools.php there is a test for the header REMOTE_ADDR, which will always be true, even when running behind Squid. In my case, the IP reported for all clients is 127.0.0.1, which isn't useful when trying to track down abuse of the wiki from outsiders. I updated includes/ProxyTools.php to test for HTTP_X_FORWARDED_FOR first, then REMOTE_ADDR. If HTTP_X_FORWARDED_FOR is found, it is used. If it isn't found, REMOTE_ADDR is tested and used instead. The current test never reaches the second clause, and this fixes that condition. Below is my fixed stanza, in full. I've tested this and it works perfectly as expected: /* collect the originating ips */ # Client connecting to this webserver if ( isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) { $ipchain = array( $_SERVER['HTTP_X_FORWARDED_FOR'] ); } else if ( isset( $_SERVER['REMOTE_ADDR'] ) ) { $ipchain = array( $_SERVER['REMOTE_ADDR'] ); } else { # Running on CLI? $ipchain = array( '127.0.0.1' ); } $ip = $ipchain[0];
Created attachment 712 [details] Squid HTTP_X_FORWARDED_FOR header patch
Can you confirm that you've set up the proxy settings for MediaWiki correctly? We run the entire Wikimedia site behind squid proxies, and have no problems with this.
LocalSettings.php has one line that seems to be related to a proxy, and that is $wgProxyKey, which doesn't have any comments associated with it. A quick grep of the sources doesn't provide much info on its purpose either. What proxy settings would I need to check? The whole proxy configuration is 100% transparent on Apache and Squid, except for the addition of this header. Squid runs on port 80, Apache runs on port 80. There are no ProxyPass directives in Apache to point to Squid and vice versa. Also, if this required specific Proxy handling, this would be the only application out of over 200 we run here that needs to know its running behind a proxy. Every application we run doesn't seem to care, except in the case where they need to know the REMOTE_ADDR of the incoming user. In this case, they need to look for that information in HTTP_X_FORWARDED_FOR, not REMOTE_ADDR. The reason for filing this bug is obvious, if you look at the logic flow. You're always checking REMOTE_ADDR first, which will ALWAYS exist. Its just that in the case of running behind Squid, that header's value will contain 127.0.0.1 (or whatever the non-public, internal IP of your squid proxy runs on). The proper checking order should look for HTTP_X_FORWARDED_FOR first then REMOTE_ADDR. If you're not running behind Squid or another proxy, HTTP_X_FORWARDED_FOR will not exist, and REMOTE_ADDR can be trusted. If HTTP_X_FORWARDED_FOR does exist, then its value should be used. Using REMOTE_ADDR in every case is flat-out wrong. Another possible solution is mod_rpaf -> http://stderr.net/apache/rpaf/ But I don't want to run more modules than I have to, and certainly not for 1 single application that does not behave like the other 200 or so applications.
See DefaultSettings.php for all configurable settings. LocalSettings.php contains only whatever overrides you've put into it.
This bug also affects Cisco caches in use by at least one large ISP in Australia. I found myself unable to edit for 24 hours due to a block being placed on the transparent proxy server the isp uses. Here is their explanation: -------------------------------------------------------------------------- IP-Based Authentication Customers will often request that we bypass a particular website based on the fact that it uses IP-based authentication for hosts. If the website code has not been built upon web standards, they may only be checking for the address of the remote requestor, rather than that supplied by the X-Forwarded-For address. Details on how to retrieve the X-Forwarded-For address are included here: PHP: http://www.php.net/getenv ASP: Request.ServerVariables("HTTP_X-Forwarded-For") Example: http://www.lagado.com/proxy-test The Cisco cache engines used by Internode will pass an X-Forwarded-For variable to the server containing the original requestor IP address. When the cache engines intercept a request, they will replace the Client-IP field with their own IP address and store and supply the original requestor's IP address in the X-Forwarded-For field. X-Forwarded-For is an unofficial standard implemented originally by Squid (http://www.squid-cache.org/) and adopted by many other caching technologies since that time, including Cisco, who are the vendor of the Cache engines we use on the Internode network. You should also know that because you are using IP-Based authentication for your website, you will potentially encounter issues outside of our network, where customer requests coming from other ISPs also using cache engines may also be supplying the cache engine's address in in the Client-IP field. We prefer not to implement bypasses where a problem can be worked around by other means, as the more we add, the higher the long-term load on the cache engines becomes. It is also notable that any cache exemption we implement is not considered permanent, and may be removed without notice at any time. If at all possible, it would be better in the long-term to look at implementing code to detect the IP address from either the X-Forwarded-For address in addition to the Client-IP address field. This should resolve any future issues you encounter with transparently cached requests from most other networks. Examples of using the X-Forwarded-For header can be shown below: PHP: http://www.php.net/getenv ASP: Using "Request.ServerVariables("HTTP_X-Forwarded-For")" A website giving a fair bit of information about which variables are supplied on request is at http://www.lagado.com/proxy-test. As you can see, this website is retrieving all of the appropriate fields, showing that the code of any website should be able to detect the IP address of the original requestor, whether or not it is transparently cached. At this stage, we have not implemented the bypass to give you a chance to review this information. Please get back to us if there is a particular reason in which this option can not be implemented, and we will then review the request. I hope this helps and shows that the problem does lie at Wikipedias end. ----------------------------------------------------------------------
We do respect X-Forwarded-For on known trusted servers (in particular our own). Note that as client-provided data it CANNOT BE TRUSTED IN GENERAL. An attacker can trivially create false X-Forwarded-For headers with any arbitrary contents, which if blindly accepted would obscure their real address (making them impossible to block) or falsely blame some other address, perhaps with the intention of getting administrators to block some other user.
Whilst this works for some, I don't believe this bug should be closed as I believe there are large numbers of users who sit behind transparent proxies at large ISPs. The only solution for them if they share the proxy with a WikiVandal is to change ISPs which is unsatisfactory. I agree with the point the the X-Forwarded-For header cannot be trusted and is easily faked. A more sophisticated solution would be to record the X-Forwarded-For header AS WELL AS the REMOTE_ADDR from the server. Thus the unsophisticated vandals can be dealt with as well as allowing normal users to continue. This will also make it easier to track down vandals with the co-operation of ISPs as you can identify that it came from that ISP (via REMOTE_ADDR) and you have the vandals session IP (or that of an open proxy they might be using) via HTTP_X-Forwarded-For. This would require changes to the code for blocking IPs to allow for 1) Block a REMOTE_ADDR (currently implemented) 2) Block a HTTP_X-Forwarded-For address via a REMOTE_ADDR (the patch implements this, not currently implemented, suits unsophisticated and probably most common vandals). Will not affect normal users behind the same proxy 3) Don't trust HTTP_X-Forwarded-For via a particular REMOTE_ADDR (not currently implemented, requires an additional table of blocked proxies, rampant vandal, but you would record and have the HTTP_X-Forwarded-For header which will be useful for the ISP in identifying the vandal and terminate them via their TOS, until which no users at that ISP can edit). I realise that this is now not a simple patch as it impacts other areas of code and the operation of blocking functions if implemented this way. It would require the creation of an additional table of blocked proxies and possibly trusted proxies as well to cater for the case mentioned by Brion above. I do think though that there are large numbers of editors who would benefit from a change such as this. At a minimum we really need some data as to the extent of use of transparent proxies by editors, readers (potential editors) and vandals by collecting on one or two servers the use of the HTTP_X-Forwarded-For header. There are also variants on HTTP_X-Forwarded-For that would need to be catered for as well. Can we collect some data and then make a decision?
Closing this bug as WORKSFORME since the case of the original reporter is what we do, constantly, all day every day and it works fine. If you wish to request some different treatment of external proxies please open a new enhancement bug.