Last modified: 2011-05-10 01:31:41 UTC
The current User-Agent policy for Wikimedia sites [1] restricts web (and possibly HP webOS) developers [2], because "User-Agent" can't be updated by the XMLHttpRequest::setRequestHeader() method [3].

The official Wikipedia iPhone app has "User_Agent" instead of "User-Agent" [4]. Please could this custom "User_Agent" header field be supported for other clients?

Alternatively, the "MediaWiki-API-Error" header field is already in use [5]. Please could this prefix be reused in a custom "MediaWiki-API-Key" or "MediaWiki-User-Agent" header field?

[1] <http://meta.wikimedia.org/wiki/User-Agent_policy>
[2] <http://www.mediawiki.org/wiki/API:Quick_start_guide#Identifying_your_client>
[3] <http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader-method>
[4] <https://github.com/wikimedia/wikipedia-iphone/blob/master/Classes/RootViewController.m>
[5] <http://www.mediawiki.org/wiki/API:Errors_and_warnings#Errors>
Web developers making Ajax requests (XMLHttpRequest) can't and don't have to touch the User-Agent. The browser environment you are in already has this set. If this is not the case, I'd say contact your browser vendor; it's not a bug.

The field is called "User-Agent" and not "User_Agent". If an application uses the wrong key, that application should be fixed; this is not a bug on Wikimedia's end imho.

The User-Agent header must be sent; this policy is unlikely to change. If you prefer to send some kind of identification in environments where a User-Agent has already been set (e.g. in a browser), you may use X-prefixed fields I guess, e.g.:

X-Source: MyAwesome Gadget; Version/1.0; Contact/johndoe@wikimedia.org;
(In reply to comment #1)
> Web developers making Ajax requests (XMLHttpRequest) can't and don't have to touch the User-Agent. The browser environment in which you are in already has this set. If this is not the case, I'd say contact your browser vendor, not a bug.

From the User-Agent policy [1]: "Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious."

The browser environment is UIWebView [6], which has Safari's default user agent string -- it doesn't contain the required contact information, and it can't be modified using UIWebView's public API.

> The User-Agent header must be sent, this policy is unlikely to change.
> If you prefer to send some kind of identification in environments where a User-Agent has already been set (eg. in a browser), you may use X-prefixed fields I guess.
>
> ie.
>
> X-Source: MyAwesome Gadget; Version/1.0; Contact/johndoe@wikimedia.org;

Sending an "X-Source" header field (or any other custom field) won't help if the Wikimedia servers automatically reject the API request (403 Forbidden) after looking at just the "User-Agent" header field.

The standard "From" header field [7] might be another option for identifying the bot, because it isn't in the XMLHttpRequest::setRequestHeader() method's exclusion list [3]. But this also requires a change to the User-Agent policy.

[1] <http://meta.wikimedia.org/wiki/User-Agent_policy>
[3] <http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader-method>
[6] <http://developer.apple.com/library/ios/documentation/UIKit/Reference/UIWebView_Class/>
[7] <http://tools.ietf.org/html/rfc2616#section-14.22>
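To make the exclusion-list point concrete, here is a minimal sketch of how a client might sort the header fields it wants to send into those a script can set and those the browser will refuse. The function names (isForbiddenRequestHeader, splitSettableHeaders) are hypothetical, and the list is deliberately partial: only the "User-Agent" entry and the fact that "From" is absent come from this discussion, while the other entries are assumptions based on the specification [3], which remains the authority.

```javascript
// Partial, illustrative copy of the setRequestHeader() exclusion list.
// Assumption: entries other than "user-agent" are recalled from the
// specification [3]; consult it for the authoritative list.
const FORBIDDEN_NAMES = new Set([
  "user-agent",
  "referer",
  "host",
  "cookie",
  "content-length",
]);
const FORBIDDEN_PREFIXES = ["proxy-", "sec-"];

// Header field names are case-insensitive, so compare in lower case.
function isForbiddenRequestHeader(name) {
  const lower = name.toLowerCase();
  return (
    FORBIDDEN_NAMES.has(lower) ||
    FORBIDDEN_PREFIXES.some((prefix) => lower.startsWith(prefix))
  );
}

// Split a plain object of header fields into the ones a script may set
// and the names that would be rejected.
function splitSettableHeaders(headers) {
  const settable = {};
  const blocked = [];
  for (const [name, value] of Object.entries(headers)) {
    if (isForbiddenRequestHeader(name)) {
      blocked.push(name);
    } else {
      settable[name] = value;
    }
  }
  return { settable, blocked };
}
```

In a browser, each entry of settable could then be passed to xhr.setRequestHeader(name, value) before calling send(). Whether the server pays any attention to such a field is a separate question.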
Adding Hampton and Tim to CC, as they are relevant to this discussion.
Two things should be clearly distinguished. First, some default user agents are blocked due to abuse, and apps should not use those user agents. Second, we want all bots to make themselves identifiable and contactable. Those are separate requirements, which are currently intermixed into the single rule 'set a "unique" user-agent'.

Two things from my point of view:

1: Is this an actual situation creating a problem at the moment, i.e. are services being blocked because of this? As far as I'm aware not, unless an XMLHttpRequest/webview service is identifying itself as wget or perl, in which case that service should probably consider fixing its default user-agent string anyway.

2: Having said that, if we require that your bot be identifiable, yet also have to account for user agents that cannot be changed, then it is indeed probably a good idea to define a standard for how that additional identification should be done in cases where the user agent cannot be changed.

From is an option, but it is limited to mailboxes, which might be less useful for us, since titles of tools can be much more useful for identification. I sort of like the X-Source idea.
(In reply to comment #4)
> From is an option, but is limited to mailboxes, which might be less useful for us, since titles of tools can be much more useful for identification. I sort of like the X-Source idea.

From the latest RFC [8]: "Normally, a mailbox is composed of two parts: (1) an optional display name that indicates the name of the recipient (which can be a person or a system) that could be displayed to the user of a mail application, and (2) an addr-spec address enclosed in angle brackets ("<" and ">"). There is an alternate simple form of a mailbox where the addr-spec address appears alone, without the recipient's name or the angle brackets."

From: "MyAwesomeGadget/1.0" <johndoe@wikimedia.org>

(In this example, the display name had to be a quoted-string because of the period character.)

[8] <http://tools.ietf.org/html/rfc5322#section-3.4>
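The quoting rule in that example can be sketched as a small helper. This is an illustration under stated assumptions, not a full RFC 5322 implementation: formatFromHeader and formatDisplayName are hypothetical names, the addr-spec is passed through without validation, and control characters and non-ASCII text are not handled. The character class is the "atext" set from RFC 5322 section 3.2.3, which does not include the period.

```javascript
// One atom: a run of RFC 5322 "atext" characters. A period is not atext,
// which is why "MyAwesomeGadget/1.0" cannot be written as a bare atom.
const ATOM = /^[A-Za-z0-9!#$%&'*+\-\/=?^_`{|}~]+$/;

// Return the display name as plain atoms when possible, otherwise as a
// quoted-string with backslashes and double quotes escaped.
function formatDisplayName(name) {
  const words = name.trim().split(/\s+/);
  if (words.every((word) => ATOM.test(word))) {
    return words.join(" ");
  }
  return '"' + name.replace(/(["\\])/g, "\\$1") + '"';
}

// Build the value of a From header field. With no display name, fall back
// to the simple form in which the addr-spec appears alone.
function formatFromHeader(displayName, addrSpec) {
  if (!displayName || displayName.trim() === "") {
    return addrSpec;
  }
  return formatDisplayName(displayName) + " <" + addrSpec + ">";
}
```

Called with "MyAwesomeGadget/1.0" and "johndoe@wikimedia.org", the helper produces the same value as the From line above; a name without a period, such as "MyAwesomeGadget", is left unquoted.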
(In reply to comment #2)
> (In reply to comment #1)
> > Web developers making Ajax requests (XMLHttpRequest) can't and don't have to touch the User-Agent. The browser environment in which you are in already has this set. If this is not the case, I'd say contact your browser vendor, not a bug.
>
> From the User-Agent policy [1]: "Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious."
>
> The browser environment is UIWebView [6], which has Safari's default user agent string -- it doesn't contain the required contact information, and it can't be modified using UIWebView's public API.

That quote from the policy refers to the case where, say, you're creating a PHP application that is going to interact massively with something on Wikimedia (say an automated vandalism revert bot for en.wikipedia.org that extracts a lot of information and submits edits through the API). You should not copy the user agent of a browser and do something like this:

<?php
ini_set( "user_agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6; nl-nl) AppleWebKit/533 Version/5.0.4 Safari/533");

.. just to get around the user-agent requirement. Doing the above is malicious, since you're not really a Safari browser but a PHP application.

However, if you're executing your JavaScript widget in Safari, or in an application that in turn executes its stuff via the Safari/WebKit framework from Apple's operating system API or something like that, I don't think it's a problem if those requests come into Wikimedia's servers with Safari's default user agent.

Granted, it would be nice if you could identify yourself somehow, but I'm not sure it's that big a deal. Most if not all requests from JavaScript gadgets on Wikimedia are sent with the user agent of the browser of the Wikimedian who uses the gadget.
Since this bug appears to be based on a misreading of a meta page, I have updated the meta page with a clarification and will now mark this bug as invalid. Nobody is going to be blocked just for using XHR. At some point we may figure out a good way to identify browser-based applications, but that time is not right now. You could send a random header field like X-Browser-App, but it's unlikely anyone will see it, because we don't log such header fields. The sysadmin would have to use a packet sniffer or set up custom logging in MediaWiki.
Thanks, everyone.

(In reply to comment #1)
> The field is called "User-Agent" and not User_Agent, if an application uses the wrong key, that application should be fixed, this is not a bug on Wikimedia's end imho.

I've filed Bug 28905.