Last modified: 2011-05-10 01:31:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T30884, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 28884 - Browser-based JavaScript clients can't identify since XMLHttpRequest doesn't support changing User-Agent


Summary:	Browser-based JavaScript clients can't identify since XMLHttpRequest doesn't ...

Status:	RESOLVED INVALID

Product:	Wikimedia
Classification:	Unclassified
Component:	General/Unknown (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-05-08 15:07 UTC by Ben Rimmington
Modified:	2011-05-10 01:31 UTC (History)
CC List:	4 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Ben Rimmington 2011-05-08 15:07:42 UTC

The current User-Agent policy for Wikimedia sites [1] restricts web (and possibly HP webOS) developers [2], because "User-Agent" can't be updated by the XMLHttpRequest::setRequestHeader() method [3].

The official Wikipedia iPhone app has "User_Agent" instead of "User-Agent" [4]. Please could this custom "User_Agent" header field be supported for other clients?

Alternatively, the "MediaWiki-API-Error" header field is already in use [5]. Please could this prefix be reused in a custom "MediaWiki-API-Key" or "MediaWiki-User-Agent" header field?

[1] <http://meta.wikimedia.org/wiki/User-Agent_policy>

[2] <http://www.mediawiki.org/wiki/API:Quick_start_guide#Identifying_your_client>

[3] <http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader-method>

[4] <https://github.com/wikimedia/wikipedia-iphone/blob/master/Classes/RootViewController.m>

[5] <http://www.mediawiki.org/wiki/API:Errors_and_warnings#Errors>

Comment 1 Krinkle 2011-05-08 16:10:31 UTC

Web developers making Ajax requests (XMLHttpRequest) can't and don't have to touch the User-Agent. The browser environment you are in already has this set. If this is not the case, I'd say contact your browser vendor; it is not a bug.

The field is called "User-Agent" and not "User_Agent". If an application uses the wrong key, that application should be fixed; this is not a bug on Wikimedia's end, imho.

The User-Agent header must be sent, this policy is unlikely to change.
If you prefer to send some kind of identification in environments where a User-Agent has already been set (e.g. in a browser), you may use X-prefixed fields I guess.

For example:

X-Source: MyAwesome Gadget; Version/1.0; Contact/johndoe@wikimedia.org;
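
The X-Source suggestion above could be wired into a browser client roughly as follows. This is a minimal sketch only: `buildSourceValue` and `identify` are hypothetical helper names, the field layout simply mirrors the example line, and nothing in this report indicates that Wikimedia servers read or log such a header.

```javascript
// Hypothetical helper: formats the identification string in the same
// "Name; Version/x; Contact/y;" layout as the example line above.
function buildSourceValue({ name, version, contact }) {
  return `${name}; Version/${version}; Contact/${contact};`;
}

// Hypothetical helper: attaches the value to anything that exposes
// setRequestHeader(), such as an XMLHttpRequest after open() has been called.
function identify(xhr, info) {
  xhr.setRequestHeader('X-Source', buildSourceValue(info));
  return xhr;
}
```

In a browser this would be called between `xhr.open(...)` and `xhr.send()`, since `setRequestHeader()` is only valid once the request has been opened.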

Comment 2 Ben Rimmington 2011-05-08 18:12:06 UTC

(In reply to comment #1)
> Web developers making Ajax requests (XMLHttpRequest) can't and don't have to
> touch the User-Agent. The browser environment in which you are in already has
> this set. If this is not the case, I'd say contact your browser vendor, not a
> bug.

From the User-Agent policy [1]: "Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious."

The browser environment is UIWebView [6], which has Safari's default user agent string -- it doesn't contain the required contact information, and it can't be modified using UIWebView's public API.

> The User-Agent header must be sent, this policy is unlikely to change.
> If you prefer to send some kind of identification in environments where a
> User-Agent has already been set (eg. in a browser), you may use X-prefixed
> fields I guess.
> 
> ie.
> 
> X-Source: MyAwesome Gadget; Version/1.0; Contact/johndoe@wikimedia.org;

Sending an "X-Source" header field (or any other custom field) won't help if the Wikimedia servers automatically reject the API request (403 Forbidden) after looking at just the "User-Agent" header field.

The standard "From" header field [7] might be another option for identifying the bot, because it isn't in the XMLHttpRequest::setRequestHeader() method's exclusion list [3]. But this also requires a change to the User-Agent policy.
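
To illustrate the point about the exclusion list, here is a rough sketch of the check a browser applies before honouring `setRequestHeader()`. The header names below are transcribed from memory of the 2011 XMLHttpRequest draft cited as [3], so treat the exact contents as an assumption; later specifications may differ. `isForbiddenRequestHeader` is a hypothetical name, not a browser API.

```javascript
// Assumed copy of the draft's exclusion list, lower-cased for comparison.
const FORBIDDEN_HEADERS = new Set([
  'accept-charset', 'accept-encoding', 'connection', 'content-length',
  'cookie', 'cookie2', 'content-transfer-encoding', 'date', 'expect',
  'host', 'keep-alive', 'referer', 'te', 'trailer', 'transfer-encoding',
  'upgrade', 'user-agent', 'via',
]);

// Returns true when a script-supplied header of this name would be dropped.
// Header names are case-insensitive; the Proxy- and Sec- prefixes are
// excluded as a whole family.
function isForbiddenRequestHeader(name) {
  const lower = name.toLowerCase();
  return FORBIDDEN_HEADERS.has(lower) ||
    lower.startsWith('proxy-') ||
    lower.startsWith('sec-');
}
```

Under these assumptions `isForbiddenRequestHeader('User-Agent')` is true, while `'From'` and a custom field such as `'X-Source'` pass the check.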

[1] <http://meta.wikimedia.org/wiki/User-Agent_policy>

[3] <http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader-method>

[6] <http://developer.apple.com/library/ios/documentation/UIKit/Reference/UIWebView_Class/>

[7] <http://tools.ietf.org/html/rfc2616#section-14.22>

Comment 3 Derk-Jan Hartman 2011-05-08 19:41:41 UTC

Adding Hampton and Tim to CC as relevant people to this discussion

Comment 4 Derk-Jan Hartman 2011-05-08 19:54:00 UTC

Two things should be clearly distinguished here. First, some default user-agents are blocked due to abuse, and apps should not use those user-agents. Second, we want all bots to make themselves identifiable and contactable. These are separate requirements, which are currently intermixed into the single rule 'set a "unique" user-agent'.


2 Things from my point of view.

1: Is this an actual situation creating a problem at the moment, i.e. are services being blocked because of this? As far as I'm aware not, unless a service's XMLHttpRequest/webview component is identifying itself as wget or perl, in which case that service should probably consider fixing its default user-agent string anyway.

2: Having said that, if we require that your bot be identifiable, yet also have to account for user-agents that cannot be changed, then it is probably a good idea indeed to define a standard for how that additional identification should be done in cases where the user-agent cannot be changed.

From is an option, but is limited to mailboxes, which might be less useful for us, since titles of tools can be much more useful for identification. I sort of like the X-Source idea.

Comment 5 Ben Rimmington 2011-05-08 20:45:03 UTC

(In reply to comment #4)
> From is an option, but is limited to mailboxes, which might be less useful for
> us, since titles of tools can be much more useful for identification. I sort of
> like the X-Source idea.

From the latest RFC [8]: "Normally, a mailbox is composed of two parts: (1) an optional display name that indicates the name of the recipient (which can be a person or a system) that could be displayed to the user of a mail application, and (2) an addr-spec address enclosed in angle brackets ("<" and ">").  There is an alternate simple form of a mailbox where the addr-spec address appears alone, without the recipient's name or the angle brackets."

From: "MyAwesomeGadget/1.0" <johndoe@wikimedia.org>

(In this example, the display name had to be a quoted-string because of the period character).
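
The quoting rule can be sketched in code. This is a simplified illustration, not a full RFC 5322 implementation: `formatMailbox` is a hypothetical helper, it ignores comments and folding whitespace, and it does not validate the addr-spec.

```javascript
// A display name made only of RFC 5322 "atom" characters (atext), with
// single spaces between atoms, may be written without quotes.
const ATOM_PHRASE =
  /^[A-Za-z0-9!#$%&'*+\-\/=?^_`{|}~]+( [A-Za-z0-9!#$%&'*+\-\/=?^_`{|}~]+)*$/;

// Hypothetical helper: builds a mailbox value suitable for a From field.
function formatMailbox(displayName, addrSpec) {
  if (!displayName) {
    return addrSpec; // alternate simple form: the address alone
  }
  if (ATOM_PHRASE.test(displayName)) {
    return `${displayName} <${addrSpec}>`;
  }
  // Anything else (such as a period) needs a quoted-string, with
  // backslashes and double quotes escaped.
  const escaped = displayName.replace(/(["\\])/g, '\\$1');
  return `"${escaped}" <${addrSpec}>`;
}
```

Calling `formatMailbox('MyAwesomeGadget/1.0', 'johndoe@wikimedia.org')` yields the quoted form shown above, because the period is not an atom character.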

[8] <http://tools.ietf.org/html/rfc5322#section-3.4>

Comment 6 Krinkle 2011-05-08 20:58:18 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > Web developers making Ajax requests (XMLHttpRequest) can't and don't have to
> > touch the User-Agent. The browser environment in which you are in already has
> > this set. If this is not the case, I'd say contact your browser vendor, not a
> > bug.
> 
> From the User-Agent policy [1]: "Do not copy a browser's user agent for your
> bot, as bot-like behavior with a browser's user agent will be assumed
> malicious."
> 
> The browser environment is UIWebView [6], which has Safari's default user agent
> string -- it doesn't contain the required contact information, and it can't be
> modified using UIWebView's public API.
> 

That quote from the policy covers the case where, say, you're creating a PHP application that is going to interact heavily with something on Wikimedia (say an automated vandalism revert bot for en.wikipedia.org that extracts a lot of information and submits edits through the API). In that case you should not copy the user-agent of a browser and do something like this:

<?php
ini_set( "user_agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6; nl-nl) AppleWebKit/533 Version/5.0.4 Safari/533");


... just to get around the user-agent requirement. Doing the above is malicious, since you're not really a Safari browser but a PHP application.

However, if you're executing your JavaScript widget in Safari, or in an application that in turn executes its code via the Safari/WebKit framework from Apple's operating system API or something like that, I don't think it's a problem if those requests come in to Wikimedia's servers with Safari's default user agent.

Granted, it would be nice if you could identify yourself somehow, but I'm not sure it's that big a deal.

Most if not all requests from JavaScript gadgets on Wikimedia are sent with the user-agent of the browser of the Wikimedian who uses the gadget.

Comment 7 Tim Starling 2011-05-09 00:17:10 UTC

Since this bug appears to be based on a misreading of a meta page, I have updated the meta page with a clarification and will now mark this bug as invalid. 

Nobody is going to be blocked just for using XHR. At some point we may figure out a good way to identify browser-based applications, but that time is not right now. 

You could send a random header field like X-Browser-App, but it's unlikely anyone will see it, because we don't log such header fields. The sysadmin would have to use a packet sniffer or set up custom logging in MediaWiki.

Comment 8 Ben Rimmington 2011-05-10 01:31:41 UTC

Thanks, everyone.

(In reply to comment #1)
> The field is called "User-Agent" and not "User_Agent", if an application uses the
> wrong key, that application should be fixed, this is not a bug on Wikimedia's
> end imho.

I've filed Bug 28905.
