Last modified: 2014-05-11 02:08:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T66907, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 64907 - Allow GWT uploads from ggpht.com


Summary:	Allow GWT uploads from ggpht.com

Status:	RESOLVED INVALID

Product:	Wikimedia
Classification:	Unclassified
Component:	Site requests (Other open bugs)
Version:	wmf-deployment
Hardware:	All All

Importance:	Normal enhancement (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	58224

Reported:	2014-05-05 17:58 UTC by Fæ
Modified:	2014-05-11 02:08 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Fæ 2014-05-05 17:58:37 UTC

Please add the following domain to the wgCopyUploadsDomains whitelist:
*.ggpht.com

This is to support uploads from the Rijksmuseum, for example:
http://lh4.ggpht.com/PwdJop7AQKAOvtiEZnfLkLezQKOyO8le69XKOxVYwtLoF0hAfkAa2o_u5eA8CAW_tk4Dm0gfjU8kTyDA6TW8hIAFXg=s0
is the image that Rijksmuseum returns via their API for artefact at:
http://www.rijksmuseum.nl/collectie/AK-MAK-127

Comment 1 Fæ 2014-05-05 18:03:17 UTC

Note that from a small test set, I see lh3, lh4, lh5 and lh6 all in use by the RM as subdomains for images hosted at ggpht.com. However, I feel that a more complex regex restriction is likely to be unnecessary.

Comment 2 Tomasz W. Kozlowski 2014-05-05 18:33:36 UTC

Apparently ggpht.com is a domain "Google uses to host data for YouTube", according to the internets; I'm unsure whether whitelisting the whole domain is a good idea.

Comment 3 Fæ 2014-05-05 19:03:11 UTC

Perhaps we can whitelist "lh\d*.ggpht.com" as a more limited regex?

Comment 4 jeremyb 2014-05-05 20:25:41 UTC

(In reply to Fæ from comment #3)
> Perhaps we can whitelist "lh\d*.ggpht.com" as more limited regex?

I don't think that would make a significant difference (vs. a wildcard).

I don't know Commons policy well, but I guess Flickr is OK because there are bots that check what the licensing is at Flickr and record the value in an edit to the file description page. (And even then we still sometimes have to worry about flickrwashing.)

(In reply to Tomasz W. Kozlowski from comment #2)
> Apparently ggpht.com is a domain "Google uses to host data for YouTube",

It seems to be more widespread than that, e.g. it also hosts Picasa pictures.

This would allow essentially the same range of content and uploaders as Google Drive, unless we had a bot somehow checking for license metadata associated with a given URL (like we do with Flickr)?

Comment 5 Fæ 2014-05-05 23:32:33 UTC

Apart from a more complex regex, like the "lh\d" or maybe "lh[1-9]" domain limitation, I am unsure what else to recommend.
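
For reference, a minimal sketch of what the stricter "lh" plus digits restriction could look like, written in Python purely for illustration. The pattern, the helper name host_allowed and the test URLs are assumptions, not the MediaWiki whitelist implementation. Note that the pattern as written in comment 3 leaves the dots unescaped and allows zero digits, so it matches more hostnames than intended.

```python
import re
from urllib.parse import urlparse

# Pattern as written in comment 3: the unescaped "." matches any character
# and "\d*" also accepts zero digits.
LOOSE = re.compile(r"lh\d*.ggpht.com")

# Hypothetical stricter pattern: "lh", at least one ASCII digit, literal dots.
STRICT = re.compile(r"lh[0-9]+\.ggpht\.com")


def host_allowed(url: str) -> bool:
    """Return True if the URL's hostname is exactly lh<digits>.ggpht.com."""
    host = urlparse(url).hostname or ""
    # fullmatch anchors both ends, so look-alike hosts such as
    # lh3.ggpht.com.example.org are rejected.
    return STRICT.fullmatch(host) is not None
```

As comment 4 points out, a check like this only narrows the set of hostnames; it says nothing about who uploaded the content or how it is licensed.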

I welcome other eyes on the example at https://www.rijksmuseum.nl/nl/collectie/BK-1968-212. This shows an artefact image which is broken into tiles, each of which appears to be hosted at ggpht.com. The API call I get my data from for the same artefact is https://www.rijksmuseum.nl/api/en/collection/BK-1968-212?key=xxxxxxxx&format=xml (with my API key blanked out). This gives some interesting values, including a link to the full image:
<guid>4a53f0d0-9e70-4d00-b4e4-8f6ac028d276</guid>
<url>http://lh3.ggpht.com/HMIugFrj7Ostdj-FshnLkVcb7WQhL-mUEeJKS5ODQtexbsfaKb2jaMroIN7s7W_HV2RbenFGhbxSymNdEJJVGzjfed7-=s0</url>

If there is a way of adding some suitable verification to the image page, which we might make a requirement of using this tricky Google domain, I would be happy to look into it.

There is an alternative of using the images available at Europeana, however this limits us to whatever subset Europeana happen to be hosting (it is not simply a mirror), and in truth adds no value as the images for the Rijksmuseum were actually taken from the same source I am attempting to enable for the GWT to read for itself.

Comment 6 Fæ 2014-05-11 02:08:19 UTC

Some more research has led me to an alternative (which was not in the least bit obvious from their API).

In the previous example of artefact "BK-1968-212", I can upload from http://www.rijksmuseum.nl/media/assets/BK-1968-212 and not have to rely on the hosted version at Google.

I presume that the RM are using a Google mirror when serving images to end users to reduce their server traffic. Unfortunately, even their API does not provide the "internal" link as an alternative source; it has to be deduced and does not appear in the public-facing documentation.
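
The deduction described above might be sketched as follows, in Python. The URL pattern is inferred only from the single "BK-1968-212" example in this report and is not documented by the Rijksmuseum; the constant ASSET_BASE and the function asset_url are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

# Assumed base path, taken from the one example URL in comment 6.
ASSET_BASE = "http://www.rijksmuseum.nl/media/assets/"


def asset_url(object_number: str) -> str:
    """Build the presumed direct image URL for a Rijksmuseum object number."""
    # Percent-encode anything unusual; letters, digits and hyphens pass through.
    return ASSET_BASE + quote(object_number, safe="")
```

For the artefact discussed here, asset_url("BK-1968-212") gives the URL quoted in comment 6. Whether the same pattern holds for every object number is untested.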

I am marking this request as resolved as I can apply this work-around.
