Last modified: 2014-05-11 02:08:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T66907, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 64907 - Allow GWT uploads from ggpht.com
Status: RESOLVED INVALID
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 58224
Reported: 2014-05-05 17:58 UTC by
Modified: 2014-05-11 02:08 UTC (History)
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description 2014-05-05 17:58:37 UTC
Please add the following domain(s) to the wgCopyUploadsDomains whitelist:
*.ggpht.com

This is to support uploads from the Rijksmuseum, for example:
http://lh4.ggpht.com/PwdJop7AQKAOvtiEZnfLkLezQKOyO8le69XKOxVYwtLoF0hAfkAa2o_u5eA8CAW_tk4Dm0gfjU8kTyDA6TW8hIAFXg=s0
is the image that the Rijksmuseum returns via its API for the artefact at:
http://www.rijksmuseum.nl/collectie/AK-MAK-127
Comment 1 2014-05-05 18:03:17 UTC
Note that from a small test set, I see lh3, lh4, lh5 and lh6 all being subdomains in use by the RM for images hosted at ggpht.com. However, I feel that a more complex regex restriction is likely to be unnecessary.
Comment 2 Tomasz W. Kozlowski 2014-05-05 18:33:36 UTC
Apparently ggpht.com is a domain "Google uses to host data for YouTube", according to the internets; I'm unsure whether whitelisting the whole domain is a good idea.
Comment 3 2014-05-05 19:03:11 UTC
Perhaps we can whitelist "lh\d*.ggpht.com" as a more limited regex?
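To make the suggestion concrete, here is a minimal Python sketch comparing the pattern as written with a stricter variant. It only illustrates the regex semantics and assumes nothing about how wgCopyUploadsDomains actually matches hosts; the names LOOSE, STRICT and host_allowed are hypothetical.

```python
import re

# Pattern as written in the comment: the dots are unescaped, so "." matches
# any character, and \d* also matches zero digits.
LOOSE = re.compile(r"lh\d*.ggpht.com")

# Hypothetical stricter variant: escaped dots and at least one digit.
STRICT = re.compile(r"lh\d+\.ggpht\.com")


def host_allowed(pattern, host):
    """Return True only if the whole host name matches the pattern."""
    return pattern.fullmatch(host) is not None


# Print how each pattern treats a few sample host names.
for host in ("lh3.ggpht.com", "lh.ggpht.com", "lh3xggpht.com", "evil.lh3.ggpht.com"):
    print(host, host_allowed(LOOSE, host), host_allowed(STRICT, host))
```

Under these assumptions, the pattern as written also accepts "lh.ggpht.com" and "lh3xggpht.com", while the stricter variant accepts only hosts of the form lh<digits>.ggpht.com.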
Comment 4 jeremyb 2014-05-05 20:25:41 UTC
(In reply to Fæ from comment #3)
> Perhaps we can whitelist "lh\d*.ggpht.com" as a more limited regex?

I don't think that would make a significant difference (vs. a wildcard).

I don't know Commons policy well, but I guess Flickr is OK because there are bots that check what the licensing is at Flickr and record the value in an edit to the file description page. (And even then we still have to worry sometimes about flickrwashing.)

(In reply to Tomasz W. Kozlowski from comment #2)
> Apparently ggpht.com is a domain "Google uses to host data for YouTube",

It seems to be more widespread, e.g. it also hosts Picasa pictures.

This would allow essentially the same range of content/uploaders as Google Drive, unless we had a bot somehow checking for license metadata associated with a given URL (like we do with Flickr)?
Comment 5 2014-05-05 23:32:33 UTC
Apart from a more complex regex, like the "lh\d" or maybe "lh[1-9]" domain limitation, I am unsure what else to recommend.

I welcome other eyes on the example at https://www.rijksmuseum.nl/nl/collectie/BK-1968-212. This shows an artefact image which is broken into tiles; each tile appears to be hosted at ggpht.com. The API call I get my data from for the same artefact is https://www.rijksmuseum.nl/api/en/collection/BK-1968-212?key=xxxxxxxx&format=xml (blanked out my API key). This gives some interesting values, including a link to the full image:
<guid>4a53f0d0-9e70-4d00-b4e4-8f6ac028d276</guid>
<url>http://lh3.ggpht.com/HMIugFrj7Ostdj-FshnLkVcb7WQhL-mUEeJKS5ODQtexbsfaKb2jaMroIN7s7W_HV2RbenFGhbxSymNdEJJVGzjfed7-=s0</url>
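As a rough illustration, the host that would need whitelisting can be read out of these two elements with a few lines of Python. This is only a sketch: the surrounding response structure is not shown here, so the fragment is wrapped in a hypothetical <image> root element, and the function name image_host is made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# The two elements quoted above, wrapped in a hypothetical <image> root so
# that the fragment parses; the real response layout is an assumption.
FRAGMENT = """<image>
<guid>4a53f0d0-9e70-4d00-b4e4-8f6ac028d276</guid>
<url>http://lh3.ggpht.com/HMIugFrj7Ostdj-FshnLkVcb7WQhL-mUEeJKS5ODQtexbsfaKb2jaMroIN7s7W_HV2RbenFGhbxSymNdEJJVGzjfed7-=s0</url>
</image>"""


def image_host(xml_text):
    """Return the host name of the <url> element directly under the root."""
    root = ET.fromstring(xml_text)
    url = root.findtext("url")
    return urlparse(url).hostname


print(image_host(FRAGMENT))
```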

If there is a way of adding some suitable verification to the image page, which we might make a requirement of using this tricky Google domain, I would be happy to look into it.

There is an alternative of using the images available at Europeana. However, this limits us to whatever subset Europeana happen to be hosting (it is not simply a mirror), and in truth adds no value, as the Rijksmuseum images there were actually taken from the same source I am attempting to enable the GWT to read for itself.
Comment 6 2014-05-11 02:08:19 UTC
Some more research has led me to an alternative (which was not in the least bit obvious from their API).

In the previous example of artefact "BK-1968-212", I can upload from http://www.rijksmuseum.nl/media/assets/BK-1968-212 and not have to rely on the hosted version at Google.
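A minimal sketch of this work-around in Python, assuming the media URL is simply the object number appended to the base path seen in the example above. That pattern is inferred from this single example and is not documented, so it may not hold for every artefact; ASSET_BASE and asset_url are hypothetical names.

```python
from urllib.parse import quote

# Base path taken from the single example above; treating it as a general
# pattern is an assumption.
ASSET_BASE = "http://www.rijksmuseum.nl/media/assets/"


def asset_url(object_number):
    """Build the presumed direct media URL for an object number."""
    return ASSET_BASE + quote(object_number, safe="")


print(asset_url("BK-1968-212"))
```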

I presume that the RM are using a Google mirror when serving images to end users to reduce their server traffic. Unfortunately, even their API does not provide the "internal" link as an alternative source; it has to be deduced and does not appear in the public-facing documentation.

I am marking this request as resolved as I can apply this work-around.
