Last modified: 2014-08-26 17:11:13 UTC
Fields can have multiple values, in several different ways:
* some file metadata (EXIF etc.) fields can have multiple values
* we parse some data from the HTML code of license templates, and some images have multiple license templates
* sometimes the same property can have a value from both the file and the description
* categories, and any properties based on categories, are in a many-to-many relation with images
* (there are also multilingual values, which can be multivalued when all languages are requested, but we already deal with that)

Right now we handle this in a very hacky way for some fields (e.g. concatenating categories with "|") and don't handle it at all for most (one of the values is picked arbitrarily, depending on incidental details of the code). This will be especially problematic if we want to use CommonsMetadata as a helper tool for the Wikidata migration.

A proper multivalue handling should probably be able to:
* indicate whether or not the given field is multivalued
* indicate the source (e.g. if one value comes from the file and the other from the description, we should be able to tell)
* synchronize properties somehow (e.g. a multilicensed image will have multiple license names and multiple license URLs; the API user has to be able to match the right name to the right link)
We already use arrays with a _type key for multilingual arrays (even though it is an ugly hack), so it seems logical to use the same format (_type=ul, see [1]) for multivalued properties. We currently return an array with 'value' and 'source' fields for a single property; for multivalued properties we could return such an array for each value, which would make it easy to indicate different sources (although it is ugly and not very compact). To mark which values of multivalued properties belong together, we could use an additional 'group' field (e.g. License, LicenseShortName etc. read from the first license template would have group=1).

[1] https://www.mediawiki.org/wiki/Manual:File_metadata_handling#Format_of_this_merged_metadata
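To make the proposal above concrete, here is a minimal sketch in Python (CommonsMetadata itself is PHP) of what a client could do with per-value arrays carrying 'value', 'source' and 'group'. Assumptions: the numeric-key-plus-'_type' layout only mimics the merged metadata list format, and the helper names values_of and pair_by_group, the 'source' label and the sample values are all hypothetical.

```python
from collections import defaultdict

# Illustrative data in the proposed format. Assumptions: the numeric-key
# layout mimics the merged-metadata list format (values plus a '_type'
# key), and the 'source' label and sample values are made up.
extmetadata = {
    "LicenseShortName": {
        0: {"value": "CC BY-SA 3.0", "source": "commons-templates", "group": 1},
        1: {"value": "GFDL", "source": "commons-templates", "group": 2},
        "_type": "ul",
    },
    "LicenseUrl": {
        0: {"value": "https://creativecommons.org/licenses/by-sa/3.0/",
            "source": "commons-templates", "group": 1},
        1: {"value": "http://www.gnu.org/copyleft/fdl.html",
            "source": "commons-templates", "group": 2},
        "_type": "ul",
    },
}


def values_of(prop):
    """Return the per-value dicts of a multivalued property, skipping '_type'."""
    return [v for k, v in prop.items() if k != "_type"]


def pair_by_group(metadata, *fields):
    """Collect the values of several fields by their 'group' id, so that each
    license name ends up next to the URL read from the same template."""
    groups = defaultdict(dict)
    for field in fields:
        for entry in values_of(metadata.get(field, {})):
            groups[entry["group"]][field] = entry["value"]
    return dict(sorted(groups.items()))


for group, fields in pair_by_group(extmetadata, "LicenseShortName", "LicenseUrl").items():
    print(group, fields["LicenseShortName"], fields["LicenseUrl"])
# 1 CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/
# 2 GFDL http://www.gnu.org/copyleft/fdl.html
```

Because 'source' is kept per value, a property that got one value from the file and another from the description would simply carry two different 'source' labels, and the 'group' id is what lets the API user match each license name to the right link.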
*** Bug 64803 has been marked as a duplicate of this bug. ***
*** Bug 64888 has been marked as a duplicate of this bug. ***
Copying over a point raised in bug 64888, which makes this issue more severe: this bug causes dual-licensed material to link to the wrong license. For example, the text will say "CC BY-SA 3.0" but link to http://www.gnu.org/copyleft/fdl.html. Apart from being highly confusing, this most likely violates one of the two licenses. Example: https://sv.wikipedia.org/wiki/Sveriges_l%C3%A4n#mediaviewer/Fil:Greater_coat_of_arms_of_Sweden.svg
Change 135194 had a related patch set uploaded by Gergő Tisza: [WIP] Handle multiple templates in TemplateParser https://gerrit.wikimedia.org/r/135194
Change 135194 merged by jenkins-bot: Handle multiple templates in TemplateParser https://gerrit.wikimedia.org/r/135194
What’s the current status of this patch? The issue described in bug 64888 is very problematic. I had not noticed that bug before, but frankly I would have been inclined to consider it a blocker for wide deployment on Wikimedia sites. (Here is another example, if needed: <https://commons.wikimedia.org/wiki/File:Silver_crystal.jpg#mediaviewer/File:Silver_crystal.jpg>)
That specific issue should be fixed. CommonsMetadata handles multivalued fields correctly internally, but returns only one value because of limitations of the API format; which value gets returned is now chosen more consistently. The caching for CommonsMetadata is pretty complicated (there is a memcached layer on both the frontend and backend wiki, plus whatever the API framework uses, plus Varnish), so I am waiting to see whether the issue is properly fixed once all the caches involved expire (within 30 days), or whether some sort of manual purging will be necessary.
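The difference between picking each returned field independently and picking all license fields from one template can be illustrated with a small Python sketch. This is not CommonsMetadata's actual code: the dict-per-template representation, the helper names pick_per_field and pick_per_template, and the idea that per-field selection is how the name/URL mismatch from bug 64888 arose are all assumptions made for illustration.

```python
# Hypothetical illustration only: templates are modelled as plain dicts, and
# the field names mirror the extmetadata keys mentioned in this bug.
LICENSE_FIELDS = ("LicenseShortName", "LicenseUrl")


def pick_per_field(templates):
    """Fragile: take the first non-empty value of each field separately,
    so the name and the URL can come from different templates."""
    result = {}
    for field in LICENSE_FIELDS:
        for template in templates:
            if template.get(field):
                result[field] = template[field]
                break
    return result


def pick_per_template(templates):
    """Consistent: take every license field from the same (first) template."""
    if not templates:
        return {}
    first = templates[0]
    return {f: first[f] for f in LICENSE_FIELDS if first.get(f)}


templates = [
    {"LicenseShortName": "CC BY-SA 3.0"},  # assume no URL was parsed here
    {"LicenseShortName": "GFDL", "LicenseUrl": "http://www.gnu.org/copyleft/fdl.html"},
]
print(pick_per_field(templates))
# {'LicenseShortName': 'CC BY-SA 3.0', 'LicenseUrl': 'http://www.gnu.org/copyleft/fdl.html'}
print(pick_per_template(templates))
# {'LicenseShortName': 'CC BY-SA 3.0'}
```

With per-template selection a missing URL stays missing instead of being filled in from the other license, which is the trade-off that avoids pairing "CC BY-SA 3.0" with the GFDL link.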
(In reply to Tisza Gergő from comment #8)
> The caching for CommonsMetadata is pretty complicated (there is a memcached
> layer on both the frontend and backend wiki, plus whatever the API framework
> uses, plus Varnish), so I am waiting to see if the issue is properly fixed
> (all the caches involved should wear out in 30 days) or some sort of manual
> purging will be necessary.

Please put more steam behind this issue. Acceptance of the MultimediaViewer, at least in the German Wikipedia community, is dropping day by day because of critical bugs like this :-(
All examples from this and duplicate tickets provide correct data now. Is anyone aware of images which are still showing inconsistent licence information?
Asking again before closing this ticket: Is anyone aware of images which are still showing inconsistent licence information?
Setting this back to NEW, since the original issue described in comment 0 still stands. I'll assume the problem with mixed-up licenses is fixed.
...setting the status back to NEW...