Last modified: 2014-11-14 09:45:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T73170, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 71170 - Ensure WikibaseDataModelSerialization feature parity with WikibaseLib
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=13
Depends on: 71512 72183
Blocks: 62188
Reported: 2014-09-23 07:59 UTC by tobias.gritschacher
Modified: 2014-11-14 09:45 UTC (History)

Comment 1 Jan Zerebecki 2014-10-23 11:26:23 UTC
It also seems to be missing the ability to represent removed Terms: search for "removed" in serializeMultilingualValues() in Wikibase.git/lib/includes/serializers/MultilingualSerializer.php.
Comment 2 Daniel Kinzler 2014-11-13 11:30:31 UTC
Things that the old serializer supported (or should have supported), and that need to be possible using the new model and serializer too:

* Filtering (terms by language, sitelinks by site, statements by rank)
* Apply language fallback (and put info about the fallback in the terms)
* Optionally group claims by property
* Optionally use lists instead of maps keyed by ID (to cater to the quirks of ApiFormatXml)
* Deletion markers, as used in responses of API modules like RemoveClaim
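
The grouping and list-versus-map options above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Wikibase code: the function name group_claims, its two flags, and the "id"/"claims" keys of the list form are all assumptions made up for the example.

```python
def group_claims(claims, group_by_property=True, as_list=False):
    """Arrange a flat list of claim dicts for output.

    group_by_property: collect claims under their property ID.
    as_list: emit a list of groups instead of a map keyed by property ID
             (hypothetical shape, for formats that cannot handle ID keys).
    """
    if not group_by_property:
        return list(claims)

    groups = {}  # dicts keep insertion order in Python 3.7+
    for claim in claims:
        groups.setdefault(claim["property"], []).append(claim)

    if as_list:
        return [{"id": pid, "claims": items} for pid, items in groups.items()]
    return groups


claims = [
    {"id": "Q1$a", "property": "P31"},
    {"id": "Q1$b", "property": "P17"},
    {"id": "Q1$c", "property": "P31"},
]

grouped_map = group_claims(claims)
grouped_list = group_claims(claims, as_list=True)
flat = group_claims(claims, group_by_property=False)
```

Whether such options live on the serializer itself or on a factory is left open here.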

In some cases, we want to (optionally) inject extra information into the serialization (or presentational model), e.g.:
* the DataType of PropertyValueSnaks
* (future) full URIs for external identifiers
* (future) quantity values converted to base units

We should try to avoid putting such "derivative" versions of our data back into the database, as this would constitute data loss and/or create confusion (especially in the case of automatic transliteration). 

Another question is if and how "derivative" entity information can and should be represented by our data model. We should have a spec that makes a clear distinction between the "core" data model, and "representational" or "informational" derivatives.

PS: We also need a way to represent order explicitly when using ID-based maps instead of lists (for statements, qualifiers in a claim, references in a statement, and snaks in a reference). This is part of the core model, but was not addressed by the old serializer either.
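
One possible way to make order explicit next to an ID-keyed map is to emit a separate list of IDs. The sketch below (Python, language-neutral) only illustrates that idea; the "statements" and "order" key names and both helper functions are hypothetical and not part of any existing Wikibase format.

```python
def serialize_with_order(statements):
    """Return an ID-keyed map plus an explicit list recording the order."""
    return {
        "statements": {s["id"]: s for s in statements},
        "order": [s["id"] for s in statements],  # hypothetical key name
    }


def restore_order(serialized):
    """Rebuild the ordered list from the map and the explicit order list."""
    by_id = serialized["statements"]
    return [by_id[statement_id] for statement_id in serialized["order"]]


statements = [
    {"id": "Q1$c", "property": "P31"},
    {"id": "Q1$a", "property": "P17"},
]
round_tripped = restore_order(serialize_with_order(statements))
```

A consumer that does not preserve the key order of the map can still recover the sequence from the explicit list.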
Comment 3 Daniel Kinzler 2014-11-13 14:34:54 UTC
After some discussion with Jan and Adrian, some points became clear:
* We want to implement language fallback and filtering on the model level, not in the serializer
* Other things however, like grouping of statements, or associative vs. indexed arrays, have to be implemented in the serializer (a flag to the serializer factory could do the trick)
* Presentation-layer concepts can be represented in the datamodel using subclassing (e.g. TermWithLanguageFallback extends Term).
* It would be good to include a version number in serializer output
* Entities "tainted" by fallback, filtering, etc. are not a big issue in practice, because EditEntity only adds/replaces entries by default.
* EditEntity still needs to fail on terms with fallback info, to avoid writing automatic transliterations to the database.
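
The subclassing idea and the EditEntity guard from the points above could look roughly like this. It is a Python sketch of the concept only: Term and TermWithLanguageFallback are the names used in this comment, but the constructor fields (actual_language, source_language) and the assert_storable helper are assumptions for illustration.

```python
class Term:
    """Core model: a text in a language, as stored in the wiki."""

    def __init__(self, language, text):
        self.language = language
        self.text = text


class TermWithLanguageFallback(Term):
    """Presentational variant carrying fallback info (fields are assumed)."""

    def __init__(self, language, text, actual_language, source_language=None):
        super().__init__(language, text)
        self.actual_language = actual_language  # language the text really is in
        self.source_language = source_language  # set if transliterated


def assert_storable(terms):
    """Reject terms that carry fallback info before anything is saved."""
    for term in terms:
        if isinstance(term, TermWithLanguageFallback):
            raise ValueError(
                "term for %r came from language fallback and must not be stored"
                % term.language
            )


assert_storable([Term("en", "Berlin")])  # plain terms pass
```

The check relies on the subclass being distinguishable at save time, which is what makes the subclassing approach useful for this purpose.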

In general, it became clear that the serialization we use in the database will often not be exactly the same as the one we use in the database. For instance, the serialization in the database does not contain the data-type in snaks, nor should it in the future contain things like expanded URIs of external IDs or converted quantity values.
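
To illustrate the data-type example: the same snak could be serialized without the data type for storage and with it for output, depending on whether a lookup is supplied. This is a minimal Python sketch under stated assumptions; serialize_snak, the datatype_lookup parameter, and the sample data are hypothetical, and the key names are only meant to resemble the existing JSON output.

```python
def serialize_snak(snak, datatype_lookup=None):
    """Serialize a value snak; add the data type only if a lookup is given."""
    out = {
        "snaktype": "value",
        "property": snak["property"],
        "datavalue": snak["datavalue"],
    }
    if datatype_lookup is not None:
        # Derived information: injected for output, never written back.
        out["datatype"] = datatype_lookup(snak["property"])
    return out


# Hypothetical property-to-data-type table standing in for a real lookup.
datatypes = {"P18": "commonsMedia"}
snak = {"property": "P18", "datavalue": {"type": "string", "value": "Example.jpg"}}

for_storage = serialize_snak(snak)
for_output = serialize_snak(snak, datatype_lookup=datatypes.get)
```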

The question of whether we can always group statements by property, or whether we want to retain the option to output flat lists of Statements, remained open. It would be nicer if we could always group.

Another general consideration: we want our output format to stay relatively stable, and it should be easy to use directly, without the need for specialized data model libraries. While it would be nice to have libraries for serialization and representing our data model in multiple languages, we currently do not supply those. As long as we don't, we have to assume people operate on the raw data structures.
Comment 4 Adrian Lang 2014-11-13 14:41:47 UTC
» we use in the database will often not be exactly the same as the one we use in the database.«?

I agree that we have to consider users who don't have a data model implementation. I just want the data model implementations we have to be able to provide the things we provide in the serialization.
Comment 5 Daniel Kinzler 2014-11-13 16:08:50 UTC
Of course I meant "the serialization we use in the database will often not be exactly the same as the one we use in the API".

I agree with Adrian that our data model should generally be able to represent everything our serializer puts into the output.
Comment 6 Daniel Kinzler 2014-11-14 09:45:34 UTC
Another aspect that came up is the conceptual separation of the "core" data model (representing the knowledge stored in the wiki directly) vs. the "presentational" model (with fallback, filtering, and expansion). A marker interface may work, but that's not very nice (think LSP). The cleanest solution would be to duplicate the entire data model class hierarchy, but that's massive overhead, hard to maintain, and potentially confusing. The best we can do for now seems to be to clearly document which classes and fields are part of the core model, and which are presentational.
