Last modified: 2014-05-06 12:19:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T60032, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 58032 - Allow page_props to be queried by value.
Allow page_props to be queried by value.
Status: VERIFIED FIXED
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
1.23.0
All All
: High normal (vote)
: ---
Assigned To: Nobody - You can work on this!
u=dev c=backend p=13 s=2014-04-23
:
Depends on: 64411
Blocks: 39150 43732 46217
  Show dependency treegraph
 
Reported: 2013-12-05 11:48 UTC by Daniel Kinzler
Modified: 2014-05-06 12:19 UTC (History)
7 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Daniel Kinzler 2013-12-05 11:48:27 UTC
Page_props associates key/value pairs with page ids:
  `pp_page` int(11) NOT NULL,
  `pp_propname` varbinary(60) NOT NULL,
  `pp_value` blob NOT NULL,

Currently, this does not however allow us to efficiently find pages by a given key/value pair. That however would be quite useful, e.g. for marking pages as "stub" upon save, and then listing all stubs.

There are two indexes defined at the moment:
  UNIQUE KEY `pp_page_propname` (`pp_page`,`pp_propname`),
  UNIQUE KEY `pp_propname_page` (`pp_propname`,`pp_page`)

So, we can efficiently get all properties (or one specific property) for a page. And we can efficiently get all pages that *have* a specific property. We can however not get all pages with a specific *value* for a given property. For that, we would need an index over at least pp_propname and pp_value, but this doesn't work, since pp_value is a blob. So allow this, we'd need an id column. I propose to add a column:

  `pp_value_id` varbinary(255) NOT NULL
  
This may contain a hash of the value blob, or it may contain the actual value (with pp_value being
the same, or empty, or null). This would allow us to create an index for looking up pages by value:

  UNIQUE KEY `pp_propname_value` (`pp_propname`,`pp_value_id`, `pp_page`)

Perhaps the pp_propname_page could then be dropped, I don't really see a use case for it.
Comment 1 Daniel Kinzler 2013-12-05 11:51:45 UTC
Oh, while we are messing with the table, that would probably be a good time to introduce a surrogate primary key.
Comment 2 Daniel Kinzler 2013-12-05 12:06:26 UTC
We should also consider that the order of values may be relevant, not only equality. For numeric values, this would mean we'd need a padded version of the number in pp_value_id, so the alphanumeric order of the index can be used. 

Alternatively, we need separate columns or even tables for different types of values.
Comment 3 Daniel Kinzler 2014-02-18 12:53:28 UTC
After some discussion with Roan and Tim, it's seems the easiest approach is to add a new column to page_props, containing a float value to be used as a sort key. This would allow for exact matches as well as the typical "top k" queries often used on special pages. The column should be nullable so no value would be provided for page props that do not represent a quantity (such as a display title or redirect target). 

Querying for strings would not be possible this way, a partial index on the value blob could be used for that.
Comment 4 Gerrit Notification Bot 2014-03-31 11:05:46 UTC
Change 122349 had a related patch set uploaded by Daniel Kinzler:
(bug 58032) introducing pp_sortkey.

https://gerrit.wikimedia.org/r/122349
Comment 5 Daniel Kinzler 2014-03-31 11:09:25 UTC
Change I217c4265 introduces the pp_sortkey field, and automatically uses it to index any property values that are int, float or bool.

Pending:
* a maintenance script for populating pp_sortkey (this will be sloppy, because when reading pp_value from the database, we don't know the original type - the value will always be a string).
* an API module for querying pages by property value.
Comment 6 Gerrit Notification Bot 2014-04-24 05:26:47 UTC
Change 122349 had a related patch set uploaded by Aaron Schulz:
Introducing pp_sortkey.

https://gerrit.wikimedia.org/r/122349
Comment 7 Gerrit Notification Bot 2014-04-24 21:42:10 UTC
Change 122349 merged by jenkins-bot:
Introducing pp_sortkey.

https://gerrit.wikimedia.org/r/122349
Comment 8 tobias.gritschacher 2014-05-06 09:47:14 UTC
Created bug 64949 as a follow-up.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links