Last modified: 2014-11-18 15:20:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T58711, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 56711 - Update the wb_terms table so it does not have a numeric entity id


Summary:	Update the wb_terms table so it does not have a numeric entity id

Status:	PATCH_TO_REVIEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	WikidataRepo
Version:	unspecified
Hardware:	All All

Importance:	High normal
Target Milestone:	---
Assigned To:	Wikidata bugs

URL:
Whiteboard:	termsearch u=dev c=backend p=20
Keywords:

Depends on:	68378
Blocks:	64288 73496

Reported:	2013-11-07 09:55 UTC by tobias.gritschacher
Modified:	2014-11-18 15:20 UTC
CC List:	6 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description tobias.gritschacher 2013-11-07 09:55:27 UTC

* Update term class to not have a numeric entity id
* Provide a migration script for wb_terms

Comment 1 Daniel Kinzler 2014-01-14 10:51:20 UTC

Rationale: we are dropping the assumption that ids will always be prefix+number. For the current code and use case, wikidata.org, this works fine, but we need to migrate away from it in order to support things like metadata storage on Commons.
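
To illustrate the assumption being dropped, here is a minimal sketch in Python. It is not the Wikibase implementation: the function names are hypothetical, and the prefix map assumes the "Q" (item) and "P" (property) prefixes used on wikidata.org. Going from a numeric id to a serialized id always works, but the reverse direction only works while every id is prefix+number.

```python
# Hypothetical prefix map; "Q" and "P" are the prefixes used on wikidata.org.
PREFIXES = {"item": "Q", "property": "P"}


def to_full_id(entity_type: str, numeric_id: int) -> str:
    """Build the serialized id from an entity type and a numeric id."""
    if entity_type not in PREFIXES:
        raise ValueError(f"no prefix known for entity type {entity_type!r}")
    return f"{PREFIXES[entity_type]}{numeric_id}"


def to_numeric_id(full_id: str) -> int:
    """Recover the numeric part; only possible for prefix+number ids."""
    prefix, digits = full_id[:1], full_id[1:]
    if prefix not in PREFIXES.values() or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"{full_id!r} is not of the form prefix+number")
    return int(digits)
```

An id such as "File:Example.jpg" (a made-up example of a non-numeric id) makes to_numeric_id raise, which is the case a numeric column in wb_terms cannot represent.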

Comment 2 tobias.gritschacher 2014-01-22 10:25:19 UTC

https://gerrit.wikimedia.org/r/#/c/101197/

Comment 3 Jeroen De Dauw 2014-02-20 17:38:24 UTC

I fixed compatibility with SQLite and several other issues. The tests now pass: https://gerrit.wikimedia.org/r/#/c/114490/

The commits do some things I don't like, though those can be fixed once we have gotten rid of the main issue, the bad assumption in the table, which this commit fixes.

Comment 4 Thiemo Mättig 2014-03-25 13:33:24 UTC

Springle wrote at https://gerrit.wikimedia.org/r/#/c/101197/

This one still seems dangerous to me :-) I understand the reason for the change, however please do also consider:

1. Have we done any real profiling of the new query forms against the production dataset? I'd really like to see how much of an impact this has on data and index disk usage, and more importantly on runtime memory usage. Happy to do this if a dev can generate a few thousand samples of each query type...

2. Would it be wise to keep a numeric entity id field as an interim step on the wikidata production dataset, so we can fail back if need be? I.e., treat this as a denormalization step (which is /all it is/ for now) until #1 is assured? That might even make the migration less painful.

3. VARBINARY(255) smells like an arbitrary size choice :-) Variable field widths really start to matter for large datasets, as the server must convert them to fixed-width BINARY while working. If the choice /was/ arbitrary, can we arbitrarily choose to make this smaller from the get-go?
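
Points 2 and 3 can be sketched roughly as follows, using Python's sqlite3 module against a toy table. This is an illustration under assumptions, not the patch under review: the column names (term_entity_id, term_entity_type, and the new term_full_entity_id) and the prefix map are assumed here, and the real wb_terms schema and production database engine may differ. The numeric column is kept next to the new string column, and the longest serialized id is measured as one input for choosing a column width.

```python
import sqlite3

# Hypothetical prefix map; "Q" and "P" are the prefixes used on wikidata.org.
PREFIXES = {"item": "Q", "property": "P"}


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory toy table with assumed wb_terms-like columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wb_terms ("
        "term_entity_id INTEGER, term_entity_type TEXT, "
        "term_language TEXT, term_type TEXT, term_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO wb_terms VALUES (?, ?, ?, ?, ?)",
        [
            (42, "item", "en", "label", "Douglas Adams"),
            (31, "property", "en", "label", "instance of"),
        ],
    )
    return conn


def migrate(conn: sqlite3.Connection) -> None:
    """Add the string id column and backfill it; the numeric column stays."""
    conn.execute("ALTER TABLE wb_terms ADD COLUMN term_full_entity_id TEXT")
    for entity_type, prefix in PREFIXES.items():
        conn.execute(
            "UPDATE wb_terms SET term_full_entity_id = ? || term_entity_id "
            "WHERE term_entity_type = ?",
            (prefix, entity_type),
        )
    conn.commit()


def max_id_length(conn: sqlite3.Connection):
    """Longest serialized id in the table, or None if there are no ids."""
    row = conn.execute(
        "SELECT MAX(LENGTH(term_full_entity_id)) FROM wb_terms"
    ).fetchone()
    return row[0]
```

Because the numeric column is left untouched, queries could keep using it until the new column has been profiled. The measured maximum length says nothing about future id schemes, so it is only a lower bound for the column width.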
