Last modified: 2008-06-08 16:33:55 UTC

Bug 10087 - Relations table and excessive size
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: Semantic MediaWiki
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://sourceforge.net/mailarchive/fo...
Keywords:
Depends on:
Blocks:

Reported: 2007-05-31 14:23 UTC by Sergey Chernyshev
Modified: 2008-06-08 16:33 UTC

Description Sergey Chernyshev 2007-05-31 14:23:59 UTC
The smw_relations table grows very fast. It might be a good idea to consider stripping it of subject_namespace and subject_title, or even merging it with the MediaWiki pagelinks table, which stores all the same information apart from relation_title (since relations are defined through link syntax).

For statistics: one of my test installations, with about 30K (relation-intensive) pages, has approximately 5 million entries in the smw_relations table (taking approximately 1 GB of space) and slightly more entries in the pagelinks table (taking approximately 850 MB of space).
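
As a rough back-of-the-envelope reading of those figures, the sketch below divides the reported table sizes by the reported row count. It assumes 1 GB = 10^9 bytes and treats pagelinks as having exactly 5 million rows (the report only says it has slightly more), so the pagelinks figure is an upper bound. The variable names are illustrative only.

```python
# Rough per-row storage estimate from the figures quoted above.
# Assumptions: 1 GB = 10**9 bytes, 1 MB = 10**6 bytes, and pagelinks is
# treated as having the same 5 million rows as smw_relations (it is said
# to have somewhat more), so its per-row figure is an upper bound.

relations_rows = 5_000_000
relations_bytes = 1 * 10**9
pagelinks_rows = 5_000_000
pagelinks_bytes = 850 * 10**6

relations_bytes_per_row = relations_bytes / relations_rows
pagelinks_bytes_per_row = pagelinks_bytes / pagelinks_rows

print(f"smw_relations: ~{relations_bytes_per_row:.0f} bytes per row")
print(f"pagelinks:     at most ~{pagelinks_bytes_per_row:.0f} bytes per row")
```

Under these assumptions both tables cost on the order of a couple of hundred bytes per row (data plus indexes), which is consistent with title strings and their indexes making up much of the space.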

The mailing list discussion thread (only my posts, so far) can be seen here:
http://sourceforge.net/mailarchive/forum.php?thread_name=9984a7a70705281206r58d041f9i7c393a82447f1336%40mail.gmail.com&forum_name=semediawiki-devel
Comment 1 Markus Krötzsch 2007-06-04 11:51:13 UTC
I understand the problem and will consider ways of providing storage-optimised data models in future releases. The additional title and namespace information is relevant for implementing filtering operations more efficiently (i.e. without an additional join with the page table) and hence was denormalised on purpose.
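
To illustrate the trade-off, here is a minimal sketch using Python's sqlite3 module. The schemas are simplified, hypothetical stand-ins (relations_denorm, relations_norm and the reduced page table are not the actual SMW or MediaWiki definitions); the point is only that a namespace filter on the denormalised table needs no join, while the normalised variant has to join against the page table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for MediaWiki's page table.
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    -- Denormalised: subject namespace and title are copied into each row.
    CREATE TABLE relations_denorm (
        subject_id INTEGER,
        subject_namespace INTEGER,
        subject_title TEXT,
        relation_title TEXT,
        object_title TEXT
    );
    -- Normalised: only the subject id is stored.
    CREATE TABLE relations_norm (
        subject_id INTEGER,
        relation_title TEXT,
        object_title TEXT
    );
""")

pages = [(1, 0, "Berlin"), (2, 0, "Paris"), (3, 2, "Alice")]
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", pages)

facts = [
    (1, "capital_of", "Germany"),
    (2, "capital_of", "France"),
    (3, "lives_in", "Berlin"),
]
for subject_id, relation, obj in facts:
    _, namespace, title = pages[subject_id - 1]
    conn.execute(
        "INSERT INTO relations_denorm VALUES (?, ?, ?, ?, ?)",
        (subject_id, namespace, title, relation, obj),
    )
    conn.execute(
        "INSERT INTO relations_norm VALUES (?, ?, ?)",
        (subject_id, relation, obj),
    )

# Filtering by subject namespace on the denormalised table: no join.
denorm_rows = conn.execute(
    "SELECT subject_title, relation_title, object_title "
    "FROM relations_denorm WHERE subject_namespace = 0 "
    "ORDER BY subject_title"
).fetchall()

# The same filter on the normalised table needs a join with page.
norm_rows = conn.execute(
    "SELECT p.page_title, r.relation_title, r.object_title "
    "FROM relations_norm AS r "
    "JOIN page AS p ON p.page_id = r.subject_id "
    "WHERE p.page_namespace = 0 "
    "ORDER BY p.page_title"
).fetchall()

print(denorm_rows)
print(norm_rows)
```

Both queries return the same rows; the denormalised table pays for the simpler query with the extra namespace and title columns in every row.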

For relations and objects, ids are not always available and thus cannot replace the title strings in general. We could consider having our own indexing scheme for them, or use ids at least for attributes (which need to have articles before being stored). However, ids in MediaWiki are not as persistent as names: when you move a page, the id associated with the original name changes. Hence all uses of ids require global update operations whenever pages are moved, and we also tried to limit this.

What you can do to save space (at the expense of performance) is to drop indexes that are currently built for the tables. In particular, the indexes over object and subject titles/namespaces are not required for all operations, so deleting them might be feasible.
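
A minimal sketch of that idea, using Python's sqlite3 module so it can be run standalone. The table layout and the index names (idx_subject, idx_object) are hypothetical, not the ones SMW actually creates; on a real installation you would first list the existing indexes and decide which of them your queries can do without.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified relations table with two title indexes.
conn.execute(
    "CREATE TABLE relations ("
    "subject_title TEXT, relation_title TEXT, object_title TEXT)"
)
conn.execute("CREATE INDEX idx_subject ON relations (subject_title)")
conn.execute("CREATE INDEX idx_object ON relations (object_title)")


def index_names(connection):
    """Return the names of all indexes defined on the relations table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'relations' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


before = index_names(conn)

# Drop the index over object titles; queries still work, but lookups
# by object title fall back to scanning the table.
conn.execute("DROP INDEX idx_object")

after = index_names(conn)
print(before, after)
```

On a MySQL-backed wiki the corresponding statements would be along the lines of SHOW INDEX FROM smw_relations followed by DROP INDEX ... ON smw_relations, with the index names taken from your own installation.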
Comment 2 Markus Krötzsch 2008-06-08 16:33:55 UTC
The new storage engine of SMW uses internal numerical ids for all pages, so title strings vanish completely from the relation tables and from the subject positions of other tables. In exchange, there is a new SMW-specific id management (we cannot use MediaWiki ids since they do not exist for all objects occurring in SMW tables). For relation-heavy wikis, this should greatly reduce storage space. The new storage implementation is in SVN and will soon be released with proper update instructions. To test it now, see the instructions in Bug 13960 (switching back to the old implementation is always possible, but you should be running the current stable release, SMW 1.1.1, before trying out SVN).
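
The general idea can be sketched as follows, again with Python's sqlite3 module. This is an illustration of id interning under assumed names (the ids and relations tables, get_id, add_relation and the namespace numbers are all hypothetical), not the actual SMW schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each (namespace, title) pair is stored once and given a numeric id.
    CREATE TABLE ids (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        namespace INTEGER NOT NULL,
        title TEXT NOT NULL,
        UNIQUE (namespace, title)
    );
    -- Relation rows hold three integers instead of title strings.
    CREATE TABLE relations (
        subject_id INTEGER NOT NULL,
        relation_id INTEGER NOT NULL,
        object_id INTEGER NOT NULL
    );
""")


def get_id(namespace, title):
    """Return the id for (namespace, title), creating it if necessary."""
    row = conn.execute(
        "SELECT id FROM ids WHERE namespace = ? AND title = ?",
        (namespace, title),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO ids (namespace, title) VALUES (?, ?)",
        (namespace, title),
    )
    return cursor.lastrowid


def add_relation(subject, relation, obj):
    """Store one relation; each argument is a (namespace, title) pair."""
    conn.execute(
        "INSERT INTO relations VALUES (?, ?, ?)",
        (get_id(*subject), get_id(*relation), get_id(*obj)),
    )


MAIN, RELATION = 0, 100  # hypothetical namespace numbers

add_relation((MAIN, "Berlin"), (RELATION, "capital_of"), (MAIN, "Germany"))
add_relation((MAIN, "Paris"), (RELATION, "capital_of"), (MAIN, "France"))
add_relation((MAIN, "Bonn"), (RELATION, "located_in"), (MAIN, "Germany"))

id_count = conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0]
relation_rows = conn.execute("SELECT * FROM relations").fetchall()
print(id_count, relation_rows)
```

Repeated titles such as "Germany" and "capital_of" are stored once in the id table, and an object receives an id even if no wiki page exists for it, which is why an id scheme separate from MediaWiki's page ids is needed.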
