Last modified: 2014-07-10 15:35:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T41150, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 39150 - Create Special:EmptyItems
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 2 votes
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=backend p=0
Keywords: need-volunteer
Depends on: 40157 58032
Blocks:
Reported: 2012-08-08 13:32 UTC by denny vrandecic
Modified: 2014-07-10 15:35 UTC (History)
CC List: 9 users

Description denny vrandecic 2012-08-08 13:32:10 UTC
Create a special page that lists all pages with an empty item (i.e. an item with no content whatsoever).
Comment 1 denny vrandecic 2012-08-09 09:57:55 UTC
Requires a way to actually query for this information.
Comment 2 Thomas Douillard 2012-09-10 19:27:11 UTC
I'm working on this issue (to start with something small), and I can see two possible definitions of an empty item:

* An empty item is an item with just a label or a description and no alias: just one row in the term table
* An empty item is an item not linked to anything (currently, no wiki page)

Which one should it be: one of the two above, or another one?
Comment 3 Daniel Kinzler 2012-09-11 08:00:22 UTC
I would think that an empty item is empty. It contains nothing. No label, no description in any language, nothing. I don't think we should consider an item that has a label as empty. Programmatically, this is defined by ItemContent::isEmpty() - but there's currently no way to query that in the database. Eventually, page_site should be usable for this, but currently, the number there is not very meaningful, and never null. I think for empty items it's currently between 2 and 14 or something (it counts bytes in the item's serialized form).

Anyway.

Re your first definition: does that mean it has a label or a description in only one language? Or in any number of languages? Note that there will be quite a few items that just have a label and a description, often only in one language. I would consider these "stubs", not "empty". They will often be created when filling in a property of the type "item reference", when the respective target item doesn't exist yet: if I want to provide the mayor of a city, but the mayor has no Wikidata item yet (and no Wikipedia page in any language), I'd just give a label and description, and the system would create a stub item. 

Re your second definition: there will be quite a few items with no Wikipedia links that can't have Wikipedia links because there just isn't any Wikipedia page about them. Stubs like the above, but also books needed for citations, etc. They should not be considered empty. 

All that being said, I think it would be useful to have lists for all of these: empty items, "stub" items (label and/or description and/or aliases only), and "unconnected" items (items with no sitelinks).
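
The three lists above can be illustrated with a small, language-neutral sketch. It is not Wikibase code: the item is modelled as a plain dictionary, and the field names "descriptions", "aliases" and "sitelinks" are assumptions made for illustration (only "labels" appears in this thread).

```python
from typing import Any, Mapping, Set

# Assumed field names; only "labels" is mentioned in this thread.
TERM_KEYS = ("labels", "descriptions", "aliases")
SITELINK_KEY = "sitelinks"


def classify_item(item: Mapping[str, Any]) -> Set[str]:
    """Return which of "empty", "stub", "unconnected" apply to an item."""
    has_terms = any(item.get(key) for key in TERM_KEYS)
    has_sitelinks = bool(item.get(SITELINK_KEY))
    # Any other non-empty field (for example, future property values).
    has_other = any(
        value
        for key, value in item.items()
        if key not in TERM_KEYS and key != SITELINK_KEY
    )

    states: Set[str] = set()
    if not (has_terms or has_sitelinks or has_other):
        states.add("empty")
    if has_terms and not (has_sitelinks or has_other):
        states.add("stub")
    if not has_sitelinks:
        states.add("unconnected")
    return states
```

Under this reading, an empty item and a stub item are both also "unconnected". Whether the lists should overlap like that is not settled in this discussion.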

It's currently not trivial to get these lists from the database, and I imagine we might get even more involved definitions once we have full support for properties (and maybe things like categories). One solution would be to detect these "states" of the item whenever a new revision is saved, and store them to the page_props table.

Yea, thinking about it, that's probably the way to go. ItemContent (or better, EntityContent) should get a getPageProps() method that would be used to push the appropriate page_props into the ParserOutput object returned by EntityContent::getParserOutput. That should cause these props to be saved in the DB, which makes it easy to construct the respective lists.
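
A minimal sketch of that idea, assuming an in-memory SQLite table that mimics MediaWiki's page_props columns (pp_page, pp_propname, pp_value). The property names "wb-empty", "wb-stub" and "wb-unconnected", the item field names, and the helper functions are all hypothetical; the real mechanism would go through getPageProps() and the ParserOutput as described above.

```python
import sqlite3

# Hypothetical property names, used only for this illustration.
STATE_PROPS = ("wb-empty", "wb-stub", "wb-unconnected")


def item_states(item):
    """Detect item states; the field names are assumptions."""
    terms = any(item.get(k) for k in ("labels", "descriptions", "aliases"))
    sitelinks = bool(item.get("sitelinks"))
    states = []
    if not terms and not sitelinks:
        states.append("wb-empty")
    if terms and not sitelinks:
        states.append("wb-stub")
    if not sitelinks:
        states.append("wb-unconnected")
    return states


def open_db():
    """Create an in-memory stand-in for the page_props table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_props ("
        "pp_page INTEGER, pp_propname TEXT, pp_value TEXT, "
        "PRIMARY KEY (pp_page, pp_propname))"
    )
    return conn


def save_revision(conn, page_id, item):
    """Recompute the state props for a page whenever a revision is saved."""
    conn.execute(
        "DELETE FROM page_props WHERE pp_page = ? AND pp_propname IN (?, ?, ?)",
        (page_id, *STATE_PROPS),
    )
    conn.executemany(
        "INSERT INTO page_props VALUES (?, ?, ?)",
        [(page_id, name, "1") for name in item_states(item)],
    )


def pages_with_prop(conn, propname):
    """List page ids carrying a prop, as a special page might query them."""
    rows = conn.execute(
        "SELECT pp_page FROM page_props WHERE pp_propname = ? ORDER BY pp_page",
        (propname,),
    )
    return [row[0] for row in rows]
```

With the states stored per page, a list such as Special:EmptyItems reduces to a single indexed lookup by property name instead of inspecting every item's content.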
Comment 4 Daniel Kinzler 2012-09-11 08:09:39 UTC
I have filed bug 40157 for the work required to get the necessary info into page_props, so we have separate tickets for the feature and the underlying mechanism. 

@Thomas: want to have a go at 40157?
Comment 5 jeblad 2012-09-11 09:03:05 UTC
I wonder if we need a writeup for what different things like "empty", "stub", "unlinked", etc., mean in our context. I guess it's somewhat confusing that an item can be empty and still contain stuff.
Comment 6 Daniel Kinzler 2012-09-11 09:07:50 UTC
(In reply to comment #5)
> I wonder if we need a writeup for what different things like "empty", "stub",
> "unlinked", etc., mean in our context. I guess it's somewhat confusing that an
> item can be empty and still contain stuff.

I tried to define the relevant things in bug 40157.

But... in my mind, "empty" means *empty*. It's not empty if it contains stuff. But the serialization may still contain empty arrays for labels, etc, so the size of the serialized form is not useful.
Comment 7 jeblad 2012-09-11 10:37:19 UTC
Stuff in this context is things like '{}' and '{"labels":{}}'.
Comment 8 Daniel Kinzler 2012-09-11 15:18:53 UTC
(In reply to comment #7)
> Stuff in this context is things like '{}' and '{"labels":{}}'.

EntityObject::isEmpty() implements the check.
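
To illustrate why the size of the serialized form is not a useful signal, here is a rough sketch that treats a JSON serialization as empty when it contains only empty containers. This is not the actual isEmpty() implementation; the function names are hypothetical and the JSON shape is assumed from the examples quoted above.

```python
import json


def has_content(value):
    """Return True if a decoded value holds anything besides empty containers."""
    if isinstance(value, dict):
        return any(has_content(v) for v in value.values())
    if isinstance(value, list):
        return any(has_content(v) for v in value)
    # Treat null and the empty string as "no content".
    return value is not None and value != ""


def is_empty_serialization(blob):
    """Decide emptiness from the structure of the JSON, not its byte length."""
    return not has_content(json.loads(blob))
```

Both '{}' and '{"labels":{}}' count as empty here even though their byte lengths differ, which is why a length threshold cannot stand in for the check.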
Comment 9 Thomas Douillard 2012-09-11 18:27:50 UTC
(In reply to comment #4)
> I have filed bug 40157 for the work required to get the necessary info into
> page_props, so we have separate tickets for the feature and the underlying
> mechanism. 
> 
> @Thomas: want to have a go at 40157?

This bug is the answer to my unasked next question, so yes I will take this one.
Comment 10 Daniel Kinzler 2012-09-11 18:39:50 UTC
(In reply to comment #9)
> (In reply to comment #4)
> > @Thomas: want to have a go at 40157?
> 
> This bug is the answer to my unasked next question, so yes I will take this
> one.

Excellent, thank you!

If you have any questions, just post them to the bug or ask me directly. Or send mail to wikidata-l.
