Last modified: 2014-02-12 23:35:41 UTC

Bug 1497 - Hierarchical category system is urgently needed
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 2 votes
Assigned To: Brion Vibber
Reported: 2005-02-08 18:41 UTC by Matthias Kleine
Modified: 2014-02-12 23:35 UTC (History)

Description Matthias Kleine 2005-02-08 18:41:48 UTC
1. Categories in Wikipedia are chaos.
2. The reason: the system does not work hierarchically.
3. Example: When I add an article to the category "Cat", it should also _automatically_ belong to the categories "mammal", "animal" and "creature", so that when I browse the category "animal" I find the article. This is not the case in the current system. The result is chaos.
4. Much work is now spent on sorting out the chaos in the current category system. Much of that work could be saved if there were a sound technical foundation for a _true_ category system.
5. I discussed this issue with several active Wikipedia authors and administrators in the German Wikipedia. They all agree that this would be desirable.
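
The behaviour asked for in item 3 can be sketched as a walk up the parent links. This is only an illustration in Python, not MediaWiki code: the `PARENTS` mapping and the `all_categories` helper are hypothetical names, and the hierarchy is the one from the example.

```python
# Hypothetical parent links: each category maps to its direct parent categories.
PARENTS = {
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": ["Creature"],
    "Creature": [],
}

def all_categories(direct):
    """Return the direct categories plus every ancestor reachable from them."""
    seen = set()
    stack = list(direct)
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue  # already visited; this also guards against cycles
        seen.add(cat)
        stack.extend(PARENTS.get(cat, []))
    return seen

print(sorted(all_categories(["Cat"])))  # ['Animal', 'Cat', 'Creature', 'Mammal']
```

An article tagged only with "Cat" would then show up when browsing "Animal" without anyone tagging it there by hand.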

Best regards
Matthias Kleine
Comment 1 Brion Vibber 2005-02-09 02:31:04 UTC
Discussed this with Matthias a bit on IRC, will implement his plans once we've got a firmer idea how best to do this.
Comment 2 Peter Gervai (grin) 2005-02-16 21:31:13 UTC
Please see http://meta.wikimedia.org/wiki/Category_flatten

I am no PHP coder, but I don't think it would be really tough to get that done. Opinions are most welcome. And yes, we badly need that.
Comment 3 Brion Vibber 2005-02-16 21:36:04 UTC
The hard part is not proposing a flattened membership table to speed up reads,
but implementing it efficiently. Writes must be taken into account as well as
reads: if a major category hierarchy is rearranged (and this can be done with a
simple edit to a single page), this must be handled without killing the wiki
for an hour while the flattened membership table is rewritten.
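
To make the write cost concrete, here is a rough Python sketch, assuming a flattened table is simply a set of (category, page) rows. It is not MediaWiki's schema; `flatten`, `rows` and the sample categories are made up for illustration.

```python
from collections import defaultdict

def flatten(parents, direct_pages):
    """Build {category: pages in it or in any descendant category}."""
    flat = defaultdict(set)
    for cat, pages in direct_pages.items():
        seen, stack = set(), [cat]
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            flat[c] |= pages  # every ancestor gets a row per page
            stack.extend(parents.get(c, []))
    return flat

def rows(flat):
    """View the flattened table as a set of (category, page) rows."""
    return {(cat, page) for cat, pages in flat.items() for page in pages}

parents = {"Cats": ["Mammals"], "Mammals": ["Animals"], "Animals": []}
direct_pages = {"Cats": {"Lion", "Tiger"}, "Mammals": {"Whale"}}
before = flatten(parents, direct_pages)

# One edit to a single category page: "Mammals" moves under "Vertebrates".
parents["Mammals"] = ["Vertebrates"]
parents["Vertebrates"] = ["Animals"]
after = flatten(parents, direct_pages)

changed = rows(before) ^ rows(after)
print(len(changed))  # 3: one new row per page below "Mammals"
```

The number of touched rows grows with the number of pages below the moved category, so on a real wiki a single edit near the top of the hierarchy could touch a very large share of the table.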
Comment 4 Matthias Kleine 2005-02-16 22:57:26 UTC
Anybody who is interested in finding an efficient solution for this problem may also take a look at 

http://en.wikipedia.org/wiki/User_talk:Brion_VIBBER#Categories

Regards, Matthias
Comment 5 Richard J. Holton 2005-02-17 00:11:10 UTC
Does this presume a change to a tree-structure for categories? If not it seems
like you could end up with a situation where adding an article to a category
could add that article to virtually every category on the system. Is this what
we want?

Or, if we are talking about a true tree-structured category system, is ''that''
what we want? It would be a significant change to the current system's behavior.
Comment 6 Matthias Kleine 2005-02-17 00:46:03 UTC
It's just how categories work in the human mind. All kinds of cognitive science support that view of categories: start with Piaget's studies of cognitive development, take a look at modern cognitive psychology, look at how artificial intelligence research deals with the problem ... categories are structured like a tree, not like a list. Look at how scientific fields are structured. How are the books in your library sorted (I hope somewhat differently than the articles in Wikipedia ...)? It's just a natural way of dealing with issues to say "this issue belongs to this broader issue, and this broader issue itself belongs to a more general issue ...".

I admit that this would not be a minor change in how things are done in Wikipedia. Therefore, I appreciate the discussion. We should be aware that even if we keep the category system as a list, as it is now, users will continue to treat it like a tree, not knowing that the system behaves differently from what they expect.

Did you ever observe people creating very specific categories like [[category:mysmallhometown]] and changing the links in dozens of articles? It's only a question of time until somebody even weirder creates [[category:mysmallhometown (westside)]] and changes the links again, so that [[category:mysmallhometown]] loses many of its articles. In fact, this is what happens every day in the current system ...

Regards Matthias Kleine
Comment 7 Joern Schimmelpfeng 2005-02-19 23:18:56 UTC
Ever thought about creating an ontology?

One major problem I see is detecting whether a subcategory or an article in a category belongs to a
top-level category. For example, you have top-level categories "A" and "B" and you have subcategories "A1"
and "B1" as well as "AB". Let's assume "AB" is a subcategory of "A" and "B" and there is a loose
containment relationship from "AB" to "B1".

A       B
| \   / |
|  AB   |
|    \  |
A1     B1

So logically this means "B1" is a subcategory of "A". But semantically it is not necessarily one. Within
articles the problem is much worse, because we sometimes have very loose relationships there.
Examples of that are "Computer Science", "Social Science" and "Computers and Society".
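
The effect in the diagram can be reproduced with a small Python sketch. The `PARENTS` mapping encodes the diagram as child-to-parents links, and `ancestors` is a hypothetical helper that follows every link regardless of how strong it is.

```python
# Child -> direct parents, as drawn in the diagram above.
PARENTS = {
    "A": [],
    "B": [],
    "A1": ["A"],
    "AB": ["A", "B"],
    "B1": ["B", "AB"],  # the link to "AB" is the loose one
}

def ancestors(cat):
    """Collect every category reachable upwards from cat."""
    seen, stack = set(), list(PARENTS[cat])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

print(sorted(ancestors("B1")))  # ['A', 'AB', 'B']
```

Because the traversal cannot tell the loose link from the strict ones, "A" ends up among the ancestors of "B1".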

So one idea is to create an ontology. This means that relationships are semantically well defined (e.g.
containment relationship, similarity relationship, "is part of" relationship, ...). You are then able to
"understand" what kind of relationship two categories or articles have, and whether it is strong or just
informational.

We could adopt the Semantic Web approach (RDF/OWL) for that. I don't think that we should use it 
directly because of the complexity of RDF. 

What do you think?

(In reply to comment #0)
> 1. Categories in wikipedia are chaos.
> 2. The reason is: The system does not work hierarchically.
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories
> "mammal", "animal" and "creature". When I now browse through the category "animal", I will find the
> article. This is not the case in the current system. The result is chaos.
> 4. Much work is now spent to solve the chaos in the current category system. Much work could be saved if
> there would be a sound technical foundation for a _true_ category system.
> 5. I discussed this issue with several engaged wikipedia authors and administrators in the german
> wikipedia. They all agree that this would be a desirable issue.
>
> Best regards
> Matthias Kleine

Comment 8 Matthias Kleine 2005-02-19 23:34:28 UTC
> A       B
> | \   / |
> |  AB   |
> |    \  |
> A1     B1
> 
> So logically this means "B1" is a subcategory of "A". But semantically it is not necessarily one.

In my eyes, this is clearly a problem at the user level. No architecture will prevent a user from "editing" the category
tree in a way that is semantic nonsense (e.g. classifying a car as an animal or something). Sure enough, there are a
couple of models for knowledge representation which might be even better than a category tree (in my opinion, Minsky's
frame logic would be quite fine, but this relies on a tree structure, too). However, that goal is too far off to achieve. A
simple tree would be three steps forward and might be realizable in the foreseeable future.
Comment 9 Joern Schimmelpfeng 2005-02-20 10:37:56 UTC
 
> In my eyes, this is clearly a problem of the user level. No architecture will prevent that a user "edits" the category
> tree in a way that semantically is nonsense (i.e. classifying a car as animal or something).

The point is that I don't believe it is always nonsense. There are good reasons why a category may
belong to multiple top-level categories. But there are different types of relationships that you
cannot model today.

> Surely enough, there are a
> couple of models for knowledge representation, which might be even better than a category tree (in my opinion, Minsky's
> frame logic would be quite fine, but this relies on a tree structure, too). However, this aim is too far to achieve. A
> simple tree would be three steps forward and might be realizable in quite a foreseeable time.

I think giving a relationship a semantic definition is not hard to implement and not too confusing
to use. Just two different types of relationships (isPartOf and isRelatedTo) would help a lot. One of
them must be strictly hierarchical, the other one is a graph. This makes it possible to classify
articles and categories automatically. One interesting use case, for instance, is to use a clustering
algorithm to detect whether a category makes sense at all or should be split.
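
As a rough Python sketch of the two relationship types: only isPartOf links are followed when computing what a category belongs to, while isRelatedTo links are kept as informational. The `Rel` enum, the `EDGES` list and the sample categories are hypothetical, and the sketch does not enforce that the isPartOf links form a strict hierarchy.

```python
from enum import Enum

class Rel(Enum):
    IS_PART_OF = "isPartOf"        # strict containment, followed transitively
    IS_RELATED_TO = "isRelatedTo"  # informational link, never followed

# (child, relation, target); the category names are illustrative only.
EDGES = [
    ("Algorithms", Rel.IS_PART_OF, "Computer Science"),
    ("Computers and Society", Rel.IS_PART_OF, "Social Science"),
    ("Computers and Society", Rel.IS_RELATED_TO, "Computer Science"),
    ("Computer Science", Rel.IS_PART_OF, "Science"),
    ("Social Science", Rel.IS_PART_OF, "Science"),
]

def strict_ancestors(cat):
    """Follow only isPartOf edges upwards from cat."""
    seen, stack = set(), [cat]
    while stack:
        current = stack.pop()
        for child, rel, target in EDGES:
            if child == current and rel is Rel.IS_PART_OF and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

print(sorted(strict_ancestors("Computers and Society")))
# ['Science', 'Social Science']
```

"Computer Science" stays out of the result because it is only reachable through an isRelatedTo link.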

Is there a way to discuss this offline?
Comment 10 Peter Gervai (grin) 2005-02-21 17:16:30 UTC
I do think that discussions probably should not run in Bugzilla. Why not move it to
http://meta.wikimedia.org/w/index.php?title=Talk:Category_flatten ?
Comment 11 Antoine "hashar" Musso (WMF) 2005-03-27 20:40:49 UTC
Continue discussion on meta:
http://meta.wikimedia.org/wiki/Category_flatten

Closing as later.
Comment 12 Helder 2010-11-20 12:47:38 UTC
My apologies for not understanding, but why was this changed from LATER to FIXED?

Does MediaWiki currently have lists of "pages in a category and its subcategories"? How can that be used? Specifically: how was the problem exemplified in item (3) of comment #0 fixed?
(In reply to comment #0)
> 3. Example: When I add an article to category "Cat", it should also _automatically_ belong to categories 
> "mammal", "animal" and "creature". When I now browse through the category "animal", I will find the 
> article. This is not the case in the current system. The result is chaos.
Comment 13 Bawolff (Brian Wolff) 2011-04-04 21:41:39 UTC
Good question. Re-opened.
Comment 14 Brett Zamir 2014-01-08 01:58:05 UTC
This would be very compelling for Special:RandomInCategory, as one could get essentially the same enjoyable variety one gets from a favorite television or radio station, being exposed, say, to new Science articles without having to specify exactly which field one is interested in.
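
A rough Python sketch of the idea, assuming the subcategory links and page lists are available as plain mappings. `SUBCATS`, `PAGES` and both helpers are hypothetical and unrelated to how Special:RandomInCategory is actually implemented.

```python
import random

# Hypothetical data: category -> direct subcategories, category -> direct pages.
SUBCATS = {"Science": ["Physics", "Biology"], "Physics": [], "Biology": []}
PAGES = {
    "Science": ["Scientific method"],
    "Physics": ["Quark"],
    "Biology": ["Cell", "Enzyme"],
}

def pages_in_tree(root):
    """Collect the pages of root and of every category below it."""
    seen, stack, pages = set(), [root], set()
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue  # guards against cycles in the category graph
        seen.add(cat)
        pages.update(PAGES.get(cat, []))
        stack.extend(SUBCATS.get(cat, []))
    return pages

def random_in_category(root, rng=random):
    """Pick one page uniformly from the whole subtree of root."""
    return rng.choice(sorted(pages_in_tree(root)))

print(random_in_category("Science"))
```

Asking for "Science" can then return a Physics or Biology article even though neither is tagged with "Science" directly.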

(Speaking of radio, it would be interesting if one could ask for random sound files in a category and have the pages load, play, and then load another random one in sequence; likewise for videos. Scrolling through random images in a category à la Google Images would be cool too.)
