Last modified: 2014-01-30 02:52:35 UTC
The current category system is not well suited to Wikipedia. Wikipedia is too big, and the search engine is not very good. We need a good classification system.

The category system has a big defect: we can't cross categories. An example: Victor Hugo. The article [[fr:Victor Hugo]] has a lot of categories:

1802 births | 1885 deaths | French dramatists and playwrights | French poets | French novelists | French-language poets | Members of the Académie française | Romantic poets | Novelists | Spiritualists | Former Students of Lycée Louis-le-Grand | Natives of Besançon | People buried at the Panthéon

Now I can find all persons "Born in 1802", all "French Poets", etc. (I go to [[Cat:1802 births]] or [[Cat:French poets]]). But I can't find the "French Poets" who were "born in 1802". I would like to be able to know what is IN both [[Cat:French poets]] and [[Cat:1802 births]].

So, here is my new idea: we add tags to the pages, e.g. %%Men%%, %%Born in 1878%%, %%Poetry%%. We can then navigate and search articles by tags, with exclusive requests (or not):
- 'Men Writer AND Poetry'
- 'Men Writer OR Poetry'
- 'Men Writer XOR Poetry', etc.

I don't know the limits of the system. I think adding "Recommended Tags" would be good. You would be able to browse like this: I am looking for a town, Lille, in France, in the Nord department. "Well, in the tag France, I search for a town" -> I get all French towns. "I search for French towns in Nord" -> I see Lille!

What do you think about that?
Categories in MediaWiki are tags. That's the same thing.
(In reply to comment #1)
> Categories in MediaWiki are tags. That's the same thing.

Not exactly... In fact, categories are good. But I want to add a new feature: search using the categories. I want to search for, e.g., "Poets" AND "Writers". Right now, I can navigate to [[Category:Writers]], go from there to [[Category:Writers by language]], then to [[Category:Yiddish writers]], then to [[Category:Yiddish-language poets]]. It works. But that is not what I want. I want a multi-criteria search: I want to search for a "Yiddish man" who is also an "Astronomer". Categories don't allow that. I just want the possibility to *cross the categories*: cross-category search with logical operators (OR, XOR, AND). Just that.

If my English is too bad, contact me :)
Have a look at "CatScan" to see how it could work (currently there is a search restriction to 1,000 articles, so you have to play around a bit): http://tools.wikimedia.de/~daniel/WikiSense/CategoryIntersect.php?wikilang=en&basecat=French+poets&tagcat=Writers&userlang=en
Changing summary to reflect the real request. 95% sure this is a dupe.
*** Bug 3972 has been marked as a duplicate of this bug. ***
What would be REALLY useful is a way to indicate, in a category, which criteria you will most likely want to combine the current category with. For example, http://en.wikipedia.org/wiki/Category:Scientists would contain

<search name="Speciality">
* Astronomers
* Biologists
* Chemists
* Cognitive scientists
* Computer scientists
* Cyberneticists
* Earth scientists
* Fictional scientists
* Game theorists
* Gerontologists
.....
</search>
<search name="Nationality">
* French
* English
* German
* US
</search>
<search name="Gender">
* Male
* Female
</search>
<search name="Born in">
* XVII century
* XVIII century
* XIX century
* XX century
</search>

The first search element is displayed like the subcategories are now. The others are options which allow you to combine [[Category:Scientists]] with, for example, [[Category:Born in the XX century]].
Have a look at CoCat (http://tools.wikimedia.de/~voj/cgi-bin/cocat). Maybe this is what you are looking for: http://tools.wikimedia.de/~voj/cgi-bin/cocat?cat=French+poets&dbname=enwiki_p
*** Bug 5740 has been marked as a duplicate of this bug. ***
I kinda agree with chtitux's general idea (having simple categories and a "multi-criteria search" in them), because the current category system is becoming heavier and heavier to maintain. (...but I didn't quite understand what JM Fayard was trying to explain; it seems like a clearly different thing ^^;)
*** Bug 1106 has been marked as a duplicate of this bug. ***
For a way to search for articles by "properties", have a look at [[meta:Semantic_MediaWiki]]
See also [[metawikipedia:Improving categories|Improving categories]] (many supporters)
[http://meta.wikimedia.org/wiki/Improving_categories Improving categories] (a "preview" should be great ;)
*** Bug 6127 has been marked as a duplicate of this bug. ***
Here is what I put in Bug 6127:

I'd like to request a cross-referencing search for categories applying simple boolean logic. Examples:
* The intersection of Category:A and Category:B (all items in common)
* The exclusion of Category:A and Category:B (all items not in common)
* The union of Category:A and Category:B (all items)

And I'd like to request some sort of wikilink namespace for doing this, such as [[intersect:A:B]], with support for more than 2 categories to be searched, and an option to include subcategories and specify how deep to go.
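To make the three requested operations concrete, here is a minimal sketch in Python using plain sets. The `categories` dictionary and its contents are hypothetical sample data, and the function names are made up for illustration; nothing here is MediaWiki code. "Exclusion" is read as "items in exactly one of the two categories", which is an assumption about what the request means.

```python
# Hypothetical sample data: category name -> set of page titles.
categories = {
    "French poets": {"Victor Hugo", "Charles Baudelaire", "Gérard de Nerval"},
    "1802 births": {"Victor Hugo", "Alexandre Dumas"},
}


def intersection(a, b):
    """Pages that are in both categories (all items in common)."""
    return categories[a] & categories[b]


def exclusion(a, b):
    """Pages that are in exactly one of the two categories (items not in common)."""
    return categories[a] ^ categories[b]


def union(a, b):
    """Pages that are in at least one of the two categories (all items)."""
    return categories[a] | categories[b]


print(sorted(intersection("French poets", "1802 births")))
print(sorted(exclusion("French poets", "1802 births")))
print(sorted(union("French poets", "1802 births")))
```

Supporting more than two categories would amount to folding the same operator over a list of sets.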
Tweaked summary.
See [[en:Wikipedia:Category math feature]] , [[meta:User:Aerik/Intersections_code]], [[meta:Category math]], [[meta:Talk:Category math]] etc. A little demo is available here : http://www.wikidweb.com/index.php?title=Special:Intersections
*** Bug 6499 has been marked as a duplicate of this bug. ***
6499 seems to be asking for something a bit different but I may have misinterpreted it.
Nope, it's the same.

"Consider a company intranet that includes these categories:
* Person
* Project
* Department
* Department:Admin
* Department:Sales
* Department:Dev
Wouldn't it be great to be able to list all the projects in Sales, or all the persons in Admin, without sacrificing the ability to list either all projects, or all persons, or all pages under Sales, as the subcategory would?"

"All the projects in Sales" = "the intersection of Category:Project and Category:Department:Sales". "All the persons in Admin" = "the intersection of Category:Person and Category:Department:Admin". See [[Intersection (set theory)]].
No, that one seems to be referring to GUI stuff, not just the intersection capability, which I agree is the same, but again, I may have misread it... thanks for the reminder about intersection theory though.
I'm not sure what you're saying. Obviously this feature is going to have to have a GUI, however it's implemented.
6499 speaks of: "A special kind of category page might be the simplest way, but the most useful would be a plugin for inlining the results of such a query." That seems different, as it refers to plugins and so forth, while this one does not. The underlying set manipulation to make it work would be the same. Perhaps that one is an enhancement of this one? Or close enough to a dup not to matter for now? Maybe this is a moot point?
I registered 6499, somehow managing not to find this one in my searches. As has been said, the underlying boolean search is the same. The only significant difference is that 6499 is concerned with stored searches, not interactive ones. It would of course make no sense not to add the same feature to user searches at the same time, but stored searches are much more empowering, and useful for building and structuring the content of a wiki in a way compatible with the wiki way of thought. The most visible instance of stored searches in MediaWiki is precisely category pages, which even with their very limited searching functionality add immensely to its power and usability. But I suppose intersections (etc.) and stored searches are really separate features, just with a really useful crossing point, and should be treated as such.
Several of us are working on a proposal at [[en:Wikipedia:Category intersection]]. It envisions how this would work, has mock-ups for category intersection pages, and includes an interface for requesting intersections from any article page. There seems to be a technical question as to whether this will be too much of a burden on the servers. What is required to implement this?
Most likely: Tim deciding to write it. Could be one of a couple of other people too, but I'd hazard a guess that Tim is most likely, in the fullness of time. Any patch submitted would have to be looked over by someone like Tim, Brion, or Domas anyway to decide whether the load increase is acceptable, and I suspect it would require a lot of skill to write it that well even if you could get one of them to look it over, so that route is unlikely. If you want this done, well, get everyone to vote for the bug and hope someone appropriate notices one day. (Cf.: bug 164, bug 708, bug 57.)
See also Brion's comment at http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/26692/focus=26714
I have written the following PHP script that allows for category intersections. The syntax is "Special:Intersect/Swedish people/Saxophone players", and the output is similar to that of Special:Shortpages etc. This has been tested on my local wiki. It includes a simple GUI to find related intersections. I do not have Subversion access, but maybe one of the developers could take a look at this?

Regarding database access, it does a SELECT from wi_pages with an INNER JOIN for each category to be intersected. If necessary, it is easy to restrict this to a maximum of, say, five categories. The related-intersections GUI does two additional queries per category, to find parent and child categories (however, this GUI can be disabled in the config).

[[[FILE:includes/SpecialIntersect.php]]]
<?php
/**
 *
 * @package MediaWiki
 * @subpackage SpecialPage
 */

// The following line should be added to SpecialPage.php:
//   'Intersect' => array( 'SpecialPage', 'Intersect' ),
// The following line should be added to QueryPage.php:
//   array( 'IntersectPage', 'Intersect' ),
// The following line should be added to Messages.php:
//   'intersect' => 'Category intersection',
// The following line should be added to LocalSettings.php:
//   $wgIntersectVerbose = true;
// (if need be, this can be changed to 'false' to reduce database load
// by removing part of the interface)
// The following MediaWiki pages should be created:
//   MediaWiki:Intersectheader "For help, go to [[Help:Intersect]]. Intersecting these categories:"
//   MediaWiki:Intersecterror  "You need to specify at least two categories."
//   MediaWiki:Intersectsubmit "Intersection"
//   MediaWiki:Intersectparent "Parent categories"
//   MediaWiki:Intersectsubcat "Subcategories"

class IntersectPage extends PageQueryPage {
	var $request, $cats, $skin;

	function IntersectPage( &$request, $cats ) {
		global $wgUser;
		$this->request =& $request;
		$this->skin =& $wgUser->getSkin();
		$this->cats = $cats;
	}

	function getName() { return "Intersect"; }
	function sortDescending() { return false; }
	function isExpensive() { return true; }
	function isSyndicated() { return false; }

	function getSQL() {
		$dbr =& wfGetDB( DB_SLAVE );
		extract( $dbr->tableNames( 'page', 'categorylinks' ) );
		$sql = "SELECT 'Intersect' AS type, page_namespace AS namespace, " .
			"page_title AS title, page_title AS value FROM $page ";
		$i = 0;
		$where = null;
		// TODO: maybe use /=talk/ (etc) to search a namespace
		foreach ( $this->cats as $cat ) {
			if ( $cat != null ) {
				// functionality for 'exclude this cat'; it works but doesn't fit the GUI well
				/*if ($cat{0} == '-') {
					$sql .= "LEFT OUTER JOIN $categorylinks AS c".$i;
					$sql .= " ON page_id=c" .$i. ".cl_from";
					$sql .= " AND c" .$i. ".cl_to = '" . substr ($cat, 1) . "' ";
					if ($where != null) $where .= " AND ";
					$where .= "c" .$i. ".cl_to IS NULL ";
				} else */
				{
					// One self-join on categorylinks per category.
					// addQuotes() escapes the user-supplied category name.
					$sql .= "INNER JOIN $categorylinks AS c" . $i;
					$sql .= " ON page_id=c" . $i . ".cl_from";
					$sql .= " AND c" . $i . ".cl_to = " . $dbr->addQuotes( $cat ) . " ";
				}
				$i++;
				// if ($i >= 5) break; // reduce DB load by using a maximum of five categories
			}
		}
		if ( $where != null ) {
			$sql .= " WHERE " . $where;
		}
		return $sql;
	}
}

/**
 * Constructor
 */
function wfSpecialIntersect( $par = null ) {
	global $wgRequest, $wgOut, $wgIntersectVerbose;

	list( $limit, $offset ) = wfCheckLimits();

	// take the categories from the request (cats[]), or from the subpage parameter
	$cats = $wgRequest->getArray( 'cats' );
	if ( $cats == null && $par != null ) {
		$cats = explode( "/", rtrim( $par, '/' ) );
	}

	// display header and list of categories
	$count = 0;
	if ( $wgIntersectVerbose != true ) {
		// terse mode - a short list of categories
		$s = wfMsg( 'intersectheader' );
		$sep = '';
		if ( $cats != null ) {
			foreach ( $cats as $cat ) {
				if ( $cat != null ) {
					$count++;
					$s .= $sep;
					$sep = "; ";
					$s .= '[[:' . Title::makeTitle( NS_CATEGORY, ltrim( $cat, '-' ) )->getPrefixedText() . ']]';
				}
			}
		}
		$wgOut->addWikiText( $s . "." );
	} else {
		// verbose mode - browser with pulldown menus
		$wgOut->addWikiText( wfMsg( 'intersectheader' ) );
		$dbr =& wfGetDB( DB_SLAVE );
		$fname = "Special:Intersect";
		$s = "<form action='" . Title::makeTitle( NS_SPECIAL, 'Intersect' )->getLocalUrl() . "' method='get'>";
		if ( $cats != null ) {
			foreach ( $cats as $cat ) {
				if ( $cat != null ) {
					$count++;
					// reset the option lists so one category's parents and
					// children do not leak into the next pulldown menu
					$t = '';
					$u = '';
					$s .= "\n<select name='cats[]'>";
					$parent = $dbr->select(
						array( 'page', 'categorylinks' ),
						'cl_to',
						"page_title = " . $dbr->addQuotes( $cat ) .
							" AND page_id = cl_from AND page_namespace = " . NS_CATEGORY,
						$fname
					);
					if ( $parent != null ) {
						while ( $row = $dbr->fetchRow( $parent ) ) {
							$t .= "\n<option>" . htmlspecialchars( $row[0] );
						}
						if ( $t != null ) {
							$s .= "\n<optgroup label='" . wfMsg( 'intersectparent' ) . "'>" . $t . "\n</optgroup>";
						}
					}
					$s .= "\n<option selected>" . htmlspecialchars( $cat ) . "\n<option>";
					$child = $dbr->select(
						array( 'page', 'categorylinks' ),
						'page_title',
						"cl_to = " . $dbr->addQuotes( $cat ) .
							" AND page_id = cl_from AND page_namespace = " . NS_CATEGORY,
						$fname
					);
					if ( $child != null ) {
						while ( $row = $dbr->fetchRow( $child ) ) {
							$u .= "\n<option>" . htmlspecialchars( $row[0] );
						}
						if ( $u != null ) {
							$s .= "\n<optgroup label='" . wfMsg( 'intersectsubcat' ) . "'>" . $u . "\n</optgroup>";
						}
					}
					$s .= "\n</select> ";
				}
			}
		}
		if ( $count == 0 ) {
			$s .= "<input type='text' name='cats[]'> ";
		}
		$s .= "<input type='text' name='cats[]'>";
		$s .= "<input type='submit' value='" . wfMsg( 'intersectsubmit' ) . "'></form>";
		$wgOut->addHtml( $s );
	}

	// intersect query is only meaningful if we have at least two cats
	if ( $count < 2 ) {
		$wgOut->addWikiText( wfMsg( 'intersecterror' ) );
		return null;
	} else {
		$lpp = new IntersectPage( $wgRequest, $cats );
		return $lpp->doQuery( $offset, $limit );
	}
}
?>
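The join-per-category query that getSQL() builds can be tried outside MediaWiki. The following is a rough sketch in Python against an in-memory SQLite database. The two tables are cut down to the columns the query touches, the rows are made-up sample data, and `intersect_pages` is a hypothetical helper, not part of the script above. Category names are passed as bound parameters instead of being concatenated into the SQL.

```python
import sqlite3


def intersect_pages(conn, cats):
    """Return the titles of pages that belong to every category in `cats`."""
    sql = "SELECT page_title FROM page"
    params = []
    for i, cat in enumerate(cats):
        # One INNER JOIN on categorylinks per category; a page survives
        # only if it has a matching row in every join.
        sql += (
            f" INNER JOIN categorylinks AS c{i}"
            f" ON page_id = c{i}.cl_from AND c{i}.cl_to = ?"
        )
        params.append(cat)
    sql += " ORDER BY page_title"
    return [row[0] for row in conn.execute(sql, params)]


# Made-up sample data, reduced to the columns used by the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    INSERT INTO page VALUES (1, 'Lars_Gullin'), (2, 'Greta_Garbo'), (3, 'John_Coltrane');
    INSERT INTO categorylinks VALUES
        (1, 'Swedish_people'), (1, 'Saxophone_players'),
        (2, 'Swedish_people'), (3, 'Saxophone_players');
    """
)

print(intersect_pages(conn, ["Swedish_people", "Saxophone_players"]))
```

Each extra category adds another join, which is why the script suggests capping the number of categories to limit database load.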
Created attachment 2372: Source code for Special:Intersect

In the zipfile: source code for a special page that does intersection, some bits of GUI to work that page, and a tweak that makes category redirects work at the software level.
This comment is probably redundant given the lengthy discussion above, but here it is anyway...

My idea of a nice system would be to slightly modify the 'category box' (the little blue box that appears at the bottom of a categorized page and lists all the categories that the page is in). If that box additionally had a search field, searches typed in that field would be limited to that set of categories, e.g. for my page about this particular dog...

+-----------------------------------------------------------------------------+
| Category: Pages about dogs | Category: Pages by idiots | _________ : SEARCH |
+-----------------------------------------------------------------------------+

Typing a term in the search field above (please don't actually try it) and clicking search would limit your search to pages about dogs written by idiots.

I know it is a trivial suggestion given the above generic ramblings, but it would be a nice, clear way to illustrate and deploy the cross-classification search. I don't know if this has been suggested before, but you should do it anyway.
*** Bug 11851 has been marked as a duplicate of this bug. ***
Comment on attachment 2372: Source code for Special:Intersect

Fix mime type.
*** Bug 14186 has been marked as a duplicate of this bug. ***
*** Bug 18118 has been marked as a duplicate of this bug. ***
I'm not sure how to do this but once we have patrolled revisions on en.wiki it would be nice if we could find pages under Special:Unpatrolled Revisions and Category:BLP
*** Bug 19201 has been marked as a duplicate of this bug. ***
Note Bug 21395 is somewhat a duplicate of this bug, but not quite.
Soon it will be half a dozen duplicates of this feature request that have been raised. Can we act on this sooner?
There have already been nine duplicates, not including the somewhat duplicate.
Yes, I meant to say a dozen, rather than half. Point made anyway.
(In reply to comment #38)
> Soon it will be half a dozen duplicates of this feature request that have been
> raised. Can we act on this sooner?

Having many duplicates doesn't change the fact that it's a difficult problem to solve efficiently.

As an aside, there is some limited support for category intersection via the various toolserver scripts: http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php . Additionally, the search box can kind of do category intersection if you search for incategory:first_category incategory:second_category. See [[Help:Searching]]. Both of these are still far from what is being requested, but they may be helpful for your particular needs.

In regards to comment 28: the issue is not writing category intersection code. That is relatively easy and already exists in various extensions, like DynamicPageList. The issue is making it efficient for something on the scale of Wikipedia.
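For anyone who wants to script the search-box workaround, here is a small sketch in Python that builds such a search URL. The helper name `incategory_search_url` and the default base URL are assumptions for illustration; the only thing taken from the search syntax itself is the `incategory:` keyword, with each category name wrapped in double quotes so that names containing spaces stay together.

```python
from urllib.parse import urlencode


def incategory_search_url(categories, base="https://en.wikipedia.org/w/index.php"):
    """Build a Special:Search URL with one incategory: term per category."""
    # Quote each name so that multi-word category names are kept as one term.
    terms = " ".join(f'incategory:"{name}"' for name in categories)
    return base + "?" + urlencode({"title": "Special:Search", "search": terms})


print(incategory_search_url(["French poets", "1802 births"]))
```

This only covers direct members of each category; it does not descend into subcategories.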
So, you can use that search trick to do it for intersections, but it can't be done efficiently? Can't you just make it do whatever that search trick does?
Yes, one likely way to do this would be to use fulltext indexes like for search. However, someone still has to actually write the code, and that someone must be familiar with the performance issues involved (which makes the pool of possible patch-writers much smaller than for a typical bug).
(In reply to comment #43)
> Yes, one likely way to do this would be to use fulltext indexes like for
> search. However, someone still has to actually write the code, and that
> someone must be familiar with the performance issues involved (which makes the
> pool of possible patch-writers much smaller than for a typical bug).

Extension:AdvancedSearch is a proof of concept for using fulltext indexes for category intersection that I wrote back in 2008. Someone had hired me to write it, and I never got around to committing it to SVN until the annual category intersection thread on wikitech-l started. I was encouraged to commit my code so others could adapt it to something more generic, but it was never touched apart from a few trivial changes.
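To illustrate why a search-style index helps here, the following is a toy sketch in Python of an inverted index that treats each page's categories as its "words". It is not the code of Extension:AdvancedSearch or of any real search backend; the function names and the sample data are invented for the example.

```python
from collections import defaultdict


def build_index(page_categories):
    """Map each category to the set of pages that carry it (an inverted index)."""
    index = defaultdict(set)
    for page, cats in page_categories.items():
        for cat in cats:
            index[cat].add(page)
    return index


def pages_in_all(index, cats):
    """Intersect the posting sets, smallest first, stopping early when empty."""
    postings = sorted((index.get(cat, set()) for cat in cats), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break
    return result


# Invented sample data: page title -> categories on that page.
sample = {
    "Victor Hugo": ["French poets", "1802 births", "Novelists"],
    "Alexandre Dumas": ["1802 births", "Novelists"],
    "Charles Baudelaire": ["French poets"],
}
index = build_index(sample)
print(sorted(pages_in_all(index, ["French poets", "1802 births"])))
```

Starting from the smallest posting set keeps the intermediate result small, which is the same reason search engines order their term lookups.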
*** Bug 29243 has been marked as a duplicate of this bug. ***
This sort of works now with two "incategory" parameters http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=incategory%3A%22Australian+beach+volleyball+players%22+incategory%3A%22People+from+Adelaide%22 What is missing is the ability to specify that we want to search in a category, and all subcategories. That is a performance problem, especially because the category structure does not need to be a hierarchical tree.
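The subcategory problem can be seen in a few lines. Below is a minimal sketch in Python of collecting the pages of a category and its subcategories with a breadth-first walk. Because the category graph may contain loops, it keeps a `seen` set, and it takes a `max_depth` so the walk can be bounded. The function name, the two dictionaries and the sample data (including the deliberate loop) are assumptions for illustration only.

```python
from collections import deque


def pages_in_tree(subcats, members, root, max_depth):
    """Collect pages in `root` and in its subcategories down to `max_depth` levels."""
    seen = {root}               # guards against cycles in the category graph
    queue = deque([(root, 0)])
    pages = set()
    while queue:
        cat, depth = queue.popleft()
        pages |= members.get(cat, set())
        if depth == max_depth:
            continue            # do not descend any further from here
        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return pages


# Invented sample data; "Yiddish writers" links back to "Writers" to form a loop.
subcats = {
    "Writers": ["Writers by language"],
    "Writers by language": ["Yiddish writers"],
    "Yiddish writers": ["Writers"],
}
members = {
    "Writers": {"Page A"},
    "Yiddish writers": {"Page B"},
}

print(sorted(pages_in_tree(subcats, members, "Writers", max_depth=0)))
print(sorted(pages_in_tree(subcats, members, "Writers", max_depth=10)))
```

On a large wiki the expanded set for a top-level category can be enormous, which is where the performance concern comes from; the depth limit only bounds the walk, it does not make the resulting set small.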
(In reply to comment #47)
> This sort of works now with two "incategory" parameters
>
> http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=incategory%3A%22Australian+beach+volleyball+players%22+incategory%3A%22People+from+Adelaide%22
>
> What is missing is the ability to specify that we want to search in a category,
> and all subcategories. That is a performance problem, especially because the
> category structure does not need to be a hierarchical tree.

It doesn't seem to work on it.wiki.
I know that the E3 team started using Redis in a limited capacity to implement efficient 'random article from category' (can not find link right now, will update when I do). Perhaps we can use Redis / similar non-mysql database for things like this?
I am not clear as to whether any of the proposed solutions would work on categories not hard-coded in the page, i.e. implemented in non-substed templates. Which is to say: should English Wiktionary have a bot create hard-coded categories for all important template-created categories so that the available category intersection tools can be used?
(In reply to comment #48)
> (In reply to comment #47)
> > This sort of works now with two "incategory" parameters
> >
> > http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=incategory%3A%22Australian+beach+volleyball+players%22+incategory%3A%22People+from+Adelaide%22
> >
> > What is missing is the ability to specify that we want to search in a category,
> > and all subcategories. That is a performance problem, especially because the
> > category structure does not need to be a hierarchical tree.
>
> It doesn't seem to work on it.wiki.

It works for me on itwiki: https://it.wikipedia.org/w/index.php?search=incategory%3AGroenlandia+incategory%3A%22Isole+della+Danimarca%22+incategory%3ARecord&button=&title=Speciale%3ARicerca
(In reply to comment #51)
> (In reply to comment #48)
> > (In reply to comment #47)
> > > This sort of works now with two "incategory" parameters
> > >
> > > http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=incategory%3A%22Australian+beach+volleyball+players%22+incategory%3A%22People+from+Adelaide%22
> > >
> > > What is missing is the ability to specify that we want to search in a category,
> > > and all subcategories. That is a performance problem, especially because the
> > > category structure does not need to be a hierarchical tree.
> >
> > It doesn't seem to work on it.wiki.
>
> It works for me on itwiki:
>
> https://it.wikipedia.org/w/index.php?search=incategory%3AGroenlandia+incategory%3A%22Isole+della+Danimarca%22+incategory%3ARecord&button=&title=Speciale%3ARicerca

The new search backend is supposed to bring this out of the realm of sort of working into the realm of always working.
(In reply to comment #52)
> The new search backend is supposed to bring this out of the realm of sort of
> working into the realm of always working.

Yeah, it.wiki is even voting right now on being a CirrusSearch pioneer, with category intersection as one of the motivations. [[it:Wikipedia:Bar/Discussioni/Lanciamoci nel futuro motore di ricerca interno]]
(In reply to comment #50)
> I am not clear as to whether any of the proposed solutions would work on
> categories not hard-coded in the page, ie implement in non-substed
> templates.

incategory: should work on categories added using templates. Bug 18861 is the closest I can find.

> Which is to say should English Wiktionary have a bot create hard-coded
> categories for all important template-created categories so that available
> category intersection tools can be used?

I hope not.

(In reply to comment #47)
> What is missing is the ability to specify that we want to search in a
> category,
> and all subcategories. That is a performance problem, especially because the
> category structure does not need to be a hierarchical tree.

This is bug 35402.

Does CirrusSearch address either of those problems?
It addresses the categories added via templates. It does not address bug 35402. To be blunt, I don't expect bug 35402 to be fixed anytime soon. The infinite depth version I don't expect to really be fixed ever.
Yurik has a patch related to this bug at https://gerrit.wikimedia.org/r/#/c/109853/

> To be blunt, I don't expect bug 35402 to be fixed anytime soon. The infinite
> depth version I don't expect to really be fixed ever.

Oh look, I'm wrong (sort of). Dschwen has implemented this as a stand-alone C daemon. See https://commons.wikimedia.org/wiki/Help:FastCCI