Last modified: 2014-03-21 10:55:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T40403, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38403 - Sortable search results
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2012-07-14 19:26 UTC by Jan Ainali
Modified: 2014-03-21 10:55 UTC
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Description Jan Ainali 2012-07-14 19:26:06 UTC
When getting the search results, it is possible to see some data about each article. It would be great to be able to sort the results based on it, especially on data, but size and alphabetical order would also be useful.
Comment 1 Jan Ainali 2013-02-16 10:29:00 UTC
Oh, I meant especially on 'date'.
Comment 2 Andre Klapper 2013-03-26 11:19:53 UTC
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]
Comment 3 Dan Garry 2013-10-29 19:26:26 UTC
May be worth including in CirrusSearch.
Comment 4 Nik Everett 2013-11-04 21:40:56 UTC
Would need some product/design team involvement, mostly because sorting on anything other than score is likely to give the user garbage results. A better approach might be to let the user deem recently changed articles more important and give them a boost. This would bring more recent articles ahead of those that would otherwise be better matches. The problem is that any other requested sort key would need similar design.
Comment 5 Daniel Naber 2014-03-15 12:36:59 UTC
(In reply to Nik Everett from comment #4)
> Would need some product/design team involvement mostly because sorting on
> anything other than score is likely to give the user garbage results. 

I don't see how sorting by date (i.e. last modified date) can give garbage results if the user explicitly chooses to sort by date. The same goes for title and size. Technically, this should be easy to implement, as Elasticsearch should be able to sort on any field in the documents in its index.
Comment 6 Dan Garry 2014-03-17 20:46:10 UTC
What is the use case for wanting to order search results by last modified date? Knowing that would help us figure out what the best solution to that problem is.
Comment 7 Daniel Naber 2014-03-17 21:35:13 UTC
Use case: I'm the author of LanguageTool (a style and grammar checker) and I'd like to know how people use it for Wikipedia. As this information can be in any namespace I can only find it with a search. But there are 50 or so matches, and I run this search (just "languagetool") regularly. Thus without a sort by date, I have to walk through all the matches to find the new ones.
Comment 8 Dan Garry 2014-03-17 21:42:21 UTC
Thanks for the explanation, but I'm a little unclear on how you're using CirrusSearch to find individual edits. Could you provide me with the URL of the search query you use to do this?
Comment 9 Daniel Naber 2014-03-17 21:55:35 UTC
I don't - it would be great of course if that was possible. I'm using https://de.wikipedia.org/w/index.php?title=Spezial:Suche&search=languagetool&fulltext=Suche&profile=all&redirs=1 which is good enough, only that I'm too lazy to look through all the results as most are old. Having them sortable I'd only have to look at the first 2-5 or so results.
Comment 10 Jan Ainali 2014-03-17 22:14:44 UTC
Well, my use case was simpler. I am an engineer. I see search results with meta data. I want to sort it based on the meta data to see what pops up. It would be awesome to find outliers, such as pages referring to new stuff but with old dates or extremely large or small pages. It would also be great to create an alphabetical list based on a search for use on an edit-a-thon.
Comment 11 Dan Garry 2014-03-17 23:03:23 UTC
(In reply to Daniel Naber from comment #9)
> I don't - it would be great of course if that was possible. I'm using
> https://de.wikipedia.org/w/index.php?title=Spezial:Suche&search=languagetool&fulltext=Suche&profile=all&redirs=1
> which is good enough, only that I'm too lazy to look through all the results
> as most are old. Having them sortable I'd only have to look at the first 2-5
> or so results.

Search fundamentally can't satisfy your use case. Say, for example, someone added "LanguageTool" to a JS file in their user space five years ago, then made a copyedit to a comment yesterday; in this case, that page would show up as the most recent result.

A better approach would be to have LanguageTool insert a note into the edit summary of anyone who uses it (e.g. "Copy edited using LanguageTool"), then query a database dump to find what you're looking for. [[WP:AWB]] does something similar.

(In reply to Jan Ainali from comment #10)
> Well, my use case was simpler. I am an engineer. I see search results with
> meta data. I want to sort it based on the meta data to see what pops up. It
> would be awesome to find outliers, such as pages referring to new stuff but
> with old dates or extremely large or small pages.

You've told me you want to sort by date because you want to sort by meta-data. That isn't a use case, because it doesn't actually help me understand the problem you're trying to solve.

> It would also be great to
> create an alphabetical list based on a search for use on an edit-a-thon.

Due to the way search works, for most queries this would generate a list of articles with nothing in common. For example, searching for [[Barack Obama]] and sorting alphabetically would generate nonsensical results such as [[Aaron McGruder]], an American cartoonist, followed shortly thereafter by [[Abbottabad]], a city in northeastern Pakistan.

In this case, browsing a category would be a better idea if you're looking for related articles. In fact, for editathons, I've personally found much more success by giving participants a list of articles where the topic is notable by definition but the articles don't exist. For example, a Fellow of the Royal Society is unquestionably notable (see point 3 in [[Wikipedia:Notability_(academics)#Criteria]]), yet many Fellows do not have articles.
Comment 12 Daniel Naber 2014-03-18 07:45:09 UTC
I know I might get false positives. But still, having 'sort by date' means I only have to look at the first few results instead of walking through a list of 50 matches. Searching a database dump doesn't seem like a viable alternative, considering how huge it is and how much time it would take to set this up.
Comment 13 Jan Ainali 2014-03-18 14:36:20 UTC
(In reply to Dan Garry from comment #11)
> You've told me you want to sort by date because you want to sort by
> meta-data. That isn't a use case, because it doesn't actually help me
> understand the problem you're trying to solve.

Yes, you are right that that is not the problem I am trying to solve. My problem is that I become very frustrated when I see that the search results are presented in a way that I do not expect on a website in 2014. I expect to be able to sort results in some ways, and I am given visual clues that the underlying data to perform the sorting is there.

What an editor could actually use this for is finding outliers or odd articles, as I explained before, without having to rely on external tools like catscan.
Comment 14 Dan Garry 2014-03-20 18:06:28 UTC
Adding in a feature to sort by date or alphabetically by title will, for the reasons explained above, result in degraded performance for the vast majority of users. It's for this reason that search engines like Google don't allow you to sort by date or alphabetically by title; it degrades the quality of the service. I'm WONTFIXing this bug accordingly, as I cannot justify adding features to CirrusSearch that degrade the experience for the vast majority of its users.

That said, there may be a use case in here that does make sense, namely the ability to discard results from the search that have not been edited "recently" (for some definition of "recently"). It's possible to do that in such a way that the scoring algorithm isn't totally ignored, and therefore may not degrade user performance. See bug 62879.
Comment 15 Daniel Naber 2014-03-20 18:22:39 UTC
The reason Google doesn't offer sort by date is that they usually don't even know the date of last modification. The reason Google doesn't offer alphabetical sort is that they are indexing the web, not an encyclopedia. Anyway, I'm looking forward to a fix for bug 62879.
Comment 16 Jan Ainali 2014-03-20 19:16:33 UTC
(In reply to Dan Garry from comment #14)
> Adding in a feature to sort by date or alphabetically by title will, for the
> reasons explained above,

I am sorry, but you have not stated any reasons at all. In what way will this degrade performance for users? My wish is that the search is presented as it is today, with the option to sort it *after* the first results have been shown. So for users not wanting to sort it, there will be no difference at all.

Here is another use case one user came up with during a discussion. You search for a term and you want to fix something in all these articles. By being able to sort it by last modified, all the ones you just fixed will go to the end of the list, and it is easy to see what is left to do.
Comment 17 Dan Garry 2014-03-20 21:38:26 UTC
(In reply to Daniel Naber from comment #15)
> The reason Google doesn't offer sort by date is that they usually don't even
> know the date of last modification.

That's incorrect. I will unashamedly admit that I got the idea for bug 62879 from looking through Google's search settings to see how they address this problem. The definitions of "recent" that they allow are anytime, past 24 hours, past week, past month, and past year.

> The reason Google doesn't offer
> alphabetical sort is that they are indexing the web, not an encyclopedia.

That doesn't make a difference, honestly. Sorting a search alphabetically does not make sense for a search engine, whether it is a MediaWiki search engine or general search engine, as it simply makes the search engine provide less relevant results.
Comment 18 Jan Ainali 2014-03-20 21:45:38 UTC
(In reply to Dan Garry from comment #17) 
> That doesn't make a difference, honestly. Sorting a search alphabetically
> does not make sense for a search engine, whether it is a MediaWiki search
> engine or general search engine, as it simply makes the search engine
> provide less relevant results.

Your assumption here is that there is only one hit that might be interesting for the user. For many editors, lists of articles that are starting points for their editing behavior make perfect sense. Sure, editors are a very small number of users compared to readers. That is why sorting should be a secondary action, after the first results have been shown.
Comment 19 Daniel Naber 2014-03-20 22:13:14 UTC
(In reply to Dan Garry from comment #17)
> (In reply to Daniel Naber from comment #15)
> > The reason Google doesn't offer sort by date is that they usually don't even
> > know the date of last modification.
> 
> That's incorrect. I will unashamedly admit that I got the idea for bug 62879
> from looking through Google's search settings to see how they address this
> problem. The definitions of "recent" that they allow are anytime, past 24
> hours, past week, past month, and past year.

Except that it doesn't work properly, because they need to do a lot of guesswork (which Wikipedia wouldn't need, thanks to proper metadata). (Example where it doesn't work properly: search for >site:de.wikipedia.org "languagetool"< on google.de, filter by 'last year', and see how the 'Apache OpenOffice' result disappears from the list although it was modified recently.)

BTW, "sort by date" and "sort by relevance" links *do* appear on Google once you have limited results by date. Anyway, I don't think we should care what Google does; the use case of web search is too different from a site-wide search.
Comment 20 Dan Garry 2014-03-20 22:48:52 UTC
(In reply to Jan Ainali from comment #16)
> Here is another use case one user came up with during a discussion. You
> search for a term and you want to fix something in all these articles. By
> being able to sort it by last modified, all the ones you just fixed will go
> to the end of the list, and it is easy to see what is left to do.

Implementing search sorting for this use case is implementing a solution to a problem that does not exist. In CirrusSearch, the search index is updated within seconds of changes to articles, so any articles you've fixed will be removed from the search results if you were to rerun the query.

(In reply to Jan Ainali from comment #18)
> Your assumption here is that there is only one hit that might be interesting
> for the user. For many editors, lists of articles that are starting points
> for their editing behavior make perfect sense. Sure, editors are a very
> small number of users compared to readers. That is why sorting should be a
> secondary action, after the first results have been shown.

I make the assumption that the user expects to be presented with results relevant to the query that is typed in. If you are not expecting that, then you should not be trying to change the intended function of the search engine; you should be using some other tool (like database dumps). I'm genuinely sorry if that's harder and more inconvenient for you, because I do not like making things inconvenient for people, but that does not change my stance about the product that is search.

(In reply to Jan Ainali from comment #16)
> I am sorry, but you have not stated any reasons at all. In what way will
> this degrade performance for users? My wish is that the search is presented
> as it is today, with the option to sort it *after* the first results have
> been shown. So for users not wanting to sort it, there will be no
> difference at all.

I've already outlined that sorting by date will, for the vast majority of users, generate meaningless results. Putting it behind a button and expecting people not to press that button does not make that okay.

Just to be clear, you reopening the bug does not change our work priorities, or change that any patch which attempts to implement this functionality will be met with a -2 by the engineers that work on the extension. It just leaves the bug in a status that does not represent our ongoing priorities, which is confusing and nothing more.

I will close the bug once more to rectify that confusion, but after that I'll just leave it, because I don't want to spend my time in a bug status revert war. Any ramifications for the incorrect status of the bug will be your responsibility.
Comment 21 Andre Klapper 2014-03-21 00:09:26 UTC
Thanks a lot for elaborating here, Dan! 

As per comment 20, "In CirrusSearch, the search index is updated within seconds of changes to articles" is a convincing reason why nobody should actively work on fixing this ("WONTFIX"). Still, anybody is free to hack the MediaWiki code to implement such a feature for themselves on their own wiki, if wanted.

In general, resolutions and priorities in bug reports are supposed to reflect reality but do not cause it; also see [[mw:Bug management/Bugzilla etiquette]].
Thanks everybody for your understanding and I'm sorry that Wikimedia developers will not fulfil the request in this ticket.
Comment 22 Chad H. 2014-03-21 02:04:47 UTC
(In reply to Andre Klapper from comment #21)
> Thanks a lot for elaborating here, Dan! 
> 
> As per comment 20, "In CirrusSearch, the search index is updated within
> seconds of changes to articles" is a convincing reason why nobody should
> actively work on fixing this ("WONTFIX"). Still, anybody is free to hack the
> MediaWiki code to implement such a feature for themselves on their own wiki,
> if wanted.
> 

I wouldn't recommend anyone trying to write code for this either. I won't merge it to master ;-)
Comment 23 Jan Ainali 2014-03-21 10:55:18 UTC
(In reply to Dan Garry from comment #20)
> (In reply to Jan Ainali from comment #16)
> > Here is another use case one user came up with during a discussion. You
> > search for a term and you want to fix something in all these articles. By
> > being able to sort it by last modified, all the ones you just fixed will go
> > to the end of the list, and it is easy to see what is left to do.
> 
> Implementing search sorting for this use case is implementing a solution to
> a problem that does not exist. In CirrusSearch, the search index is updated
> within seconds of changes to articles, so any articles you've fixed will be
> removed from the search results if you were to rerun the query.

No, you are assuming I am removing what I search for. I could very well search for one string because I know articles with this string have something else that needs to be fixed. So they will still show up, but with a sorting function, I could easily move them to the end of the results.

> 
> (In reply to Jan Ainali from comment #18)
> > Your assumption here is that there is only one hit that might be interesting
> > for the user. For many editors, lists of articles that are starting points
> > for their editing behavior make perfect sense. Sure, editors are a very
> > small number of users compared to readers. That is why sorting should be a
> > secondary action, after the first results have been shown.
> 
> I make the assumption that the user expects to be presented with
> results relevant to the query that is typed in. If you are not expecting
> that, then you should not be trying to change the intended function of the
> search engine, you should be using some other tool (like database dumps).
> I'm genuinely sorry if that's harder and more inconvenient for you, because
> I do not like making things inconvenient for people, but that does not
> change my stance about the product that is search.

Hmm, perhaps this should not be in product search? Perhaps this should be a new special page, Special:SortArticles, where you can sort articles in different ways, perhaps selected through categories, templates, or search. I would also find this sorting useful in other places, such as Special:WhatLinksHere (whose sort order, by the way, is not clearly conveyed on that special page). Having to go to catscan for that is, as you say, inconvenient for people, but even worse, it might keep a newly interested user from becoming a 100+ edits/month user, because they do not know catscan exists and find it too tedious to dig for the articles they want to improve.

> 
> (In reply to Jan Ainali from comment #16)
> > I am sorry, but you have not stated any reasons at all. In what way will
> > this degrade performance for users? My wish is that the search is presented
> > as it is today, with the option to sort it *after* the first results have
> > been shown. So for users not wanting to sort it, there will be no
> > difference at all.
> 
> I've already outlined that sorting by date will, for the vast majority of
> users, generate meaningless results. Putting it behind a button and
> expecting people not to press that button does not make that okay.
> 
> Just to be clear, you reopening the bug does not change our work priorities,
> or change that any patch which attempts to implement this functionality will
> be met with a -2 by the engineers that work on the extension. It just leaves
> the bug in a status that does not represent our ongoing priorities, which is
> confusing and nothing more.
> 
> I will close the bug once more to rectify that confusion, but after that
> I'll just leave it, because I don't want to spend my time in a bug status
> revert war. Any ramifications for the incorrect status of the bug will be
> your responsibility.

I apologize for reopening the bug before; I was not aware of that meaning of the status.

Regarding the placement, I would suggest that the sorting be hidden under the advanced menu, where power users and the curious will find it when they need it, without it disturbing the casual reader (who will probably be scared away by all the namespace boxes if they dare open the advanced menu :) ). But then again, if it is a separate special page, it might even be prominently displayed, because a user going there expects only to be able to sort.

Would it be better if I created a new enhancement ticket and linked to this one from the description, or should this one be updated to reflect the change from search to something else?
