Last modified: 2014-01-29 16:18:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T61666, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59666 - CirrusSearch shouldn't provide alternative spelling when an exact result is found
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: master
Hardware: All All
Importance: High normal
Assigned To: Nik Everett
Reported: 2014-01-05 07:22 UTC by Quim Gil
Modified: 2014-01-29 16:18 UTC

Description Quim Gil 2014-01-05 07:22:21 UTC
1. Go to http://ca.wikipedia.org

2. Search "José Mourinho"

Or click here:

https://ca.wikipedia.org/w/index.php?title=Especial%3ACerca&profile=advanced&search=Jos%C3%A9+Mourinho&fulltext=Search&ns0=1&ns4=1&ns10=1&ns12=1&redirs=1&profile=advanced

EXPECTED

If such a page exists, just show it.

ACTUAL

Even if the exact page exists and it is listed first in the results, the first message displayed is "Did you mean: jose mourinho"

Well no, I really meant José Mourinho.  :)

This is the first time I have paid enough attention to notice this problem consciously (and report it), but I have seen more cases like this. Maybe the pattern is the accent in the search term? I will keep watching.
Comment 1 Nik Everett 2014-01-05 18:03:27 UTC
Elasticsearch won't make a suggestion unless the suggested text appears to be about 2x as likely as the provided text (our configuration), so I'm guessing this is caused by us getting suggestions from redirects as well as titles.  I'll have a look at it soon.

Another thing:  I believe the fix for this will be to not provide a suggestion when the entire title is matched.  I think it'd be more appropriate for me to implement this in CirrusSearch even though I'm sure that LuceneSearch has the same problem.  The reason for this is that if I implement the fix in Cirrus then, one day, when I find a really good excuse to violate the rule, I'll be able to without having to make more convoluted changes to core.  I know, YAGNI, but my gut says do it in Cirrus and I'm going to trust it.
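
The threshold described in this comment can be sketched roughly as follows. This is an illustration in Python, not CirrusSearch code (CirrusSearch is PHP, and the real check happens inside Elasticsearch). The function name, the likelihood arguments, and the 2.0 default are assumptions made for the example.

```python
def should_suggest(query_likelihood: float,
                   suggestion_likelihood: float,
                   min_ratio: float = 2.0) -> bool:
    """Return True only if the suggestion looks at least `min_ratio`
    times as likely as the text the user actually typed.

    All names and the 2.0 default are hypothetical; they mirror the
    "about 2x as likely" configuration mentioned in the comment.
    """
    if query_likelihood <= 0:
        # The typed text never occurs in the index, so any suggestion
        # with a positive likelihood is accepted.
        return suggestion_likelihood > 0
    return suggestion_likelihood / query_likelihood >= min_ratio
```

Under this rule, if redirects feed the suggester alongside titles, an unaccented redirect such as "jose mourinho" could plausibly push the suggestion's likelihood past the threshold even though the accented title exists, which is the guess made in the comment above.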
Comment 2 Chad H. 2014-01-05 18:05:37 UTC
(In reply to comment #1)
> Elasticsearch won't make a suggestion unless the suggested text appears to be
> about 2x as likely as the provided text (our configuration) so I'm guessing
> this is caused by us getting suggestions from redirect as well as titles.
> I'll have a look at it soon.
>
> Another thing:  I believe the fix for this will be to not provide a suggestion
> when the entire title is matched.  I think it'd be more appropriate for me to
> implement this in CirrusSearch even though I'm sure that LuceneSearch has the
> same problem.  The reason for this is that if I implement the fix in Cirrus
> then, one day, when I find a really good excuse to violate the rule, I'll be
> able to without having to make more convoluted changes to core.  I know, YAGNI,
> but my gut says do it in Cirrus and I'm going to trust it.

I was about to say the exact same thing, except let's fix it in core for all search engines.

It makes no sense to have "Did you mean Foo?" "There is a page called 'Foo'" like 3 lines apart on the same page :)
Comment 3 Nik Everett 2014-01-05 18:13:29 UTC
I was really thinking about it in core too, but that little imp in me said we'd want to break that rule one day.

I dunno.  Also, I'd like to look into how that "There is a page called 'Foo'" thing comes up.  Does it use the near match hook?  If so, it'll work properly on wikis with Cirrus as primary come Monday because we're turning off TitleKey for them.

I imagine there are cases where we'll return a fully highlighted title but not have a page that matches the results.  Oooh, and check this out: if that fully highlighted title is on the first page of the search results, we'd start showing the did you mean on the second page!

Needs more investigation!
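
The pagination concern above can be illustrated with a small sketch. It is in Python for illustration only, and it assumes (hypothetically) that the "fully highlighted title" check looks only at the results on the page being rendered. The function name and the sample data are made up.

```python
from typing import List


def page_suppresses_suggestion(fully_highlighted: List[bool]) -> bool:
    """Page-local rule: hide "Did you mean" if any result on this page
    has a fully highlighted title."""
    return any(fully_highlighted)


# Hypothetical result list: only the first hit is a full title match.
results = [True, False, False, False]
page_size = 2

# Split the results into pages and apply the rule to each page separately.
pages = [results[i:i + page_size] for i in range(0, len(results), page_size)]
decisions = [page_suppresses_suggestion(page) for page in pages]
```

With this data the first page hides the suggestion and the second page does not, so the same query would behave differently depending on which page is viewed.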
Comment 4 Nik Everett 2014-01-06 16:38:42 UTC
OK!  I had a look at it.


1.  "There is a page called 'Foo'" comes from Title::newFromText( $term )->isKnown().  We certainly shouldn't provide a suggestion in that case.
2.  It is still possible for CirrusSearch or LuceneSearch to provide a great match even though the text isn't known.  Try searching for "pickett charge" or even "main pages".  The top result is obviously good enough for it not to be worth showing a suggestion, but Cirrus does it anyway.

I think #1 we should fix in core.
#2 we should probably do in Cirrus because it knows more about how it highlights.

I'll do both.
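
A rough sketch of how the two rules above could combine, written in Python purely for illustration (the actual fixes are PHP changes in MediaWiki core and CirrusSearch). Every name here is hypothetical: `known_titles` stands in for the `Title::newFromText( $term )->isKnown()` check, and the `<em>` tags stand in for whatever highlight markup the search backend emits.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical highlight markup; the real markers may differ.
HIGHLIGHT = re.compile(r"<em>(.*?)</em>")


def is_fully_highlighted(highlighted_title: str) -> bool:
    """True if nothing but whitespace remains outside the highlight tags."""
    outside = HIGHLIGHT.sub("", highlighted_title)
    return bool(highlighted_title.strip()) and outside.strip() == ""


def pick_suggestion(term: str,
                    known_titles: Set[str],
                    highlighted_titles: Iterable[str],
                    suggestion: Optional[str]) -> Optional[str]:
    """Return the suggestion to display, or None to display nothing."""
    # Rule 1 (core): the search term is itself a known title.
    if term in known_titles:
        return None
    # Rule 2 (Cirrus): some result title is matched in full.
    if any(is_fully_highlighted(title) for title in highlighted_titles):
        return None
    return suggestion
```

For example, a search for "José Mourinho" on a wiki where that title exists is stopped by rule 1, while "pickett charge" is stopped by rule 2 if a result comes back as `<em>Pickett's</em> <em>Charge</em>`.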
Comment 5 Gerrit Notification Bot 2014-01-06 16:58:28 UTC
Change 105705 had a related patch set uploaded by Manybubbles:
Don't suggest if the search tem is a known title

https://gerrit.wikimedia.org/r/105705
Comment 6 Nik Everett 2014-01-06 16:59:29 UTC
That patch was #1 in core.  #2 in Cirrus is coming later.
Comment 7 Gerrit Notification Bot 2014-01-08 18:19:36 UTC
Change 105705 merged by jenkins-bot:
Don't suggest if the search term is a known title

https://gerrit.wikimedia.org/r/105705
Comment 8 Gerrit Notification Bot 2014-01-09 15:55:57 UTC
Change 106523 had a related patch set uploaded by Manybubbles:
Don't suggest anything if a result is a full match

https://gerrit.wikimedia.org/r/106523
Comment 9 Gerrit Notification Bot 2014-01-09 16:00:51 UTC
Change 106523 merged by jenkins-bot:
Don't suggest anything if a result is a full match

https://gerrit.wikimedia.org/r/106523
Comment 10 Gerrit Notification Bot 2014-01-15 21:01:47 UTC
Change 107663 had a related patch set uploaded by Chad:
Don't suggest if the search term is a known title

https://gerrit.wikimedia.org/r/107663
Comment 11 Gerrit Notification Bot 2014-01-15 21:02:33 UTC
Change 107663 abandoned by Chad:
Don't suggest if the search term is a known title

Reason:
Nevermind, just wait til tomorrow :)

https://gerrit.wikimedia.org/r/107663
