Last modified: 2013-02-24 16:26:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T32595, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 30595 - Inconsistent search results with language links
Status: RESOLVED WORKSFORME
Product: MediaWiki extensions
Classification: Unclassified
Component: MWSearch
Version: unspecified
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
http://en.wikipedia.org/w/index.php?t...
Reported: 2011-08-27 13:10 UTC by Lejonel
Modified: 2013-02-24 16:26 UTC (History)

Description Lejonel 2011-08-27 13:10:45 UTC
When searching for "fiets" (Dutch for bicycle) in English Wikipedia, [[Bicycle]] is second in the list of results. But the displayed text is not always the same. Most of the time it shows the beginning of the article (which does not include that word or any similar words):

 Bicycle
 Bicycle (disambiguation) File:Marin bike. jpg | A mountain bike , a popular multi-use bicycle. A bicycle, also known as a bike, pushbike or ...
 55 KB (7,627 words) - 02:20, 26 August 2011

But sometimes it displays the language links, which contain the word "fiets":

 Bicycle
 [[af:Fiets]] [[nl:Fiets]]
 55 KB (8,105 words) - 02:20, 26 August 2011

The sample text and the word count are different.

Searching for translations of bicycle in other languages ("cykel", "fietse", "sykkel") does not usually find [[Bicycle]], unless that word is also used in places other than the language links, like German "Fahrrad" in the name of an image. So it looks like language links should not usually be included in the search.
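
To illustrate the observation above, here is a minimal sketch (not part of the original report) of how one might check whether a search term occurs only in the language links of a page's wikitext. The regular expression, the function names, and the sample text are assumptions made for illustration. In particular, real MediaWiki recognises interlanguage links from a configured list of language prefixes, not from a generic two- or three-letter pattern, and this says nothing about how the actual search backend indexes pages.

```python
import re

# Assumption: an interlanguage link looks like [[xx:Title]] with a
# 2-3 letter lowercase prefix. This is a simplification of MediaWiki's
# real prefix handling.
LANGLINK_RE = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")

def split_langlinks(wikitext):
    """Return (wikitext with language links removed, list of (lang, title))."""
    links = LANGLINK_RE.findall(wikitext)
    body = LANGLINK_RE.sub("", wikitext)
    return body, links

def where_term_matches(wikitext, term):
    """Report whether the term occurs in the body text and/or in language link titles."""
    body, links = split_langlinks(wikitext)
    term = term.lower()
    in_body = term in body.lower()
    in_links = any(term in title.lower().split() for _, title in links)
    return {"body": in_body, "langlinks": in_links}

# Made-up sample loosely modelled on the snippets quoted in this report.
sample = (
    "A bicycle, also known as a bike or pushbike, is a vehicle.\n"
    "[[af:Fiets]]\n"
    "[[nl:Fiets]]\n"
)

print(where_term_matches(sample, "fiets"))
print(where_term_matches(sample, "bicycle"))
```

If an index were built from the raw wikitext, a term like "fiets" that matches only in the language links would still produce a hit, and the snippet could show the link markup instead of the article text.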
Comment 1 Foroa 2012-11-29 15:27:24 UTC
Interlanguage links seem to be found in article space, but not in category space. On Commons, the interlanguage links in categories are essential.
On en:wiki, searching for "fiets" returns article:Bicycle but not category:Bicycles.

Foroa
Comment 2 Nemo 2013-02-21 18:04:58 UTC
I suspect this should rather be moved to the Wikimedia>Lucene component?
Also of relevance, what will happen when inter(language)wikilinks are moved to Wikidata (in a few days)?
Comment 3 Lejonel 2013-02-23 10:30:52 UTC
(In reply to comment #2)
I assumed the search indexing used the wikicode of the article. So when language links were moved from wikicode to Wikidata they would not be found by searching. But a search for "fiets" still finds [[Bicycle]]. This is still inconsistent with not finding [[Bicycle]] when searching for other language links ("cykel", "fietse", "sykkel").

Is it possible that the search database (lucene?) contains incorrect data that somehow connects "fiets" with article [[Bicycle]]?
Comment 4 Lejonel 2013-02-23 10:50:07 UTC
(In reply to comment #1)
That is not what I have seen. Searching for interlanguage links usually finds neither articles nor pages in other namespaces (unless the word is also used in some other way on that page). Dutch "fiets" finding article [[Bicycle]] seems to be an exception.
Comment 5 Nemo 2013-02-23 10:50:52 UTC
(In reply to comment #3)
> (In reply to comment #2)
> I assumed the search indexing used the wikicode of the article. So when
> language links were moved from wikicode to Wikidata they would not be found
> by searching.

So this bug is no longer a problem, strictly speaking: results are no longer inconsistent because you can never get the interwiki as search snippet.

> But a search for "fiets" still finds [[Bicycle]]. This is still
> inconsistent with not finding [[Bicycle]] when searching for other language
> links ("cykel", "fietse", "sykkel").
> 
> Is it possible that the search database (lucene?) contains incorrect data
> that somehow connects "fiets" with article [[Bicycle]].

From this particular article they were removed 3 days ago, so the index should be up to date; however, it's possible that "fiets" is the label of some link to the article: there's no reason to believe it's a mistake, on the contrary it's consistent with your previous observations about other languages.
If you find actual errors in search results, please file a separate bug.
Comment 6 Mark A. Hershberger 2013-02-23 13:06:36 UTC
(In reply to comment #5)
> however, it's possible that "fiets" is the label of some link
> to the article: there's no reason to believe it's a mistake, on the contrary
> it's consistent with your previous observations about other languages.

The langlinks were the only instance of "fiets" in the raw text of the article. They were removed at 06:39, 20 February 2013. It has been just over 3 days since then, so the index may simply be slow to catch up.
Comment 7 Foroa 2013-02-24 16:26:53 UTC
My main complaint was that the interlanguage links in the categories are not found by the search engine. It still finds the associated article Bicycle (where "fiets" only appeared in the interlanguage links), God knows why, but not the associated category. I've seen that documented/discussed somewhere, but I can't find it again.

I've seen that all interlanguage links have been removed. As I've documented many thousands of categories with such links on Commons, I would like to know how they are set up now and what the search engine does with them.
