Last modified: 2013-02-24 16:26:53 UTC
When searching for "fiets" (Dutch for bicycle) in the English Wikipedia, [[Bicycle]] is second in the list of results. But the displayed snippet is not always the same. Most of the time it shows the beginning of the article, which contains neither that word nor anything similar:

Bicycle
Bicycle (disambiguation) File:Marin bike.jpg | A mountain bike, a popular multi-use bicycle. A bicycle, also known as a bike, pushbike or ...
55 KB (7,627 words) - 02:20, 26 August 2011

But sometimes it displays the language links, which contain the word "fiets":

Bicycle
[[af:Fiets]] [[nl:Fiets]]
55 KB (8,105 words) - 02:20, 26 August 2011

Both the sample text and the word count differ between the two results. Searching for translations of bicycle in other languages ("cykel", "fietse", "sykkel") does not usually find [[Bicycle]], unless the word is also used somewhere other than the language links, like the German "Fahrrad" in the name of an image. So it looks like language links are not normally meant to be included in the search.
Interlanguage links seem to be found in article space, but not in category space. On Commons, the interlanguage links in categories are essential. On en:wiki, searching for "fiets" returns the article Bicycle but not Category:Bicycles. Foroa
I suspect this should rather be moved to the Wikimedia > Lucene component? Also relevant: what will happen when inter(language)wiki links are moved to Wikidata (in a few days)?
(In reply to comment #2)
I assumed the search indexing used the wikicode of the article, so once language links were moved from the wikicode to Wikidata they would no longer be found by searching. But a search for "fiets" still finds [[Bicycle]]. This is still inconsistent with not finding [[Bicycle]] when searching for the other language links ("cykel", "fietse", "sykkel"). Is it possible that the search database (Lucene?) contains incorrect data that somehow connects "fiets" with the article [[Bicycle]]?
(In reply to comment #1)
That is not what I have seen. Searching for interlanguage links usually finds neither articles nor pages in other namespaces (unless the word is also used in some other way on that page). Dutch "fiets" finding the article [[Bicycle]] seems to be an exception.
(In reply to comment #3)
> (In reply to comment #2)
> I assumed the search indexing used the wikicode of the article. So when
> language links were moved from wikicode to Wikidata they would not be found
> by searching.

So this bug is no longer a problem, strictly speaking: results are no longer inconsistent, because you can never get the interwiki link as a search snippet.

> But a search for "fiets" still finds [[Bicycle]]. This is still
> inconsistent with not finding [[Bicycle]] when searching for other language
> links ("cykel", "fietse", "sykkel").
>
> Is it possible that the search database (lucene?) contains incorrect data
> that somehow connects "fiets" with article [[Bicycle]].

From this particular article they were removed 3 days ago, so the index should be up to date. However, it's possible that "fiets" is the label of some link to the article: there's no reason to believe it's a mistake; on the contrary, it's consistent with your previous observations about other languages. If you find actual errors in search results, please file a separate bug.
(In reply to comment #5)
> however, it's possible that "fiets" is the label of some link
> to the article: there's no reason to believe it's a mistake, on the contrary
> it's consistent with your previous observations about other languages.

The langlinks were the only instance of "fiets" in the raw text of the article. They were removed at 06:39, 20 February 2013. It has been just over 3 days since then, so the index may simply be slower to catch up.
My main complaints where that the interlanguagelinks in the categories are not found by the search engine. It still finds the associated article bicycle (where fiets only appeared in IL), God knows why, but not the associated category. I've seen that documented/discussed somewhere, but I can't find it back. I've seen that all interlanguage links have been removed. As I've documented on Commons many thousands of categories with such links, I would like to know how they are setup now and what the search engine does with it.