Last modified: 2014-08-13 20:46:34 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68969, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 66969 - intitle search doesn't work
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nik Everett
Reported: 2014-06-23 07:09 UTC by bennylin
Modified: 2014-08-13 20:46 UTC (History)
Description bennylin 2014-06-23 07:09:57 UTC
I tried to search for articles with "dari Spanyol" (from Spain) in the title using "intitle:dari Spanyol", but it returned 0 results. The same happens when I search for "intitle:dari" (from), but searching for "intitle:Spanyol" (Spain) gives the expected results.

1. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Adari+spanyol&fulltext=Search&uselang=en
2. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Adari&fulltext=Search&uselang=en
3. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Aspanyol&fulltext=Search&uselang=en

I would expect some kind of error message other than "There were no results matching the query."
Comment 1 Nik Everett 2014-06-23 14:24:50 UTC
Something is certainly weird here. Temporary workaround:
https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3A%22dari+spanyol%22&fulltext=Search
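
The workaround URL above wraps the phrase in double quotes inside the intitle: filter. As a minimal sketch of how such a URL could be assembled, the snippet below uses Python's standard urllib.parse. The helper name build_search_url is hypothetical; the parameter names and values are copied from the links in this report.

```python
from urllib.parse import urlencode

# Base URL taken from the links in this report.
BASE_URL = "https://id.wikipedia.org/w/index.php"


def build_search_url(query: str) -> str:
    """Return a full-text search URL for id.wikipedia.org (hypothetical helper)."""
    params = {
        "title": "Istimewa:Pencarian",
        "profile": "default",
        "search": query,
        "fulltext": "Search",
    }
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"{BASE_URL}?{urlencode(params)}"


# Quoting the phrase inside intitle: is the workaround from the comment above.
print(build_search_url('intitle:"dari spanyol"'))
```

Passing 'intitle:dari spanyol' without the inner quotes produces the failing query from the description instead.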
Comment 2 bennylin 2014-06-27 08:28:44 UTC
I suspect this is caused by language-specific stop words, in this case Indonesian ones, for three reasons:

1. other Indonesian stop words also return no results ("intitle:di" - in, "intitle:ke" - to)
2. the same searches ("intitle:di", "intitle:ke", "intitle:dari") do return results on other projects
3. based on my experience, id.wp's CirrusSearch employs some kind of Indonesian-language stemmer

If that is true, is it possible to disable the stop words?
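
To illustrate the hypothesis only: the sketch below shows how a stop-word filter could leave a query with nothing to match. It is a toy model, not CirrusSearch's or Lucene's actual analysis chain; the stop-word set is an assumed subset containing just the three words named above, and analyze is a hypothetical helper.

```python
# Hypothetical stop-word subset: only the three words named in this comment.
ASSUMED_STOP_WORDS = {"dari", "di", "ke"}


def analyze(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop the assumed stop words."""
    return [t for t in text.lower().split() if t not in ASSUMED_STOP_WORDS]


for query in ("dari", "Spanyol", "dari Spanyol"):
    print(f"{query!r} -> {analyze(query)}")
```

Under this model "dari" is reduced to an empty token list, which would explain an empty result set for "intitle:dari". It does not by itself explain why "intitle:dari Spanyol" also returns nothing, since "spanyol" survives the filter.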
Comment 3 bennylin 2014-06-27 08:33:01 UTC
Further investigation:

Searching "intitle:di" on Italian Wikipedia also failed: https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=advanced&search=intitle%3Adi&fulltext=Search&ns0=1&profile=advanced

But searching "intitle:from" on English Wikipedia and "intitle:von" on German Wikipedia yields the expected results.

(btw, my search context was noble titles, e.g. "ABC from XYZ", which translates to "ABC dari XYZ" on id.wp and "ABC di XYZ" on it.wp, and so on)
Comment 5 bennylin 2014-08-13 20:12:54 UTC
Probably related:
* [[bugzilla:54875]] Automatic stopwords for the 200+ languages without their own analyzer available
* [[bugzilla:60362]] CirrusSearch: Stopwords are not optional and are worth as much as exact matches
* https://www.mail-archive.com/mediawiki-commits@lists.wikimedia.org/msg169298.html

So, where can I look at the Indonesian stopwords list, and/or stemmer?
Comment 6 Nik Everett 2014-08-13 20:46:34 UTC
Looks like this is the stemmer:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/id/IndonesianStemmer.java
These are the stopwords:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/id/stopwords.txt

Those bugs are related. The reason we haven't fixed them is that it's a pretty large effort and we're still concentrating on performance. It's on the list, but it isn't as high as I'd like it to be.
