Last modified: 2014-08-13 20:46:34 UTC

Bug 66969 - intitle search doesn't work
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: CirrusSearch
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nik Everett
Reported: 2014-06-23 07:09 UTC by bennylin
Modified: 2014-08-13 20:46 UTC

Description bennylin 2014-06-23 07:09:57 UTC
I tried to search for articles with "dari Spanyol" (from Spain) in the title using "intitle:dari Spanyol", but it returned 0 results. Searching "intitle:dari" (from) also returned 0 results, while searching "intitle:Spanyol" (Spain) gave the expected results.

1. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Adari+spanyol&fulltext=Search&uselang=en
2. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Adari&fulltext=Search&uselang=en
3. https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3Aspanyol&fulltext=Search&uselang=en

Expected: some kind of error message other than "There were no results matching the query."
Comment 1 Nik Everett 2014-06-23 14:24:50 UTC
Something is certainly weird here.  Temporary workaround:
https://id.wikipedia.org/w/index.php?title=Istimewa%3APencarian&profile=default&search=intitle%3A%22dari+spanyol%22&fulltext=Search
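For reference, the workaround differs from the failing search only in wrapping the phrase in double quotes, which show up URL-encoded as %22. A minimal sketch of building that URL with Python's standard library; the helper name `search_url` is made up for illustration, and the parameters are just the ones visible in the link above:

```python
from urllib.parse import urlencode

def search_url(query):
    """Build an id.wikipedia.org full-text search URL for the given query."""
    params = {
        "title": "Istimewa:Pencarian",  # the Special:Search page on id.wp
        "profile": "default",
        "search": query,
        "fulltext": "Search",
    }
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'
    return "https://id.wikipedia.org/w/index.php?" + urlencode(params)

# Quoting the phrase is the whole workaround
workaround = search_url('intitle:"dari spanyol"')
print(workaround)
```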
Comment 2 bennylin 2014-06-27 08:28:44 UTC
I suspect this is caused by language-specific stop words (Indonesian, in this case), for three reasons:

1. other Indonesian stop words also return no results ("intitle:di" - in, "intitle:ke" - to)
2. searches for those words ("intitle:di", "intitle:ke", "intitle:dari") do find results in other projects
3. in my experience, id.wp's CirrusSearch employs some kind of Indonesian-language stemmer

If that is true, is it possible to disable the stop words?
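
To illustrate the suspicion, here is a toy model, not CirrusSearch's actual code. It assumes that titles are indexed with stop words dropped while every query word is still required to match. The stop word set is only the three words named above, and the titles are hypothetical:

```python
# Toy model of the stop-word hypothesis; NOT how CirrusSearch is implemented.
STOP_WORDS = {"dari", "di", "ke"}  # assumed sample: just the words named above

def index_terms(title):
    """Index-time analysis: lowercase, split on whitespace, drop stop words."""
    return {w for w in title.lower().split() if w not in STOP_WORDS}

def intitle(query, titles):
    """Return the titles whose indexed terms contain every query word."""
    wanted = query.lower().split()
    return [t for t in titles if all(w in index_terms(t) for w in wanted)]

titles = ["Isabel dari Spanyol", "Sejarah Spanyol"]  # hypothetical titles

print(intitle("dari", titles))          # stop word was never indexed
print(intitle("dari spanyol", titles))  # one required word can never match
print(intitle("spanyol", titles))       # ordinary word matches as usual
```

Under those assumptions the model shows the same pattern as the three searches in the description: any query containing a stop word finds nothing, while the query without one works.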
Comment 3 bennylin 2014-06-27 08:33:01 UTC
Further investigation:

Searching "intitle:di" in Italian Wikipedia also failed https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=advanced&search=intitle%3Adi&fulltext=Search&ns0=1&profile=advanced

But searching "intitle:from" in English Wikipedia and "intitle:von" in German Wikipedia yields the expected results.

(btw, my search context was noble titles, e.g. "ABC from XYZ", which translates to "ABC dari XYZ" in id.wp and "ABC di XYZ" in it.wp, and so on)
Comment 5 bennylin 2014-08-13 20:12:54 UTC
Probably related:
* [[bugzilla:54875]]  Automatic stopwords for the 200+ languages without their own analyzer available 
* [[bugzilla:60362]]  CirrusSearch: Stopwords are not optional and are worth as much as exact matches 
* https://www.mail-archive.com/mediawiki-commits@lists.wikimedia.org/msg169298.html

So, where can I look at the Indonesian stopwords list, and/or stemmer?
Comment 6 Nik Everett 2014-08-13 20:46:34 UTC
Looks like this is the stemmer:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/id/IndonesianStemmer.java
These are the stopwords:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/resources/org/apache/lucene/analysis/id/stopwords.txt

Those bugs are related.  The reason we haven't fixed them is that it's a pretty large effort and we're still concentrating on performance.  It's on the list, but it isn't as high as I'd like it to be.
