Last modified: 2011-04-30 01:16:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T19733, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 17733 - Search results incorrect for <4-letter words after update from <1.13
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.14.x
Hardware: All All
Importance: Low minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2009-03-01 13:13 UTC by Subfader
Modified: 2011-04-30 01:16 UTC (History)

Description Subfader 2009-03-01 13:13:02 UTC
On my MW 1.14 (without Lucene) the search now returns only crap when you include words with 4 or fewer letters. In 1.13 all was fine. Examples:

Searching "the word" will return only talk pages where "word" is found.

Searching "the" will find "the" (since when are 3-letter words indexed?) and return a few articles but mostly talk pages again.

Maybe there is more critical stuff I just didn't find yet.
Comment 1 Subfader 2009-03-01 13:16:34 UTC
I simply guess the new search wasn't tested without Lucene.
Comment 2 Andrew Garrett 2009-03-01 13:20:32 UTC
The second error at least is expected behaviour.

We want to index three-letter words (see bug 7726).

I'm not sure about the first. Is it that quotes aren't working as expected? Can you link to a test case? I'm not sure how to interpret the test case you provided.
Comment 3 Subfader 2009-03-01 13:22:43 UTC
Nothing to do with quotes. I just quoted my search keywords :)

http://www.mixesdb.com/db/index.php/Special:Search?search=club+fg&fulltext=Search

should find stuff from http://www.mixesdb.com/db/index.php/Category:Club_FG
Comment 4 Subfader 2009-03-01 13:24:11 UTC
well at least club should be found and not just Talk pages.

More: http://www.mixesdb.com/db/index.php/Special:Search?search=Centro+Fly&fulltext=Search doesn't find http://www.mixesdb.com/db/index.php/Category:Centro_Fly
Comment 5 Subfader 2009-03-01 13:30:41 UTC
Must be something about category and article names. Searching other ns page titles works. Maybe a wrong setting? Buggy anyway then imo. With every new version MW seems to care less about backward compatibility.
Comment 6 Subfader 2009-03-01 13:33:18 UTC
About the 3 letters indexed: This was not mentioned in the release notes of 1.14 http://svn.wikimedia.org/svnroot/mediawiki/tags/REL1_14_0/phase3/RELEASE-NOTES ? Some wikis use a note that you can't search 3 or less letter words...
Comment 7 Subfader 2009-03-02 14:56:23 UTC
OK, it has some odd behaviour. I ran updateSearchIndex.php yesterday. Today it finds the examples from above :)
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-03-02 14:59:42 UTC
(In reply to comment #6)
> About the 3 letters indexed: This was not mentioned in the release notes of
> 1.14
> http://svn.wikimedia.org/svnroot/mediawiki/tags/REL1_14_0/phase3/RELEASE-NOTES
> ?

* (bug 7726) Searches for words less than 4 characters now work without
  requiring customization of MySQL server settings
Comment 9 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-03-02 15:14:13 UTC
If rebuilding the search index is necessary for search to work correctly, this should be done by update.php.  Reopening.
Comment 10 Subfader 2009-03-02 15:31:06 UTC
Well, it was like this: When I searched "club fg" I only got Talk page results. I ran the rebuild script and it indexed 95% talk pages. I defined start and end to cover all times of my wiki and it indexed the whole bunch. Nothing seemed to have changed in the results till I checked again today.

Now "club fg" finds the category but not the pages.

Still http://www.mixesdb.com/db/index.php/Special:Search?search=Centro+Fly&fulltext=Search doesn't find http://www.mixesdb.com/db/index.php/Category:Centro_Fly while trying "Centro" finds it (seems that the "fly" is breaking the search).

I use Extension:GoToCategory so don't bother trying my search using "Go".

The ultimate test is "in the mix" http://www.mixesdb.com/db/index.php/Special:Search?search=in+the+mix&fulltext=Search should find tons of pages from http://www.mixesdb.com/db/index.php/Category:Centro_Fly

Would be good to have other 1.14 sites without Lucene to check if it's only me.
Comment 11 Subfader 2009-03-02 15:33:31 UTC
**should find tons of pages from http://www.mixesdb.com/db/index.php/Category:In_The_Mix
Comment 12 Subfader 2009-03-02 15:56:35 UTC
I think the problem is 99.9% on my side. Please keep it closed till I have finished checking :)
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-03-02 15:57:33 UTC
If you have to manually run the search index rebuild script, that's not a problem on your side, that's a problem with the stated upgrade procedure.
Comment 14 Subfader 2009-03-02 16:07:02 UTC
I had some restriction errors when I tried to run the rebuild script in my new 1.14 directory.
Smart as I am, I copied the 2 rebuild script files from the 1.14 to the 1.13 maintenance directory and ran it from there. In my logic the indexing procedure to the table would be correct? Guess not:
After running the 1.14 script in my 1.13 directory the index is updated and "club fg" is not found. When I change the Club FG article page it is found. This tells me the way I ran the rebuild script was not correct. Need to get rid of the restriction errors first to properly run it in my 1.14 directory. THEN I can tell if it works ;)
Comment 15 Subfader 2009-03-02 16:08:05 UTC
**When I change the Club FG category page
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-03-02 16:24:02 UTC
You should not have to be doing this.  It's a bug in the 1.14 release that should be fixed in the next point release if possible.  This should all happen automatically when you run update.php.  Reopening.
Comment 17 Subfader 2009-03-19 20:54:54 UTC
For the record: I was not able to run the maintenance scripts because of the changes to the command-line scripts, which now use realpath(), and that requires safe_mode to be turned off for the command line.

The search seems to work now after I ran maintenance/rebuildtextindex.php instead of only updateSearchIndex.php.
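
For anyone repeating this step, here is a minimal sketch of a wrapper that runs the full rebuild from the wiki's own directory. The script name `maintenance/rebuildtextindex.php` comes from this thread; the wiki root path, the `php` binary name, and the helper names `rebuild_command` and `rebuild_search_index` are assumptions made for illustration, not part of MediaWiki.

```python
import subprocess
from pathlib import Path


def rebuild_command(wiki_root, php="php"):
    """Build the argv for the full-text rebuild script named in this thread.

    wiki_root and php are assumptions: adjust them to your installation.
    """
    script = Path(wiki_root) / "maintenance" / "rebuildtextindex.php"
    return [php, str(script)]


def rebuild_search_index(wiki_root, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = rebuild_command(wiki_root)
    if not dry_run:
        # Run from the upgraded wiki's directory, not from an older copy,
        # since running the script from a 1.13 directory did not help above.
        subprocess.run(cmd, cwd=wiki_root, check=True)
    return cmd


if __name__ == "__main__":
    # "/path/to/wiki" is a placeholder, not a real location.
    print(rebuild_search_index("/path/to/wiki"))
```

With `dry_run=True` (the default) the helper only returns the command, so it can be inspected before anything touches the search index.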
Comment 18 Brion Vibber 2009-07-20 03:26:57 UTC
The given link seems to work fine at present. A rebuild would only be needed for newly indexed things; without reindexing, behavior would be exactly as before (ignored words would remain ignored).
Comment 19 Brion Vibber 2010-11-15 21:39:51 UTC
Reopening -- further consideration after a post on mailing list reminds me that in fact some words will behave differently.

If padding keeps the word off the ignore list, then it becomes a required word -- which won't be found in old pages that had the word but not the padded word. Upgrade procedure possibly should be updated, or at least a reference to rebuildtextindex slipped into UPGRADE.
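
The effect described here can be modelled in a few lines. This is a toy sketch, not MediaWiki's or MySQL's actual code: it assumes the full-text engine ignores tokens shorter than 4 characters, that the upgraded software pads short words with a suffix (the `PAD` value below is a placeholder chosen for illustration), and that every non-ignored query word is required.

```python
MIN_WORD_LEN = 4   # assumed engine minimum word length
PAD = "u800"       # placeholder padding suffix, chosen for illustration


def pad_short_words(text):
    """Lower-case the text and pad words shorter than MIN_WORD_LEN."""
    return " ".join(
        w + PAD if len(w) < MIN_WORD_LEN else w
        for w in text.lower().split()
    )


def index_terms(stored_text):
    """Terms the engine keeps: tokens below MIN_WORD_LEN are ignored."""
    return {w for w in stored_text.split() if len(w) >= MIN_WORD_LEN}


def matches(query, terms):
    """The upgraded search pads the query; every kept word is required."""
    required = [
        w for w in pad_short_words(query).split()
        if len(w) >= MIN_WORD_LEN
    ]
    return all(w in terms for w in required)


# Row written before the upgrade: no padding, so "fg" was dropped.
old_row = index_terms("club fg")
# Same page after a rebuild: "fg" is stored in padded form.
new_row = index_terms(pad_short_words("Club FG"))

print(matches("club fg", old_row))  # False: padded "fg" is required but absent
print(matches("club fg", new_row))  # True
print(matches("club", old_row))     # True: long words are unaffected
```

In this model the old row stops matching "club fg" after the upgrade even though it matched "club" alone before, which is why a reindex (or an UPGRADE note pointing to rebuildtextindex) would be needed.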
