When pages get edited multiple times during indexing, duplicate entries get inserted into the index. Lucene's inability to read and write the same index concurrently makes this unnecessarily difficult to do right. One approach that might work reasonably well: place updates on our own per-database queues, replace duplicates for the same page while they sit in the queue, and then apply the updates directly to the index instead of going through an in-memory directory.
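A minimal Java sketch of that queueing idea. All names here (UpdateQueue, PageUpdate, pageKey) are hypothetical and not the daemon's actual types; the point is just that keyed inserts let a later edit replace an earlier pending edit of the same page, so only the latest version reaches the index.

import java.util.LinkedHashMap;
import java.util.Map;

class PageUpdate {
    final String pageKey;   // hypothetical identifier, e.g. "enwiki:Main_Page"
    final String content;
    PageUpdate(String pageKey, String content) {
        this.pageKey = pageKey;
        this.content = content;
    }
}

class UpdateQueue {
    // LinkedHashMap preserves arrival order while letting a later
    // edit overwrite an earlier queued edit of the same page.
    private final Map<String, PageUpdate> pending = new LinkedHashMap<>();

    synchronized void enqueue(PageUpdate u) {
        pending.put(u.pageKey, u); // duplicate keys replaced in place
    }

    // Drain the pending updates in one batch; the caller would then
    // apply each directly to the index (delete-then-add) rather than
    // staging them in an in-memory directory.
    synchronized java.util.List<PageUpdate> drain() {
        java.util.List<PageUpdate> batch = new java.util.ArrayList<>(pending.values());
        pending.clear();
        return batch;
    }
}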
As a temporary workaround, I've hacked the daemon to skip over duplicate results. (Duplicates are only detected, and skipped, when they are adjacent in the result list.)
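For reference, the skip logic amounts to something like the following hypothetical sketch (the daemon's real code differs); it drops a result only when it has the same key as the immediately preceding one, which is why non-adjacent duplicates slip through.

import java.util.ArrayList;
import java.util.List;

class DedupAdjacent {
    // Keep each result unless it equals the result right before it.
    static List<String> skipAdjacentDuplicates(List<String> results) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String r : results) {
            if (!r.equals(prev)) {
                out.add(r);
            }
            prev = r;
        }
        return out;
    }
}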
(In reply to comment #0)
> When pages get edited multiple times during indexing, duplicate entries get inserted.

Does this still happen? And does it happen only for -rebuild, since the article is deleted before it's added again when doing an incremental update? Thanks.
This is all obsolete now.
[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]