Last modified: 2014-08-16 10:36:56 UTC
If an article is moved the search will no longer find the article when searching for the content. Only the title will be matched. This seems to be a Postgres related bug, because we can reproduce this only in Postgres based installations.
This bug is caused by the update function in includes/search/SearchPostgres.php. function update( $pageid, $title, $text ) { ## We don't want to index older revisions $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " . "(SELECT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) . " ORDER BY rev_text_id DESC OFFSET 1)"; $this->db->query( $sql ); return true; } The subselect statement is missing a DISTINCT and must look like: function update( $pageid, $title, $text ) { ## We don't want to index older revisions $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " . "(SELECT DISTINCT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) . " ORDER BY rev_text_id DESC OFFSET 1)"; $this->db->query( $sql ); return true; } This DISTINCT is necessary, because with every move a new entry in the revision table is generated, with the same rev_text_id.
I have verified the problem and the proposed solution. The harder part is what to do about already corrupted data. I can fix it by running this: update pagecontent set textvector=to_tsvector(old_text) where textvector is null and old_id in (SELECT max(rev_text_id) FROM revision group by rev_page); I don't like that solution, because it should invoke the ts2_page_text trigger to set textvector, rather than doing it manually. That way if someone customized the trigger, it still does the right thing. Usually I'd do a dummy update and let the trigger take care of it. But ts2_page_text detects dummy updates and doesn't populate textvector in that case. There are other solutions (create a temp table, or use a CTE to delete and reinsert the tuples) but they seem like more trouble than it is worth. Does the fix adopted also have to work in PostgreSQL 8.3? That version of PostgreSQL is past its EOL, but MediaWiki does still list it.
Change 153565 had a related patch set uploaded by Jjanes: PostgreSQL: Fix text search on moved pages https://gerrit.wikimedia.org/r/153565