Last modified: 2014-08-16 10:36:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T68650, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 66650 - After moving an page, search will no longer finds the article on Postgres
After moving an page, search will no longer finds the article on Postgres
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Search (Other open bugs)
1.23.0
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: postgres
  Show dependency treegraph
 
Reported: 2014-06-16 08:28 UTC by gebhkla
Modified: 2014-08-16 10:36 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description gebhkla 2014-06-16 08:28:54 UTC
If an article is moved the search will no longer find the article when searching for the content. Only the title will be matched.

This seems to be a Postgres related bug, because we can reproduce this only in Postgres based installations.
Comment 1 gebhkla 2014-06-16 14:54:55 UTC
This bug is caused by the update function in includes/search/SearchPostgres.php.

        function update( $pageid, $title, $text ) {
                ## We don't want to index older revisions
                $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " .
                                "(SELECT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) .
                                " ORDER BY rev_text_id DESC OFFSET 1)";

                $this->db->query( $sql );
                return true;
        }

The subselect statement is missing a DISTINCT and must look like:

        function update( $pageid, $title, $text ) {
                ## We don't want to index older revisions
                $sql = "UPDATE pagecontent SET textvector = NULL WHERE old_id IN " .
                                "(SELECT DISTINCT rev_text_id FROM revision WHERE rev_page = " . intval( $pageid ) .
                                " ORDER BY rev_text_id DESC OFFSET 1)";

                $this->db->query( $sql );
                return true;
        }

This DISTINCT is necessary, because with every move a new entry in the revision table is generated, with the same rev_text_id.
Comment 2 Jeff Janes 2014-06-16 19:58:57 UTC
I have verified the problem and the proposed solution.

The harder part is what to do about already corrupted data.

I can fix it by running this:

update pagecontent set textvector=to_tsvector(old_text) where 
 textvector is null and old_id in 
 (SELECT  max(rev_text_id) FROM revision group by rev_page);

I don't like that solution, because it should invoke the ts2_page_text trigger to set textvector, rather than doing it manually.  That way if someone customized the trigger, it still does the right thing.  Usually I'd do a dummy update and let the trigger take care of it. But ts2_page_text detects dummy updates and doesn't populate textvector in that case.  There are other solutions (create a temp table, or use a CTE to delete and reinsert the tuples) but they seem like more trouble than it is worth.

Does the fix adopted also have to work in PostgreSQL 8.3?  That version of PostgreSQL is past its EOL, but MediaWiki does still list it.
Comment 3 Gerrit Notification Bot 2014-08-12 07:07:07 UTC
Change 153565 had a related patch set uploaded by Jjanes:
PostgreSQL: Fix text search on moved pages

https://gerrit.wikimedia.org/r/153565

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links