Last modified: 2009-11-04 15:20:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20609, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18609 - Search index text is empty if page contains unmatched "<"
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: 1.16.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Max Semenik
Keywords: patch, patch-need-review
Depends on:
Blocks:
Reported: 2009-04-27 17:42 UTC by nephele
Modified: 2009-11-04 15:20 UTC (History)
CC List: 0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Fix broken regexp in SearchUpdate.php (patch to r49794) (638 bytes, patch)
2009-04-27 17:42 UTC, nephele

Description nephele 2009-04-27 17:42:51 UTC
Created attachment 6067
Fix broken regexp in SearchUpdate.php (patch to r49794)

If an article contains a "<" symbol and there is no subsequent ">" symbol anywhere in the article, the si_text field for that article in the searchindex table ends up completely empty -- even the text in the article before the "<" symbol is wiped out.  It is therefore impossible to search on any of the article's contents.

For example, http://www.uesp.net/wiki/UESPWiki:Mirror_Plan is currently triggering this bug; si_text is being set to ''. Although UESP is currently running MediaWiki 1.10, the same bug occurs if the article is added to a test wiki running r49794.

The basic problem is an incorrect pair of parentheses in a preg_replace expression in SearchUpdate.php::doUpdate().  The attached patch file removes those parentheses; I also did some secondary cleanup of the expression by deleting some redundant chunks ("[A-Za-z0-9]*\\s*" is all covered equally well by "[^>]*?", and the simpler expression doesn't mislead editors).  The revised regexp successfully processes UESPWiki:Mirror_Plan, and also successfully processes some test pages containing html tags.
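
To illustrate the behaviour the revised expression is meant to have, here is a minimal Python sketch. It is not the patched PHP code: the pattern is reconstructed from the fragments quoted in the description and should be treated as an assumption, and the names `TAG_RE` and `strip_tags_for_index` are hypothetical. Python's `re` engine also differs from PHP's PCRE, so this does not reproduce the original failure; it only shows that a pattern of this shape replaces complete tags and leaves text around an unmatched "<" intact.

```python
import re

# Assumed simplified pattern (not copied from the patch): "<", an optional "/",
# optional whitespace, one letter, then the shortest run of non-">" characters
# followed by ">". There is no capturing group around "[^>]*?".
TAG_RE = re.compile(r"</?\s*[A-Za-z][^>]*?>")


def strip_tags_for_index(text: str) -> str:
    """Replace HTML-like tags with a space and lowercase the remainder."""
    return TAG_RE.sub(" ", text).lower()


# A page with ordinary, properly closed tags.
with_tags = "Plain <b>bold</b> text"

# A page with a "<" that is never followed by ">".
unmatched = "Mirror plan: size < 10 GB, or x < y with no closing bracket"

stripped_tags = strip_tags_for_index(with_tags)
stripped_unmatched = strip_tags_for_index(unmatched)

print(stripped_tags.split())
print(stripped_unmatched)
```

In this sketch the unmatched "<" never forms a complete tag, so the substitution leaves that text alone and everything before and after it stays available for indexing.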

Comment 1 Max Semenik 2009-11-04 15:20:06 UTC
Thanks, patch applied in r58548, along with some tests.
