Last modified: 2009-11-04 15:20:06 UTC
Created attachment 6067
Fix broken regexp in SearchUpdate.php (patch to r49794)

If an article contains a "<" symbol and there is no subsequent ">" symbol anywhere in the article, the si_text field for that article in the searchindex table ends up completely empty; even the text that precedes the "<" symbol is wiped out. It is therefore impossible to search on any of the article's contents.

For example, http://www.uesp.net/wiki/UESPWiki:Mirror_Plan is currently triggering this bug; si_text is being set to ''. Although UESP is currently running MW 1.10, the same bug occurs if the article is added to a test wiki running r49794.

The basic problem is an incorrect pair of parentheses in a preg_replace expression in SearchUpdate.php::doUpdate(). The attached patch file removes those parentheses. I also did some secondary cleanup of the expression by deleting some redundant chunks: "[A-Za-z0-9]*\\s*" is covered equally well by "[^>]*?", and the simpler expression is less likely to mislead editors.

The revised regexp successfully processes UESPWiki:Mirror_Plan, and also successfully processes some test pages containing HTML tags.
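For illustration only, here is a minimal sketch of the behaviour the revised expression is expected to have. It uses Python's re module rather than PHP's preg_replace. The pattern is an assumed shape based on the description above, not the exact expression from the patch, and the names TAG_RE and strip_tags are hypothetical.

```python
import re

# Assumed shape of the simplified pattern (not copied from the patch):
# "<", an optional "/", optional whitespace, a letter, then anything up to
# the nearest ">". There is no capture group and no separate
# "[A-Za-z0-9]*\s*" chunk, since "[^>]*?" already covers those characters.
TAG_RE = re.compile(r"</?\s*[A-Za-z][^>]*?>")


def strip_tags(text: str) -> str:
    """Replace each HTML-like tag with a single space."""
    return TAG_RE.sub(" ", text)


# A page with ordinary tags: each tag becomes one space.
with_tags = "some <b>bold</b> text"

# A page with a "<" that is never closed by ">": nothing matches,
# so the text should come back unchanged rather than empty.
unclosed = "Mirror plan <draft with no closing bracket"

print(repr(strip_tags(with_tags)))
print(repr(strip_tags(unclosed)))
```

The point of the second case is that text containing an unmatched "<" must survive the stripping step intact, which is the property the searchindex update relies on.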
Thanks, patch applied in r58548, along with some tests.