Last modified: 2010-05-15 15:54:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T16961, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 14961 - importdump.php regular expression is too large
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: 1.12.x
Hardware: PC Windows XP
Importance: Normal major
Target Milestone: ---
Assigned To: Tim Starling
Depends on:
Blocks:
Reported: 2008-07-28 16:53 UTC by Radek Marik
Modified: 2010-05-15 15:54 UTC
CC List: 3 users

Description Radek Marik 2008-07-28 16:53:15 UTC
Running on XAMPP 2.5.
I tried to import 6500 pages exported from MediaWiki 1.9. After roughly 3500 pages, importdump.php repeatedly reports:
Warning: preg_match(): Compilation failed: regular expression is too large at offset 29149 in C:\xampp\htdocs\wiki\includes\Preprocessor_DOM.php on line 205.

Additional notes:
The issue can be traced to the variable $xmlishElements, which receives its value from $this->parser->getStripList() on line 81. The parser accumulates more and more hooks with the same key ("ask"), remembering all of them, because the list is implemented as a normal array rather than as an associative array.
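The growth described in the notes above can be illustrated with a small sketch. This is not MediaWiki code: it is a Python stand-in, and build_tag_pattern, strip_list, hooks_by_name and the one-registration-per-page loop are all hypothetical names and assumptions made for illustration. It only shows why appending the same key to a plain list inflates an alternation regex, while keying the hooks by name keeps it constant.

```python
import re


def build_tag_pattern(tag_names):
    """Build one alternation regex that matches an opening tag for any given name."""
    alternation = "|".join(re.escape(name) for name in tag_names)
    return "<(" + alternation + r")(?=[\s/>])"


# List-style registry: every re-registration appends another copy of the key.
strip_list = []
# Dict-style registry: re-registering the same key overwrites the old entry.
hooks_by_name = {}

# Assumption: the "ask" hook is registered again for each imported page.
for _ in range(3500):
    strip_list.append("ask")
    hooks_by_name["ask"] = object()  # placeholder for the hook callback

list_pattern = build_tag_pattern(strip_list)
dict_pattern = build_tag_pattern(hooks_by_name)

# The list-based pattern grows with every registration; the dict-based one does not.
print(len(strip_list), len(list_pattern))
print(len(hooks_by_name), len(dict_pattern))
```

With the list-based registry the pattern length grows linearly with the number of pages processed, which is consistent with the reported failure appearing only after several thousand pages. Whether the actual fix deduplicated the list or changed the data structure is not stated in this report.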
Comment 1 Chad H. 2008-07-28 17:19:41 UTC
I've hit this before. Extensions that add tags to the strip list there (e.g., Cite with <ref>) tend to make that regex too large. It might be worth separating core parsing of those tags from the extensions.
Comment 2 Brion Vibber 2008-07-28 23:55:00 UTC
Hmmmm, I thought we fixed this sort of problem previously? (Maybe that was hooks?) Perhaps some state isn't getting cleared properly...
Comment 3 Siebrand Mazeland 2008-08-11 08:14:56 UTC
Assigned to Tim, current expert on wiki dumps.
Comment 4 Aaron Schulz 2008-09-12 15:47:52 UTC
That code is not in the newer 1.13/1.14a.
Comment 5 Tim Starling 2008-10-06 08:41:45 UTC
Was fixed by Brion in r32133. The reporter is using 1.12 which was branched at r31056. Please update to 1.13. 
