Last modified: 2014-07-15 07:49:40 UTC
Nov 20 18:43:57 <arlolra> the max recurse from jawiki/中央線_(韓国)
Nov 20 18:44:32 <arlolra> {{中央線経路図}} seems to be from this template
Nov 20 18:45:28 <arlolra> just trying to parse that alone causes it
Nov 20 18:45:53 <arlolra> but parsing the content of the template works just fine ...
Nov 20 18:49:03 <subbu> confirmed by this:
Nov 20 18:49:04 <subbu> [subbu@earth lib] echo '{{中央線経路図}}' | node parse --prefix jawiki
Nov 20 18:49:04 <subbu> ERROR in Main_Page:
Nov 20 18:49:04 <subbu> Maximum call stack size exceeded
Nov 20 18:49:04 <subbu> Stack trace: undefined

The source of that template is *really* big and the error seems to be happening in the tokenizer. It's not an infinite recursion, because this works:

echo "{{中央線経路図}}" | node --stack-size=2048 parse --prefix jawiki
Interesting ... it would be worthwhile to figure out if we can get this to parse without getting into deep call stacks. But, definitely lower priority for now.
Couple more pages from production logs:

* itwiki:IV_Copa_Brasil oldid:64293916
* dewiki:Präsidentschaftswahl_in_Frankreich_2012 oldid:126550548
The bulk of the recursion is via table_data_tag calling nested_block_in_table, which in turn matches other table syntax. The remainder of the table is matched using this recursion, which causes the overflow as this table is large.
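To illustrate why stack depth grows with table size here, a toy sketch (not the Parsoid grammar and not the actual patch): `matchCellsRecursive` and `matchCellsIterative` are hypothetical helpers, and the input format (one cell per line, each starting with `|`) is an assumption made only for this example. In the recursive version every matched cell calls back into the matcher for the rest of the table, so the stack gets one frame deeper per cell; the loop version consumes the same cells at constant stack depth.

```javascript
// Recursive: matching one cell recurses to match the remainder of the table.
// V8 does not eliminate tail calls, so depth == number of cells.
function matchCellsRecursive(lines, i, out) {
  if (i >= lines.length || lines[i][0] !== '|') {
    return out;
  }
  out.push(lines[i].slice(1));
  return matchCellsRecursive(lines, i + 1, out);
}

// Iterative: the same cells are consumed in a loop, stack depth stays constant.
function matchCellsIterative(lines) {
  const out = [];
  for (let i = 0; i < lines.length && lines[i][0] === '|'; i++) {
    out.push(lines[i].slice(1));
  }
  return out;
}

// Small input: both variants agree.
const rows = ['|a', '|b', '|c', 'not a cell'];
console.log(matchCellsRecursive(rows, 0, []), matchCellsIterative(rows));
```

With a few hundred thousand cells, the recursive variant would be expected to fail with "Maximum call stack size exceeded" at the default stack size, while the loop still completes.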
Change 135992 had a related patch set uploaded by GWicke: Bug 57670: Avoid recursion via nested_block_in_table https://gerrit.wikimedia.org/r/135992
Debugging note: Node 0.11 finally has stack traces on stack overflow. To analyze recursions, it's additionally useful to increase the stack trace limit: node --stack-trace-limit=1000
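A small sketch of how such a long trace could be reduced to the repeating cycle. `Error.stackTraceLimit` is V8's runtime counterpart of the `--stack-trace-limit` flag; `frameNames` and `findLoop` are hypothetical helpers written for this note, and they assume the topmost frame is itself part of the recursion loop.

```javascript
// Raise the number of frames V8 records (runtime equivalent of the CLI flag).
Error.stackTraceLimit = 1000;

// Extract the function name from each "    at name (location)" line.
function frameNames(stack) {
  return stack.split('\n')
    .map((line) => /^\s*at (\S+)/.exec(line))
    .filter(Boolean)
    .map((m) => m[1]);
}

// Return the frames up to the next occurrence of the topmost function,
// i.e. one iteration of the loop, or null if the top frame never repeats.
function findLoop(names) {
  const next = names.indexOf(names[0], 1);
  return next === -1 ? null : names.slice(0, next);
}

// Usage with a deliberately unbounded recursion:
function a(n) { return b(n + 1); }
function b(n) { return a(n + 1); }
try {
  a(0);
} catch (e) {
  console.log(findLoop(frameNames(e.stack)));
}
```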
(In reply to ssastry from comment #3)
> Couple more pages from production logs:
>
> * itwiki:IV_Copa_Brasil oldid:64293916

Now finishes in 10s & looks correct.

> * dewiki:Präsidentschaftswahl_in_Frankreich_2012 oldid:126550548

Now finishes in 35s & looks correct.
Change 135992 merged by jenkins-bot: Bug 57670: Avoid recursion via nested_block_in_table https://gerrit.wikimedia.org/r/135992
New errors from production logs:

[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651050] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651060] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651091] Maximum call stack size exceeded
[fatal][eswikisource/Usuario:Cárdenas/PRUEBAS?oldid=651094] Maximum call stack size exceeded

Seems to work fine with both node 0.10 and 0.11:

[info][eswikisource/Usuario:Cárdenas/PRUEBAS] starting parsing
[info][eswikisource/Usuario:Cárdenas/PRUEBAS] completed parsing in 2094 ms
@gwicke: did you try with the right oldid?
(In reply to Arlo Breault from comment #11)
> @gwicke: did you try with the right oldid?

I assumed that the issue was still there in the latest revision, which evidently wasn't true. I now got a stack trace with node 0.11. This is the loop:

    at peg$parsetemplate (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6695:26)
    at peg$parsetplarg_or_template (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6315:20)
    at peg$parsetplarg_or_template_or_broken (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6337:12)
    at peg$parseinline_element (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:3418:16)
    at peg$parseinlineline (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:3257:18)
    at peg$parseblock (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:2269:20)
    at peg$parsenested_block (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:2330:14)
    at peg$parsetemplate_param_text (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:7297:18)
    at peg$parsetemplate_param_name (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:7169:14)
    at peg$parsetemplate_param (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6985:12)
    at peg$parsetemplate (eval at <anonymous> (/home/gabriel/src/parsoid/lib/mediawiki.tokenizer.peg.js:80:38), <anonymous>:6695:26)
Looking at the source at http://es.wikisource.org/w/index.php?title=Usuario:C%C3%A1rdenas/PRUEBAS&action=edit&oldid=651094, there are *a lot* of unclosed template calls, which means that the tokenizer is (correctly) trying to parse this as deeply nested template parameters. The PHP parser handles this page without crashing, because it sets $wgMaxTemplateDepth = 40; by default and aborts any recursion beyond that depth. I'll see if I can add a similar mechanism in the tokenizer.
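A rough sketch of what such a depth limit could look like, for illustration only; this is not the Parsoid tokenizer and not necessarily how the eventual change works. `parseNested` is a hypothetical toy parser, and it makes two simplifying assumptions: an unclosed `{{` is implicitly closed at end of input, and once the limit is reached further `{{` are kept as plain text instead of being recursed into.

```javascript
// Toy recursive-descent parser for nested "{{ ... }}" with a depth cap.
// maxDepth defaults to 40, mirroring the PHP default quoted above.
function parseNested(src, maxDepth = 40) {
  let pos = 0;

  function parseBody(depth) {
    const nodes = [];
    let text = '';
    const flush = () => {
      if (text) { nodes.push(text); text = ''; }
    };
    while (pos < src.length) {
      if (depth > 0 && src.startsWith('}}', pos)) {
        pos += 2; // close the current template
        break;
      }
      if (src.startsWith('{{', pos) && depth < maxDepth) {
        flush();
        pos += 2;
        nodes.push({ template: parseBody(depth + 1) });
      } else {
        // Plain character, or "{{" beyond the limit: keep it as text.
        text += src[pos++];
      }
    }
    flush();
    return nodes;
  }

  return parseBody(0);
}

console.log(JSON.stringify(parseNested('a{{b{{c}}d}}e')));
// Thousands of unclosed openers no longer translate into thousands of frames:
console.log(parseNested('{{'.repeat(100000) + 'x').length);
```

The call stack never exceeds maxDepth + 1 parser frames, regardless of how many openers the input contains.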
Change 138763 had a related patch set uploaded by GWicke: Bug 57670: Limit template expansion depth similar to the PHP parser https://gerrit.wikimedia.org/r/138763
Change 138763 had a related patch set uploaded by Arlolra: WIP: Limit template expansion depth similar to the PHP parser https://gerrit.wikimedia.org/r/138763
Change 145683 had a related patch set uploaded by Arlolra: Allow backtracking in async tokenization https://gerrit.wikimedia.org/r/145683
Change 138763 merged by jenkins-bot: Limit template expansion depth similar to the PHP parser https://gerrit.wikimedia.org/r/138763
Change 145683 merged by jenkins-bot: Allow backtracking in async tokenization https://gerrit.wikimedia.org/r/145683