Last modified: 2014-09-23 23:09:06 UTC
The ISBN/RFC/whatever magic linking voodoo really ought not be in core. It's a goofy feature that dates way, way back to r40. I suggest splitting the magic linking voodoo out into an extension and adding a genericized functionality that could resolve bugs like bug 224.
I don't think the magic linking in core is necessarily evil. However, if it currently isn't, then it should be made configurable and disableble.
Either you would have to manage to keep the syntax identical or you will have to come up with some transition plan for updating what must by now be millions of instances of the old syntax to whatever new syntax you choose.
(In reply to comment #2) Nobody has proposed to change the syntax. Only that the current usage might be moved into an extension, where also new things similar to this could be added (although I would recommended against it).
(In reply to comment #3) > (In reply to comment #2) > Nobody has proposed to change the syntax. Only that the current usage might be > moved into an extension, where also new things similar to this could be added > (although I would recommended against it). Actually, Aryeh has suggested eliminating the syntax in favor of templates (e.g. {{ISBN|XXXXX}}). A few other bugs in this tracker have suggested using ParserFunction-type syntax (e.g. {{#ISBN:XXXX}}). An extension has the advantage of being able to be installed across all wikis easily, rather than relying on users to create and maintain templates (there are still no global templates). An extension also has an inherent disable/enable feature (in reply to comment #1). If you were creating this feature today, you'd probably do it in JavaScript (or at least I would). Whether Wikimedia or any individual wiki would install this extension to replace the functionality is not as important as the current system degrades gracefully. There are a lot of options to move forward here, but I think the first fundamental question to answer is whether this functionality is appropriate for core. I'm not sure what you're recommending against: adding more cases of magic link syntax or moving this functionality into an extension.
The first, adding more magic link syntax. It's also open question if we even can move this to an extension without causing too big disruption.
We could provide a way to hook into that piece of the parser and add the current ones to something called CoreMagicLinking, just as was done with parserfunctions. Such thing is purposefully not hookable for being evil.
Created attachment 8582 [details] Magical Extension It does seem to me that having magic linking in the core is inappropriate for a number of reasons, many noted above. Attached is a simple extension that replaces the core functionality. I have tested it with a modified MediaWiki that lacks the magic linking in the core. A few notes about the extension: - it easily supports "legacy" magic linking (i.e. ISBN, RFC, PMID) via $wgMagicalClassic - it can be easily extended (example below) - the code is a little more general, and less hackey :) An example of a custom linker (I have tested this in my LocalSettings.php): function myCustomLinker( $matches ) { $num = ltrim($matches[1], '0'); $url = 'http://bugzilla.wikimedia.org/show_bug.cgi?id=' . $num; return '<a href=\'' . $url . '\'>' . $matches[0] . '</a>'; } $wgMagicalLinkers[] = array( 'linker' => 'myCustomLinker', 'pattern' => '@bug ([0-9]+)@' ); The name of the extension is "Magical".
Seems to do the job. Although I'm not convinced about the efficiency of using preg_match_all. Thanks for your contribution. Some notes: Do not use == null to test if the entry is present, that will produce a notice. Use isset() instead. The name of the main function (Magical::pbt) should be much clear. ISBN is not normalised. You are not providing the css classes. The callbacks are safe thanks to the limited scope of the regex. Maybe add some redundant htmlspecialchars? If you want to consider a design change, I would have used per-parser state added in ParserFirstCallInit, like parser functions and tag hooks, probably making a single combined regex there (so you don't need to scan the text N times).
Created attachment 8584 [details] Magical Extension (rv2) Thanks for the feedback, Platonides. I fixed many of the things you mentioned. (I wasn't quite sure what you meant by "redundant htmlspecialchars".) Regarding performance: I did a quick and dirty test, and my version seems to go approximately two times as fast as the core version. While I am pleased with this result, I have no idea why it is the case... If you get a chance, it would be great if some could double check the performance.
I did some additional testing, and my version is significantly worse for certain types of input. I'll do some more hacking...
Created attachment 8585 [details] Magical Extension (rv3) This should fix performance problems considerably. It is now comparable performance wise with core magic linking.
You are not doing it in the structured way of placing extensions into files (no problem, it's easy to add later), but things like if( ! isset( $wgMagicalLinkers ) ) are since they produce a vulnerability if the host is configured with register globals. Also, you should check that MEDIAWIKI is defined before executing anything. It makes no sense having this line: $wgMagicalLinkers = array_reverse( $wgMagicalLinkers ); Magical::createPattern() should be lazy-loaded, or done by the function if there's a function for registering magic words. Using Xml::escapeTagsOnly is worse than doing no escaping at all. With no escaping, it looks like a html injection vector but when looking at the regex it is actually secure (my previous mention to redundant calls to htmlspecialchars) http://www.mediawiki.org/wiki/Security_for_developers#Demonstrable_security Escaping with Xml::escapeTagsOnly it looks like it is secure, so that would encourage eg. changing the regex for a more insecure one, as it is escaped, but the delimiter you use is ' but escape for ". A comment on the $unsafePattern would be appropiate. It's quite clever. Note that you no longer need to use the x modifier. Also, I would use / instead of @ as it is more common. The fact that @ has to be escaped on those regex may bite some people.
Created attachment 8588 [details] Magical Extension (rv4) I think this fixes most of your concerns. Many thanks for your extensive feedback. :)
Created attachment 8589 [details] Disable magic linking in core (temporary) In case anyone is interested in testing this extension out, here's a quick patch to disable magic linking in Parser.php.
Removing 1378, 2340, 2391, 3663, 4595, 6535, 8160, 8758, 10866, 10867, 12900, 13195, 13873, 18813, 18814, 19439, 20441, 26644, 28950, 29025 as depends on... they have nothing to do with moving magic linker to a extension.
-tracking, there is nothing tracking about this bug. +patch.
@William Demchick: do you currently have commit access? it might be handy (and get a few more eyes on it) if you work on it directly in svn.
Created attachment 8918 [details] Magical Extension (rv5) with 30360 Sample patch based on rv4 to include [[Bugzilla:30360]] (please double check).
*** Bug 30360 has been marked as a duplicate of this bug. ***
Docu, could you please attach your patch as a plaintext diff per https://www.mediawiki.org/wiki/Patch#Posting_a_patch ? Thank you.
(In reply to comment #20) > Docu, could you please attach your patch as a plaintext diff per > https://www.mediawiki.org/wiki/Patch#Posting_a_patch ? Thank you. Since it's a new file / new extension that isn't really necessary.
Docu, some links that might help you with this going forward: https://www.mediawiki.org/wiki/Developer_access https://www.mediawiki.org/wiki/Git/Workflow#How_to_submit_a_patch https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment Thanks.
Hi Docu, thank you for the patch! As you may already know, MediaWiki is currently revamping its PHP-based parser into a "Parsoid" prototype component, to support the rich-text Visual Editor project: https://www.mediawiki.org/wiki/Parsoid https://www.mediawiki.org/wiki/Visual_editor Folks interested in enhancing the parser's capabilities are very much welcome to join the Parsoid project, and contribute patches as Git branches: https://www.mediawiki.org/wiki/Git/Tutorial#How_to_submit_a_patch Compared to .diff attachments in Bugzilla tickets, Git branches are much easier for us to review, refine and merge features together. Each change set has a distinct URL generated by the "git review" tool, which can be referenced in Bugzilla by pasting its gerrit.wikimedia.org URL as a comment. If you run into any issues with the patch process, please feel free to ask on irc.freenode.net #wikimedia-dev and the wikitext-l mailing list. Thank you!
(In reply to comment #1) > I don't think the magic linking in core is necessarily evil. However, if it > currently isn't, then it should be made configurable and disableble. This got split out to bug 45942 ("Magic links" RFC, PMID and ISBN should be gated by configuration).