Last modified: 2010-07-24 18:23:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T24905, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 22905 - Parser.php doMagicLinks() mishandles abbr tag


Summary:	Parser.php doMagicLinks() mishandles abbr tag

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://en.wikipedia.org/wiki/User:Gui...
Whiteboard:
Keywords:	patch, patch-need-review

Depends on:
Blocks:

Reported:	2010-03-20 05:08 UTC by Solitarius
Modified:	2010-07-24 18:23 UTC (History)
CC List:	3 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
regular expression modification (815 bytes, patch) 2010-03-20 05:13 UTC, Solitarius	Details

Description Solitarius 2010-03-20 05:08:56 UTC

Since the <abbr> tag was whitelisted, the doMagicLinks() function in Parser.php confuses <abbr> tags with <a> tags.

Comment 1 Solitarius 2010-03-20 05:13:01 UTC

Created attachment 7235 [details]
regular expression modification

Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-03-23 16:41:01 UTC

1) Do you have a test case that demonstrates the problem?  I.e., what's some markup that parses incorrectly because of this bug?

2) Your change doesn't seem quite right -- whitespace other than a simple space would be valid HTML here (although I haven't looked closely enough to see if it would actually be possible at this stage in the parsing).  I would suggest (<a[^a-z0-9].*?</a>).

Comment 3 Solitarius 2010-03-24 04:17:26 UTC

1) The wiki markup below gets parsed incorrectly. You can also check [[User:GuillaumeBeaudoin]] for more examples.

<abbr>(fr)</abbr> ISBN 2753300917 [http://bit.ly/bZAjtg La méthode Google]

The <abbr> tag is used extensively on the French Wikipedia, and the issue was first found on [[fr:Wikipedia]] by [[fr:User:Manu1400]].
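The failure on the markup above can be reproduced outside MediaWiki with a minimal sketch. Two things here are assumptions, not facts from this report: that the "skip existing links" part of the pattern was simply `<a.*?</a>` before the fix, and that the bracketed external link has already been rendered to an <a> element by the time magic links are processed. Under those assumptions, the skip pattern starts matching at `<abbr>` and runs to the `</a>` of the external link, so the ISBN in between is treated as part of an existing link.

```python
import re

# Assumption: simplified stand-in for the pre-fix "skip existing links"
# pattern; the real doMagicLinks() pattern has further alternatives.
OLD_SKIP = re.compile(r"<a.*?</a>")

# Assumption: the external link from the test case has already been
# converted to an <a> element at this stage of parsing.
html = ('<abbr>(fr)</abbr> ISBN 2753300917 '
        '<a href="http://bit.ly/bZAjtg">La méthode Google</a>')

# Collect every span the pattern would treat as an existing link.
old_spans = [m.group(0) for m in OLD_SKIP.finditer(html)]

# "</abbr>" does not contain "</a>", so the lazy match keeps going
# until the closing tag of the real link and swallows the ISBN.
print(old_spans)
```

The single span returned covers the whole line, which would explain why the ISBN never gets linked.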

2) You're right, a tab or any whitespace other than a simple space would not match my regular expression. We could use \s to match any whitespace (option A), or something like what you proposed (option B).

Option A - <a[\s>].*?</a>
Option B - <a[^a-zA-Z0-9].*?</a>
Option C - <a[^[:alnum:]].*?</a>

However, I'm not sure how capital letters would be handled.
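As a rough way to compare the candidates, including the capital-letter question, the sketch below runs them over a few sample strings. Option A is taken with \s, as the prose in this comment describes. Option C is omitted because Python's re module does not support the POSIX class [:alnum:]. The sample inputs are hypothetical, and whether a mixed-case tag name can actually reach this stage of parsing is not established in this thread.

```python
import re

# Candidate patterns from this thread (option C omitted, see above).
CANDIDATES = {
    "comment 2": r"<a[^a-z0-9].*?</a>",
    "option A": r"<a[\s>].*?</a>",
    "option B": r"<a[^a-zA-Z0-9].*?</a>",
}

# Hypothetical sample inputs, not taken from the bug report.
SAMPLES = [
    '<a href="x">y</a>',        # ordinary link
    '<a\thref="x">y</a>',       # tab after the tag name
    '<abbr>y</abbr><a>z</a>',   # abbr followed by a link
    '<aBBR>y</aBBR><a>z</a>',   # mixed-case abbr followed by a link
]

def first_match(pattern, text):
    """Return the first span the pattern treats as a link, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

results = {
    name: [first_match(pattern, sample) for sample in SAMPLES]
    for name, pattern in CANDIDATES.items()
}
for name, matches in results.items():
    print(name, matches)
```

In this sketch all three candidates leave a lowercase `<abbr>` alone, but the lowercase-only class from comment 2 still treats `<aBBR>` as the start of a link, while options A and B do not.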

Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-03-24 13:44:13 UTC

Committed a modified version in r64113.  I went with (<a[ \t\r\n>].*?</a>) in the end, matching the HTML5 spec as far as I'm reading it: <http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#before-attribute-name-state>  Thanks for the patch!
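To illustrate the effect of the committed skip pattern, here is a minimal sketch that pairs it with a deliberately simplified ISBN branch. The ISBN sub-pattern, the `link_isbns` helper, and the `/isbn/` link target are all hypothetical and only serve the illustration; the real doMagicLinks() handles more formats and builds its links differently. The input again assumes the external link from comment 3 has already been rendered to an <a> element.

```python
import re

# Skip branch as committed in r64113, followed by a simplified ISBN
# branch (assumption: 10-character ISBNs only, single space after "ISBN").
MAGIC = re.compile(r"(<a[ \t\r\n>].*?</a>)|ISBN ([0-9]{9}[0-9Xx])")

def link_isbns(html):
    """Linkify bare ISBNs while leaving existing <a> elements untouched."""
    def repl(m):
        if m.group(1) is not None:
            return m.group(1)  # existing link: return it unchanged
        # Hypothetical link target, for illustration only.
        return '<a href="/isbn/{0}">ISBN {0}</a>'.format(m.group(2))
    return MAGIC.sub(repl, html)

# Assumption: external link already rendered to an <a> element.
html = ('<abbr>(fr)</abbr> ISBN 2753300917 '
        '<a href="http://bit.ly/bZAjtg">La méthode Google</a>')
out = link_isbns(html)
print(out)

# An ISBN inside an existing link is skipped, even with a newline
# between the tag name and its first attribute.
inside = '<a\nhref="x">ISBN 2753300917</a>'
print(link_isbns(inside) == inside)
```

Because the character after `<a` must now be whitespace or `>`, the `<abbr>` element no longer opens a skipped span, and the ISBN that follows it is reached by the second branch.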

Comment 5 Solitarius 2010-03-25 05:27:38 UTC

Thank you, Aryeh. Merci!

Comment 6 S. McCandlish 2010-07-24 18:22:41 UTC

Since this is fixed, removing Bug #617 as a "blocks" dependency.

Comment 7 S. McCandlish 2010-07-24 18:23:52 UTC

Whoops, typo. Corrected: Since this is fixed, removing Bug #671 as a "blocks" dependency.

