Last modified: 2014-09-24 01:19:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T31025, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 29025 - Magic links are inconsistent with common parser rules
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch, patch-need-review
Depends on:
Blocks: 28950 29473
Reported: 2011-05-17 15:22 UTC by The Evil IP address
Modified: 2014-09-24 01:19 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Accept only spaces separating magic links to RFC/PMID/ISBN (848 bytes, patch)
2011-05-17 20:56 UTC, Platonides
Failing pages from enwiki-20110405 (39.96 KB, text/plain)
2011-05-27 22:11 UTC, Platonides
patch proposal for Bug 28950 and Bug 29025 (3.61 KB, patch)
2012-01-23 13:28 UTC, vlakoff

Description The Evil IP address 2011-05-17 15:22:58 UTC
The behavior of line breaks in magic links, like RFC or PMID, differs from the usual behavior of line breaks in wiki syntax. Normally, if two or more line breaks separate two pieces of text, they are rendered on different lines. Magic links, however, remain magic links even with line breaks between the keyword and the number. That is very unlikely to be the intention, and I would suggest allowing only regular spaces as separators between the "RFC"/"PMID" strings and the numbers. See http://test.wikipedia.org/wiki/Magic_links for some examples I've given.
Comment 1 Platonides 2011-05-17 20:56:45 UTC
Created attachment 8546
Accept only spaces separating magic links to RFC/PMID/ISBN

You are right. The magic link regex uses \s, which is equivalent to [\t\n\f\r ].
Just using spaces would have been enough. The other characters were added in r15976, and I don't think they were intended to be supported.
The above patch limits the separator in magic links to the space character. I would like to check actual usage before applying it, since the other characters have been accepted for the last 5 years.
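
To illustrate the difference, here is a minimal Python sketch. It is not the MediaWiki parser code: the patterns LOOSE and STRICT are simplified, hypothetical stand-ins that only cover RFC/PMID, and the loose one spells out the character class quoted above instead of relying on \s.

```python
import re

# Hypothetical, simplified patterns; not the actual MediaWiki Parser regex.
# LOOSE mirrors the character class quoted above, STRICT accepts spaces only.
LOOSE = re.compile(r"\b(RFC|PMID)[ \t\n\f\r]+([0-9]+)\b")
STRICT = re.compile(r"\b(RFC|PMID) +([0-9]+)\b")

samples = ["RFC 2616", "RFC\n\n2616", "PMID\t12345"]
for text in samples:
    # Show whether each pattern would treat the sample as a magic link.
    print(repr(text), bool(LOOSE.search(text)), bool(STRICT.search(text)))
```

Under these assumptions, only the sample with a plain space is still recognised by the strict pattern, which is the behavior the patch aims for.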
Comment 2 Ilmari Karonen 2011-05-20 10:33:32 UTC
Looks good to me.
Comment 3 Platonides 2011-05-27 22:11:05 UTC
Created attachment 8595
Failing pages from enwiki-20110405

I have tested the 11120931 revisions of enwiki-20110405-pages-articles against the regex
'!(?:                           # Start cases
        (?:RFC|PMID)[\t\n\f\r]+([0-9]+) |   # m[4]: RFC or PMID, capture number
        ISBN[\t\n\f\r]+(\b                  # m[5]: ISBN, capture number
                (?: 97[89] [\ \-]? )?   # optional 13-digit ISBN prefix
                (?: [0-9]  [\ \-]? ){9} # 9 digits with opt. delimiters
                [0-9Xx]                 # check digit
                \b)
)!x'
748 pages would lose a magic link.
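
A rough sketch of how such a check could be reproduced in Python follows. The pattern is a port of the regex above to Python's re.VERBOSE syntax; the function count_affected and its input (an iterable of page texts) are hypothetical, and reading the actual dump is left out.

```python
import re

# Port of the regex quoted above: matches magic links whose separator
# consists only of tab, newline, form feed or carriage return (no space).
NON_SPACE_SEPARATED = re.compile(r"""
(?:
    (?:RFC|PMID)[\t\n\f\r]+([0-9]+) |   # RFC or PMID, capture number
    ISBN[\t\n\f\r]+(\b                  # ISBN, capture number
        (?: 97[89] [\ \-]? )?           # optional 13-digit ISBN prefix
        (?: [0-9]  [\ \-]? ){9}         # 9 digits with optional delimiters
        [0-9Xx]                         # check digit
    \b)
)
""", re.VERBOSE)


def count_affected(pages):
    """Count pages containing at least one non-space-separated magic link.

    `pages` is assumed to be an iterable of wikitext strings.
    """
    return sum(1 for text in pages if NON_SPACE_SEPARATED.search(text))


if __name__ == "__main__":
    demo = ["RFC 2616", "RFC\n2616", "ISBN\n0306406152", "no links here"]
    print(count_affected(demo))
```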

If only 3645484 pages are articles, 748/3645484 = 0.2 per 1000.
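
The rate can be double-checked with a couple of lines of Python, using only the two figures quoted above (the variable names are illustrative):

```python
affected_pages = 748
article_pages = 3645484

# Share of affected pages, expressed per 1000 articles.
rate_per_thousand = affected_pages / article_pages * 1000
print(f"{rate_per_thousand:.2f} per 1000 articles")
```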

I'd like to take a look into these articles, though.
Comment 4 Dan Collins 2011-07-09 21:52:28 UTC
Looks to me like the magic links on the test.wikipedia.org page are all now behaving correctly, and only those with spaces and not newlines are linked. Closing this fixed.
Comment 5 Dan Collins 2011-07-09 21:54:14 UTC
Never mind, I completely misinterpreted that test page...
Comment 6 vlakoff 2012-01-23 13:28:39 UTC
Created attachment 9895
patch proposal for Bug 28950 and Bug 29025

First patch proposal for Bug 28950 and Bug 29025. It seems to work well; nevertheless, any suggestions would be very welcome.

The benefits of this patch are:
- (Bug 28950) non-breaking spaces (both literal char and HTML entities) support
- (Bug 29025) no surprising link creation if several \n's (like "ISBN\n\n1234567890")


The only limitation I am aware of is that \n isn't implemented (yet), so for example "ISBN\n1234567890" doesn't produce a link. But don't forget cases like "ISBN \n123...", "ISBN\n 123..." (<pre> insertion!), "ISBN\n&nbsp;123...", and so on.

\n support is feasible, though I don't know whether it would be that useful. However, I'd like to stay as close as possible to "normal" wikicode parsing.
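
The separator rules described in this comment could look roughly like the following Python sketch. This is not the attached patch, whose contents are not shown here: the names SEP and ISBN_LINK are hypothetical, the ISBN part is reduced to a bare 10-character number, and the exact set of accepted entities (&nbsp; and &#160;) is an assumption.

```python
import re

# Assumed separator: one or more of a regular space, a literal
# non-breaking space (U+00A0), or an HTML entity for it.
SEP = r"(?:[ \u00a0]|&nbsp;|&#160;)+"

# Simplified, hypothetical ISBN-10 magic link pattern (no hyphens, no ISBN-13).
ISBN_LINK = re.compile(r"\bISBN" + SEP + r"([0-9]{9}[0-9Xx])\b")

samples = [
    "ISBN 1234567890",        # regular space
    "ISBN\u00a01234567890",   # literal non-breaking space
    "ISBN&nbsp;1234567890",   # HTML entity
    "ISBN\n\n1234567890",     # several newlines
    "ISBN\n1234567890",       # single newline (the limitation noted above)
]
for text in samples:
    print(repr(text), bool(ISBN_LINK.search(text)))
```

With this separator, newlines never produce a link, which matches both the intended fix for several \n's and the stated limitation for a single \n.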
Comment 7 vlakoff 2012-01-23 21:00:00 UTC
Please see Bug 28950 for an updated patch of mine (and future ones if any).
Comment 8 Sumana Harihareswara 2012-01-29 14:17:32 UTC
Changed "reviewed" keyword to "need-review" to indicate that new patch awaits code review.
