Last modified: 2014-09-24 01:19:02 UTC
Line breaks behave differently around magic links, such as RFC or PMID, than they do elsewhere in wiki syntax. Normally, text separated by two or more line breaks is rendered on different lines. A magic link keyword and its number, however, are still joined into a magic link even when line breaks come between them. That is very unlikely to be the intention, so I would suggest allowing only regular spaces as separators between the "RFC"/"PMID" strings and the numbers. See http://test.wikipedia.org/wiki/Magic_links for some examples.
Created attachment 8546
Accept only spaces separating magic links to RFC/PMID/ISBN
You are right. The magic links regex uses \s, which is equivalent to [\t\n\f\r ]. Allowing only spaces would have been enough. The other characters were added in r15976, and I don't think they were intended to be supported. The above patch limits the magic link separator to the space character. I would like to check how often the other separators are actually used before applying it, since they have been accepted for the last 5 years.
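The difference between the two separators can be sketched as follows. This is only an illustration in Python, not the attached patch: the names CURRENT_STYLE and SPACES_ONLY are made up here, the pattern covers only the RFC/PMID case, and re.ASCII is used so that Python's \s stays close to the PCRE class quoted above (Python's version also includes \v).

```python
import re

# Hypothetical stand-ins for the RFC/PMID part of the magic link regex.
# CURRENT_STYLE mirrors the \s+ separator; SPACES_ONLY mirrors the proposal.
CURRENT_STYLE = re.compile(r"(?:RFC|PMID)\s+([0-9]+)", re.ASCII)
SPACES_ONLY = re.compile(r"(?:RFC|PMID) +([0-9]+)")


def links(pattern, text):
    """Return the numbers that would be turned into magic links."""
    return pattern.findall(text)


samples = ["RFC 2324", "RFC\n\n2324", "PMID\t12345"]
for sample in samples:
    print(repr(sample), links(CURRENT_STYLE, sample), links(SPACES_ONLY, sample))
```

With the \s+ separator all three samples produce a link; with the spaces-only separator only the first one does.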
Looks good to me.
Created attachment 8595
Failing pages from enwiki-20110405
I have tested the 11120931 revisions of enwiki-20110405-pages-articles against the regex
    '!(?:                                    # Start cases
        (?:RFC|PMID)[\t\n\f\r]+([0-9]+) |    # m[4]: RFC or PMID, capture number
        ISBN[\t\n\f\r]+(\b                   # m[5]: ISBN, capture number
            (?: 97[89] [\ \-]? )?            # optional 13-digit ISBN prefix
            (?: [0-9] [\ \-]? ){9}           # 9 digits with opt. delimiters
            [0-9Xx]                          # check digit
            \b)
        )!x'
748 pages would lose a magic link. If only 3645484 pages are articles, 748/3645484 = 0.2 per 1000. I'd like to take a look into these articles, though.
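For reference, a rough Python rendering of that check, plus the arithmetic behind the 0.2 per 1000 figure. It is a sketch only: the names NON_SPACE_SEPARATOR and would_lose_link are invented here, the PHP "!" delimiters are dropped, and it looks at a single text string rather than walking the dump.

```python
import re

# Same alternatives as the regex above, written for Python's re.VERBOSE mode.
NON_SPACE_SEPARATOR = re.compile(
    r"""
    (?:RFC|PMID)[\t\n\f\r]+([0-9]+) |    # RFC or PMID, capture number
    ISBN[\t\n\f\r]+(\b                   # ISBN, capture number
        (?: 97[89] [\ \-]? )?            # optional 13-digit ISBN prefix
        (?: [0-9] [\ \-]? ){9}           # 9 digits with opt. delimiters
        [0-9Xx]                          # check digit
        \b)
    """,
    re.VERBOSE,
)


def would_lose_link(text):
    """True if the text has a magic link separated by tab/newline/FF/CR."""
    return NON_SPACE_SEPARATOR.search(text) is not None


# Figures quoted in the comment above.
affected_pages = 748
article_pages = 3645484
per_thousand = round(affected_pages / article_pages * 1000, 1)
print(per_thousand)
```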
Looks to me like the magic links on the test.wikipedia.org page are all now behaving correctly, and only those with spaces and not newlines are linked. Closing this fixed.
Never mind, I completely misinterpreted that test page...
Created attachment 9895
patch proposal for Bug 28950 and Bug 29025
First patch proposal for Bug 28950 and Bug 29025. It seems to be working well; nevertheless, any suggestion would be very welcome. The benefits of this patch are:
- (Bug 28950) support for non-breaking spaces (both the literal character and HTML entities)
- (Bug 29025) no surprising link creation when there are several \n's (like "ISBN\n\n1234567890")
The only limitation I am aware of is that \n isn't implemented (yet), so for example "ISBN\n1234567890" doesn't produce a link. But don't forget cases like "ISBN \n123...", "ISBN\n 123..." (<pre> insertion!), "ISBN\n 123...", and so on. \n support is feasible, though I don't know whether it would be that useful. However, I'd like to stay as close as possible to "normal" wikicode parsing.
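The behaviour described above could look roughly like this. This is not the attached patch, only a sketch under assumptions: SEPARATOR, MAGIC_LINK and find_magic_links are hypothetical names, the set of recognised entities (&nbsp;, &#160;, &#xA0;) is a guess, and the number is simplified to plain digits.

```python
import re

# Hypothetical separator: a plain space, a literal non-breaking space,
# or one of a few non-breaking-space entities. Newlines are not accepted.
SEPARATOR = r"(?:[ \u00a0]|&nbsp;|&#160;|&#[xX][aA]0;)+"

# Simplified: keyword, separator, then digits only (no ISBN delimiters).
MAGIC_LINK = re.compile(r"(RFC|PMID|ISBN)" + SEPARATOR + r"([0-9]+)")


def find_magic_links(text):
    """Return (keyword, number) pairs that would become magic links."""
    return [(m.group(1), m.group(2)) for m in MAGIC_LINK.finditer(text)]


for sample in [
    "ISBN 1234567890",
    "ISBN\u00a01234567890",
    "ISBN&nbsp;1234567890",
    "ISBN\n\n1234567890",
    "ISBN\n1234567890",
]:
    print(repr(sample), find_magic_links(sample))
```

The first three samples yield a link; the two with \n do not, which matches the stated limitation.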
Please see Bug 28950 for an updated patch of mine (and future ones if any).
Changed "reviewed" keyword to "need-review" to indicate that new patch awaits code review.