Last modified: 2008-09-24 15:18:00 UTC
If a page has more <ref>s than </ref>s or vice versa, that's a clue that something's wrong. If something *is* wrong, you're probably going to get either a huge chunk of text stuck in a reference or a reference stuck in the middle of a page, which are both bad (see the URL for a typical case of the former). Furthermore, people who aren't technically-minded will probably have not the slightest clue how to fix the problem--I've seen this happen twice. Since there's no legitimate reason anyone should ever want to save unbalanced <ref>s, I suggest an error message be presented when they try rather than allowing them to screw up the page.
Let me revise slightly. Not saving is maybe a bad idea, so how about this: if I enter foo<ref>bar baz<ref>whatever comes next</ref>, it outputs HTML like foo<ref><span style="font-size: large; font-weight: bold; color: red;">WARNING: No closing </ref> detected for this tag!</span>bar baz["whatever comes next" refs correctly] So in other words, instead of eating the page, ignore the unclosed tag and make it clear up-front that there's an error of some kind. (Title: "Prevent page save with unbalanced <ref></ref>s" -> "Fix behavior for unbalanced <ref></ref>s")
Note that this second suggestion appears to be how <ref> handles certain errors already, see for instance http://en.wikipedia.org/wiki/User:Simetrical/Let%27s_break_refs%21.
I've been informed that with the current parser this is impossible to implement fully. It could, however, be implemented partially by checking whether the reference contains the string "<ref>", which would indicate an error. For "text1<ref>ref1 text2<ref>ref2</ref>", desired output would probably be "text1[ERROR MESSAGE]ref1 text2[1]...1. ref2". I.e., text at the beginning of the ref until the *last* occurrence of "<ref>" is printed literally preceded by an error message, text following the <ref> is printed as a reference to minimize page-screwed-up-ness. (It should indeed be last "<ref>", if you think about it: <ref>s can't be legally nested, so any <ref>s previous to the last must also be unclosed.) Since the original reason for wontfixing (undesirability of proposed consequence) has been resolved, and the proposal as stands may be infeasible but is still ultimately possible and presumably desired, I'm reopening this (neither brion nor Rob cared enough to reopen it themselves, but they don't mind if I reopen it either).
Created attachment 1912 [details] Modified Cite.php I have not the foggiest clue what format exactly the patch is supposed to be in, and I don't see anything that tells me that fact, so I'm just putting up the whole PHP file. Attached patch will generate an error if $str contains the string '<ref>', but will work correctly otherwise. Ideally, this should print the contents of the bad <ref> rather than just sending an error, but this should be good enough for now (since it's unlikely it will go undetected for long, I hope, or be hard to fix).
Created attachment 1913 [details] An attempt at diff format, otherwise same as above.
Created attachment 1914 [details] Proposed patch #3 Is there a page somewhere on how to format diffs?
Created attachment 1915 [details] Attempt #4
Created attachment 1916 [details] Final attempt before waiting for help
Use the diff feature from your Subversion client.
Created attachment 1919 [details] Same patch as before I was trying to use the svn diff, but it previously didn't work because my command-line window was too small. After fidgeting with settings (these things are more obvious at 7 PM than 1 AM), I could copy the whole thing. It was too long before due to addition and removal of identical lines due to reformatting issues; I've trimmed these out.
Created attachment 1920 [details] Same patch as before Remove extraneous linebreaks that were causing parsing errors due to incidental hyphen at the beginning of a line.
Note: this patch will likely handle <nowiki> incorrectly, interpreting <nowiki><ref></nowiki> as an error if contained within a ref. A simple workaround would be to use <ref> instead, so the proposed patch is still probably better than nothing, but could be improved.
That would be clearly unacceptable, breaking thousands of pages.
Really? It would be that major? I can't see why people would want to put the string "<ref>" inside a ref except, possibly, at [[m:Cite.php]]. I'll fix it, then.
<nowiki><ref></nowiki> needs to work. Uhhh, but if it's inside a <ref>, who knows. Never mind. :D
Created attachment 2030 [details] New patch New patch fixes the problem with nowiki.
That regex looks pretty fragile. What about <!-- <ref> --> etc?
True. Actually, I can't have tested that; '/<nowiki>.*?</' is obviously going to break. I'll test this and submit a revised version. Probably '%<nowiki>.*?</nowiki>|<!--.*?-->|<pre>.*?</pre>%i' or something is what I want. Or maybe I could exclude a <ref> that's included between *any* tags: that won't muck with parser functions, for instance, that for God knows what unholy reason might want a literal "<ref>", but it will still catch common errors. Plus, the new version can benefit from brilliant programming techniques I've picked up in the last two months, such as the existence of preg_match. :D
Created attachment 2288 [details] Cautious proposed patch This patch ignores any <ref> inside either a comment or any SGML-like tag. This is perhaps overcautious, but it will still catch a large majority of errors like this without any chance of false positives that I can think of.
See also bug 8693 for a related issue.
*** Bug 12757 has been marked as a duplicate of this bug. ***
Duplicate bug 12757 also contains some comments and a patch.
Committed essentially the patch from comment 19 in r40998. Ugly, but better than leaving this around for another two years . . .
(see also r40999)
I think I found a small bug: If the missing </ref>-tag is the last ref tag, no error message is displayed. Works correct: <ref>broken <ref>correct</ref> <references /> Doesn't work : <ref>broken <references /> See http://en.wikipedia.org/w/index.php?oldid=239749551 http://en.wikipedia.org/w/index.php?oldid=239749592 for examples.
(In reply to comment #25) > I think I found a small bug: If the missing </ref>-tag is the last ref tag, no > error message is displayed. > Works correct: <ref>broken <ref>correct</ref> <references /> > Doesn't work : <ref>broken <references /> That's because parser doesn't send unclosed tags to extensions.
(In reply to comment #25) > I think I found a small bug: If the missing </ref>-tag is the last ref tag, no > error message is displayed. > Works correct: <ref>broken <ref>correct</ref> <references /> > Doesn't work : <ref>broken <references /> > See http://en.wikipedia.org/w/index.php?oldid=239749551 > http://en.wikipedia.org/w/index.php?oldid=239749592 for examples. Please file a new bug, and link to it from here. (In reply to comment #26) > That's because parser doesn't send unclosed tags to extensions. It looks like it's sending the tag, just implicitly closing it at the end of the article. Either way, I don't see an obvious workaround for this.