Last modified: 2008-09-24 15:18:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8199, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 6199 - Generate error for unbalanced <ref></ref>s
Generate error for unbalanced <ref></ref>s
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Cite (Other open bugs)
unspecified
All All
: Normal enhancement with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
http://en.wikipedia.org/w/index.php?t...
: patch, patch-need-review
: 12757 (view as bug list)
Depends on:
Blocks: 15712
  Show dependency treegraph
 
Reported: 2006-06-04 22:39 UTC by Aryeh Gregor (not reading bugmail, please e-mail directly)
Modified: 2008-09-24 15:18 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Modified Cite.php (20.96 KB, patch)
2006-06-08 03:11 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
An attempt at diff format, otherwise same as above. (21.00 KB, patch)
2006-06-08 03:30 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Proposed patch #3 (2.01 KB, patch)
2006-06-08 03:47 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Attempt #4 (2.00 KB, patch)
2006-06-08 03:49 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Final attempt before waiting for help (1.85 KB, patch)
2006-06-08 03:52 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Same patch as before (3.11 KB, patch)
2006-06-08 23:06 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Same patch as before (2.70 KB, patch)
2006-06-08 23:11 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
New patch (1.84 KB, patch)
2006-06-30 01:41 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details
Cautious proposed patch (1.55 KB, patch)
2006-09-01 03:59 UTC, Aryeh Gregor (not reading bugmail, please e-mail directly)
Details

Description Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-04 22:39:07 UTC
If a page has more <ref>s than </ref>s or vice versa, that's a clue that
something's wrong.  If something *is* wrong, you're probably going to get either
a huge chunk of text stuck in a reference or a reference stuck in the middle of
a page, which are both bad (see the URL for a typical case of the former). 
Furthermore, people who aren't technically-minded will probably have not the
slightest clue how to fix the problem--I've seen this happen twice.  Since
there's no legitimate reason anyone should ever want to save unbalanced <ref>s,
I suggest an error message be presented when they try rather than allowing them
to screw up the page.
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-05 02:24:15 UTC
Let me revise slightly.  Not saving is maybe a bad idea, so how about this: if I
enter foo<ref>bar baz<ref>whatever comes next</ref>, it outputs HTML like

foo&lt;ref&gt;<span style="font-size: large; font-weight: bold; color:
red;">WARNING: No closing &lt;/ref&gt; detected for this tag!</span>bar
baz["whatever comes next" refs correctly]

So in other words, instead of eating the page, ignore the unclosed tag and make
it clear up-front that there's an error of some kind.

(Title: "Prevent page save with unbalanced <ref></ref>s" -> "Fix behavior for
unbalanced <ref></ref>s")
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-06 01:11:14 UTC
Note that this second suggestion appears to be how <ref> handles certain errors
already, see for instance
http://en.wikipedia.org/wiki/User:Simetrical/Let%27s_break_refs%21.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-07 00:35:16 UTC
I've been informed that with the current parser this is impossible to implement
fully.  It could, however, be implemented partially by checking whether the
reference contains the string "<ref>", which would indicate an error.  For
"text1<ref>ref1 text2<ref>ref2</ref>", desired output would probably be
"text1[ERROR MESSAGE]ref1 text2[1]...1. ref2".  I.e., text at the beginning of
the ref until the *last* occurrence of "<ref>" is printed literally preceded by
an error message, text following the <ref> is printed as a reference to minimize
page-screwed-up-ness.  (It should indeed be last "<ref>", if you think about it:
<ref>s can't be legally nested, so any <ref>s previous to the last must also be
unclosed.)

Since the original reason for wontfixing (undesirability of proposed
consequence) has been resolved, and the proposal as stands may be infeasible but
is still ultimately possible and presumably desired, I'm reopening this (neither
brion nor Rob cared enough to reopen it themselves, but they don't mind if I
reopen it either).
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 03:11:12 UTC
Created attachment 1912 [details]
Modified Cite.php

I have not the foggiest clue what format exactly the patch is supposed to be
in, and I don't see anything that tells me that fact, so I'm just putting up
the whole PHP file.

Attached patch will generate an error if $str contains the string '<ref>', but
will work correctly otherwise.	Ideally, this should print the contents of the
bad <ref> rather than just sending an error, but this should be good enough for
now (since it's unlikely it will go undetected for long, I hope, or be hard to
fix).
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 03:30:39 UTC
Created attachment 1913 [details]
An attempt at diff format, otherwise same as above.
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 03:47:55 UTC
Created attachment 1914 [details]
Proposed patch #3

Is there a page somewhere on how to format diffs?
Comment 7 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 03:49:43 UTC
Created attachment 1915 [details]
Attempt #4
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 03:52:34 UTC
Created attachment 1916 [details]
Final attempt before waiting for help
Comment 9 Rob Church 2006-06-08 12:21:15 UTC
Use the diff feature from your Subversion client.
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 23:06:44 UTC
Created attachment 1919 [details]
Same patch as before

I was trying to use the svn diff, but it previously didn't work because my
command-line window was too small.  After fidgeting with settings (these things
are more obvious at 7 PM than 1 AM), I could copy the whole thing.  It was too
long before due to addition and removal of identical lines due to reformatting
issues; I've trimmed these out.
Comment 11 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-08 23:11:20 UTC
Created attachment 1920 [details]
Same patch as before

Remove extraneous linebreaks that were causing parsing errors due to incidental
hyphen at the beginning of a line.
Comment 12 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-28 03:16:56 UTC
Note: this patch will likely handle <nowiki> incorrectly, interpreting
<nowiki><ref></nowiki> as an error if contained within a ref.  A simple
workaround would be to use &lt;ref&gt; instead, so the proposed patch is still
probably better than nothing, but could be improved.
Comment 13 Brion Vibber 2006-06-28 17:53:50 UTC
That would be clearly unacceptable, breaking thousands of pages.
Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-28 17:55:31 UTC
Really?  It would be that major?  I can't see why people would want to put the
string "<ref>" inside a ref except, possibly, at [[m:Cite.php]].  I'll fix it, then.
Comment 15 Brion Vibber 2006-06-28 18:09:51 UTC
<nowiki><ref></nowiki> needs to work.

Uhhh, but if it's inside a <ref>, who knows. Never mind. :D
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-30 01:41:36 UTC
Created attachment 2030 [details]
New patch

New patch fixes the problem with nowiki.
Comment 17 Brion Vibber 2006-08-31 17:17:56 UTC
That regex looks pretty fragile. What about <!-- <ref> --> etc?
Comment 18 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-08-31 21:15:10 UTC
True.  Actually, I can't have tested that; '/<nowiki>.*?</' is obviously going
to break.  I'll test this and submit a revised version.  Probably
'%<nowiki>.*?</nowiki>|<!--.*?-->|<pre>.*?</pre>%i' or something is what I want.
 Or maybe I could exclude a <ref> that's included between *any* tags: that won't
muck with parser functions, for instance, that for God knows what unholy reason
might want a literal "<ref>", but it will still catch common errors.

Plus, the new version can benefit from brilliant programming techniques I've
picked up in the last two months, such as the existence of preg_match.  :D
Comment 19 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-09-01 03:59:34 UTC
Created attachment 2288 [details]
Cautious proposed patch

This patch ignores any <ref> inside either a comment or any SGML-like tag. 
This is perhaps overcautious, but it will still catch a large majority of
errors like this without any chance of false positives that I can think of.
Comment 20 Paul Robinson 2007-02-15 20:19:44 UTC
See also bug 8693 for a related issue.
Comment 21 Jelte (WebBoy) 2008-08-11 18:28:03 UTC
*** Bug 12757 has been marked as a duplicate of this bug. ***
Comment 22 Jelte (WebBoy) 2008-08-11 18:29:14 UTC
Duplicate bug 12757 also contains some comments and a patch.
Comment 23 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-09-18 17:16:48 UTC
Committed essentially the patch from comment 19 in r40998.  Ugly, but better than leaving this around for another two years . . .
Comment 24 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-09-18 17:21:32 UTC
(see also r40999)
Comment 25 Church of emacs 2008-09-20 08:11:52 UTC
I think I found a small bug: If the missing </ref>-tag is the last ref tag, no error message is displayed.
Works correct: <ref>broken <ref>correct</ref> <references />
Doesn't work : <ref>broken <references />
See http://en.wikipedia.org/w/index.php?oldid=239749551 http://en.wikipedia.org/w/index.php?oldid=239749592 for examples.
Comment 26 Max Semenik 2008-09-20 08:17:13 UTC
(In reply to comment #25)
> I think I found a small bug: If the missing </ref>-tag is the last ref tag, no
> error message is displayed.
> Works correct: <ref>broken <ref>correct</ref> <references />
> Doesn't work : <ref>broken <references />

That's because parser doesn't send unclosed tags to extensions.
Comment 27 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-09-21 00:48:59 UTC
(In reply to comment #25)
> I think I found a small bug: If the missing </ref>-tag is the last ref tag, no
> error message is displayed.
> Works correct: <ref>broken <ref>correct</ref> <references />
> Doesn't work : <ref>broken <references />
> See http://en.wikipedia.org/w/index.php?oldid=239749551
> http://en.wikipedia.org/w/index.php?oldid=239749592 for examples.

Please file a new bug, and link to it from here.

(In reply to comment #26)
> That's because parser doesn't send unclosed tags to extensions.

It looks like it's sending the tag, just implicitly closing it at the end of the article.  Either way, I don't see an obvious workaround for this.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links