Last modified: 2007-06-10 02:06:01 UTC
The problem is that Parser::magicISBN() is called after Linker::makeThumbLinkObj has been run on the text, which means that the magicISBN function will expand on any ISBN number it recognizes, including those inside XHTML tags like alt="" or title="", the same thing happens with <math>. For example, "[[Image:Badimg.jpeg|thumb|<math>2+2</math>]]" is turned into: <div class="thumb tright"><div style="width:202px;"><a href="/mw/HEAD/wiki/Image:Badimg.jpeg" class="internal" title="<span class="texhtml>2 + 2</span>"><img src="/mw/HEAD/images/thumb/9/98/200px-Badimg.jpeg" alt="<span class="texhtml>2 + 2</span>" width="200" height="300" longdesc="/mw/HEAD/wiki/Image:Badimg.jpeg" /></a> <div class="thumbcaption" ><div class="magnify" style="float:right"><a href="/mw/HEAD/wiki/Image:Badimg.jpeg" class="internal" title="Enlarge"><img src="/mw/HEAD/skins/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div><span class="texhtml">2 + 2</span></div></div></div>
*** Bug 1442 has been marked as a duplicate of this bug. ***
*** Bug 834 has been marked as a duplicate of this bug. ***
What should [[Image:Badimg.jpeg|thumb|<math>2+2</math>]] properly turn into? That should make it easier to fix.
Checked my own copy, I don't see this bug.
David Majnemer: The same as it does now, except things in title="" should always be stripped of [<>"]
Copying my comment from bug 1442, since you've marked the duplicates this way round: > > The problem is that Parser::magicISBN() is called after Linker::makeThumbLinkObj > > has been run on the text, which means that the magicISBN function will expand on > > any ISBN number it recognizes, including those inside XHTML tags like alt="" or > > title="". > > In that case, it should be possible to do things the other way round, since the > image code should strip the HTML tags back out of the alt/title attributes. Kind > of ugly, but I think it might work.
Ok, now that I know what should be done, I can fix it.
The only way to get this stupid bug gone is we set it up so it omited the value. That.. can be solved by restructuring some code (the proper way to do it) or the hacky way of adding an omit to the unstip function. I will try both and give patches so y'all got a choice ;)
Fixed Math :)
Created attachment 442 [details] Patch for math attached. ;) Patch for math attached. ;) This patch is ONLY for math, I have not done ISBN yet, I will do so when somebody tells me what is the correct behaviour for ISBN. :)
For ISBN: input: [[Image:Example.jpg|thumb|ISBN 0123456789]] current output: <div class="thumb tright"> <div style="width: 182px;"><a href="/wiki/Image:Example.jpg" title="<a href=" class="internal">ISBN 0123456789</a>"><img src="http://upload.wikimedia.org/wikipedia/en/thumb/f/fd/180px-Example.jpg" alt="<a href=" class="internal">ISBN 0123456789" width="180" height="171" longdesc="/wiki/Image:Example.jpg" /> <div class="thumbcaption"> <div class="magnify" style="float: right;"><a href="/wiki/Image:Example.jpg" class="internal" title="Enlarge"><img src="/skins/common/images/magnify-clip.png" alt="Enlarge" height="11" width="15"></a></div> <a href="/w/index.php?title=Special:Booksources&isbn=0123456789" class="internal">ISBN 0123456789</a></div> </div> </div> correct output: <div class="thumb tright"> <div style="width: 182px;"><a href="/wiki/Image:Example.jpg" class="internal" title="ISBN 0123456789"><img src="http://upload.wikimedia.org/wikipedia/en/thumb/f/fd/180px-Example.jpg" alt="ISBN 0123456789" longdesc="/wiki/Image:Example.jpg" height="171" width="180"></a> <div class="thumbcaption"> <div class="magnify" style="float: right;"><a href="/wiki/Image:Example.jpg" class="internal" title="Enlarge"><img src="/skins/common/images/magnify-clip.png" alt="Enlarge" height="11" width="15"></a></div> <a href="/w/index.php?title=Special:Booksources&isbn=0123456789" class="internal">ISBN 0123456789</a></div> </div> </div>
I am on it!
(In reply to comment #10) > This patch is ONLY for math, I have not done ISBN yet, I will do so when > somebody tells me what is the correct behaviour for ISBN. :) Unfortunately, although only small, there are 2 errors in this patch: first, note that there is both $label *and* $alt; we don't mind <math></math> in the caption (i.e. $label), only in the attributes, so we only want to extract it in $alt; secondly (and I don't blame you for missing this one), there is duplicated code between makeImageLinkObj() and makeThumblinkObj() - really, code like this should be in both, or rather, should be in a part of code that's not spread across both. [I started rewriting that whole slab of code, but never quite finished] The "correct behaviour for ISBN" is surely just nothing - that is, within the alt/title attributes, the text "ISBN 0123456789X" should stay exactly as it is; but obviously there's no real way of doing that, because it will be picked up by the magicISBN() function later whether we like it or not. I guess you could use a placeholder of some sort, but these have problems of their own. In fact, at one stage the whole image was being stripped out, presumably causing certain things not to happen even in the *caption* part, and causing bug 1217 and bug 1219 The other alternative, as I say, is just to parse such things before the internalLink code has run, so that they will be HTML, and stripped from the alt/title attributes anyway. I've stayed up too late, so my mind's whirling too much to work out if that would cause any problems of its own or not...
Rowan, thanks for the tips. However, I will fix the ISBN function :)
(In reply to comment #14) > Rowan, thanks for the tips. However, I will fix the ISBN function :) I'll be interested to see what approach you take.
Created attachment 443 [details] Pwned ISBN FIXED! ISBN is fixed with two simple regexs. :)
Created attachment 444 [details] Fixed my math patch Fixed my math patch, to the best of my knowledge, the effects of this bug are gone.
Created attachment 445 [details] parserTest This patches the parser test file. Have fun.
(In reply to comment #16) > Created an attachment (id=443) [edit] > Pwned ISBN > > FIXED! > ISBN is fixed with two simple regexs. :) I might be wrong, but won't this only fix captions whose entire content is an ISBN string? i.e. it looks to be matching things like <<alt="ISBN 0123456789X">>, but not things like <<alt="This is a book which has the ISBN 0123456789X; which is nice">> Unfortunately, my 1.5 test install is in a state of development where it can't actually display images, so I've only got the code to go from. It also seems to me that this could potentially trigger false positives in the actual text, but I guess it's not a piece of text that people are very likely to use (extending the regex to cover foo="phrase with an ISBN etc in it" *might* run the risk of picking up too much, though, I'm not sure)
Created attachment 449 [details] ISBN AGAIN Fixed ISBN again. It is totally lazy and works great. It is designed to be smart enough to to mistake user input and an actual ISBN. Still two regexs. This should fix the problem with other words.
Created attachment 456 [details] Rewrote the whole function I rewrote the whole function, I thought the way it did things was silly and over complicated so I wrote a new function.
Regarding the "parserTest" attachment: * You made some minor formatting change to "Interwiki link encoding conversion (bug 1636)" * You damaged the "Inter-language links" test by changing "zh:食品" to "zh:??" (two question marks), please make sure your editor doesn't make changes like that to characters it doesn't recognize. Regarding the "Rewrote the whole function" attachment: * Please don't make minor whitespace changes, it makes the diff hard to read.
Created attachment 460 [details] Fixed uglyness It is nolonger ugly as hell.
Created attachment 461 [details] fixed my ugly test patch Fixed my horrible patch, this one is nicer.
Created attachment 462 [details] RAWR! Fixed the stupid thing again! Damn you world!
It doesn't work for more than one ISBN link per line, the following fails: """ [[Image:Foo.png|thumb|ISBN 213 ISBN 213 ISBN 213 ... ]] ISBN 123-213 ISBN 123-213 ISBN 123-213 ... """
*** Bug 2440 has been marked as a duplicate of this bug. ***
The problem occurs also when using external links starting with "mailto:" for mail addresses.
(In reply to comment #24) > Created an attachment (id=461) [edit] > fixed my ugly test patch Applied to HEAD.
Comment on attachment 462 [details] RAWR! Marking attachment 462 [details] as obsolete since it won't work for the reasons I stated in comment #26.
This problem also affects parser hooks, like <ref> from the Cite.php extension.
Currently looking into this- i've got a small regexp that fixes the ISBN and RFC/PMID case, but it doesn't help the math/plugin one. Another case of missing nesting state, and probably another case of adding a brittle hack.
I've fixed this for stripped things like math and extensions now, unstripping the $alt text in the parser before constructing the thumb does the trick. All html tags are removed right after the unstrip, leaving only the text. This is not committed yet, i'm collecting a bunch of parser fixes for review sometime next week.
In CVS now, closing as fixed.