Last modified: 2010-05-15 15:33:06 UTC
This was originally entered as bug 1033 comment 2, but I think it is actually a seperate issue, with a different effect (an image with an image in its caption is accepted as such by the parser, but the HTML produced is mangled). It's hard to work out exactly what's being produced because of things like 'tidy' going round closing tags, but it looks like there's something wrong with the order in which things are being escaped and/or removed from 'alt' and 'title' attributes, so that the HTML image markup for the inner image is being inserted into those attributes. The lines in Skin::make{Image|Thumb}LinkObj() beginning with $alt = preg_replace( '/<[^>]*>/', '', $alt ); should cover this, leading me to suspect that the actual link-parsing is happening in the wrong order (e.g. "...[[Image:...]]..." being replaced with HTML when it's already in the attributes) , but I can't see anything wrong with Parser::replaceInternalLinks() at a glance. I'm CCing Wil Mahan on this, because we hacked the relevant bits of replaceInternalLinks() around a lot between us (see bug 637), so he might have some thoughts.
This is a serious bug, introduced recently. Just about all of the "Flag of" articles in Wikipedia are broken as a result, because they systematically include a small symbol image in their captions. See for example [[Flag of Costa Rica]] or [[Flag of Canada]]. Well-meaning editors have started "fixing" those pages by removing the image within the caption, which is a loss of information. Please get this fixed ASAP!
Created attachment 203 [details] fix: use "http-noparse" rather than a full-on insertStripItem() I got round to tracking this one down at last: the entire image code was being stripped out and replaced with a placeholder; when one image was inside another, the placeholder was surviving in the alt and title attributes, and then being replaced with the full code. Hence the attributes ended up "containing" whole stanzas of HTML, as I suspected. As far as I can see, the only thing that needed hiding was full URLs, which replaceExternalLinks() mustn't touch. I've therefore borrowed somebody else's hack (it's used in Skin::makeMediaLinkObj()) of replacing "http://" with "http-noparse://" - a kind of very specific place-holder, which gets swapped back again once replaceExternalLinks() has finished. I ran parserTests.php, and was annoyed to find that it failed the test because the test assumed an image existed which in reality didn't. I'm 99% sure the code produced is actually correct. But in doing so, I realised that I'd also inadvertently solved bug 1219, which I'd never heard of before. Which is nice.
Fixed in CVS HEAD and REL1_4. Deployed to wikipedia.org