Last modified: 2012-05-11 06:08:41 UTC
As you may see at http://commons.wikimedia.org/wiki/Image:Honest_Eds_at_Bathurst_and_Bloor.jpg (duplicate files), there has to be some file revision of Image:Honest Ed's.jpg left. Please delete it and check whether a MediaWiki bug is involved in this situation. Useful links to the file:
* http://commons.wikimedia.org/w/index.php?title=Special:Log&offset=20071026171559&user=Flickr+upload+bot&limit=1
* http://commons.wikimedia.org/w/index.php?title=Special:DeletedContributions&offset=20071026171559&target=Flickr+upload+bot&limit=2
* http://commons.wikimedia.org/w/index.php?title=Special:Undelete&target=Image%3ABroken%2FHonest_Ed&apos;s.jpg
maintenance/cleanupImages.php appears to be broken; probably needs to be updated for recent DB load balancer changes. (via cleanupTable.inc via FiveUpgrade.inc)
*** Bug 16056 has been marked as a duplicate of this bug. ***
List of invalid titles in the Commons image table, from bug 16056:
*Eaton&apos;s_Ninth_Floor_Restaurant.jpg
*Honest_Ed&apos;s.jpg
*Passing_the_time_at_Jeffrey&apos;s_Bay-_South_Africa.jpg
*Toronto&apos;s_Opera_House_-_In_construction.jpg
*Here&apos;s_looking_at_you_kid-raffi_torres.jpg
*Nice_Côte_d&apos;Azur.jpg
*Ward&apos;s_ferry_line_-_1_12hr_later.jpg
Increasing priority: when trying to work with those images (so far only with the API), a PHP error is produced: PHP fatal error in /usr/local/apache/common-local/php-1.5/includes/filerepo/RepoGroup.php line 94: Call to a member function getDBkey() on a non-object
There are also images with a wrong title containing a &quot; entity:
*Picswiss_BE-90-14_Hotel_&quot;Oeschinensee&quot;_beim_Oeschinensee.jpg
*Picswiss_BE-90-15_Hotel_und_Berghaus_&quot;Oeschinensee&quot;_beim_Oeschinensee.jpg
*Picswiss_BE-94-01_Kirche_von_Würzbrunnen_(Röthenbach)_-_&quot;Gotthelf-Kirche&quo.jpg
*Picswiss_BE-94-07_Kirche_von_Würzbrunnen_(Röthenbach)_-_&quot;Gotthelf-Kirche&quot;.jpg
*Picswiss_GR-84-06_Splügen-_Hinterrhein,_Hotel_&quot;Suretta&quot;.jpg
*Picswiss_GR-84-14_Hotel_&quot;Weiss_Kreuz&quot;_in_Splügen.jpg
*Picswiss_GR-84-16_Splügen_mit_Teurihorn_(Talstation_der_&quot;Tambo&quot;-Bahnen).jpg
*Picswiss_GR-84-31_Splügen_mit_dem_Teurihorn_(Talstation_&quot;Tambo&quot;-Bahnen).jpg
*Picswiss_GR-84-32_Ruine_&quot;Zur_Burg&quot;_in_Splügen.jpg
All were uploaded in August 2007.
The fatal error should be fixed by r44370: now those titles are simply skipped. Actually fixing them requires running maintenance/cleanupImages.php. Reducing priority back to normal.
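For illustration only, the "skip instead of crash" behaviour described above might look roughly like the following Python sketch. It is not the code from r44370: make_title() and usable_titles() are hypothetical names, and the rule that a name containing a leftover character entity is invalid is an assumption made for the example.

```python
import re
from typing import List, Optional

# Assumption: a stored name that still contains a character reference
# (named or numeric) is treated as an invalid title.
ENTITY_RE = re.compile(r"&#?[A-Za-z0-9]+;")


def make_title(name: str) -> Optional[str]:
    """Hypothetical title constructor: returns None for invalid names."""
    if not name or ENTITY_RE.search(name):
        return None
    return name.replace(" ", "_")


def usable_titles(names: List[str]) -> List[str]:
    """Collect valid titles, skipping invalid ones instead of failing."""
    titles = []
    for name in names:
        title = make_title(name)
        if title is None:
            # Calling a method on the missing object is what produced the
            # fatal error; skipping the row avoids that.
            continue
        titles.append(title)
    return titles
```

Skipping only hides the broken rows from callers; the rows themselves still need to be repaired, which is what the cleanup script is for.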
The Picswiss files have been fixed now, but the ones with &apos; are still broken, apparently because Sanitizer::decodeCharReferences() doesn't recognize it. I've committed a fix in r45387 -- it's a bit ugly due to the special status of &apos; as the only named character entity defined in XHTML 1.0 but not in HTML 4.01.
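A minimal Python sketch of why a decoder driven by an HTML 4.01 entity table leaves &apos; untouched while &quot; decodes fine. This is not MediaWiki's Sanitizer code: decode_named_refs() and both tables are hypothetical, and the tables list only a handful of entities.

```python
import re

# Assumption: a tiny subset of the HTML 4.01 named entities. "apos" is
# absent because HTML 4.01 does not define it.
HTML4_SUBSET = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

# The same table extended with the XHTML 1.0 / XML entity.
WITH_APOS = {**HTML4_SUBSET, "apos": "'"}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_named_refs(text, table):
    """Replace known named references; leave unknown ones as they are."""
    def repl(match):
        return table.get(match.group(1), match.group(0))
    return NAMED_REF.sub(repl, text)
```

With HTML4_SUBSET, a name such as Honest_Ed&apos;s.jpg comes back unchanged, so a cleanup pass built on that table cannot repair it. With WITH_APOS the same call yields a plain apostrophe.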
(In reply to comment #6)
> The Picswiss files have been fixed now, but the ones with &apos; are still
> broken, apparently because Sanitizer::decodeCharReferences() doesn't recognize
> it. I've committed a fix in r45387 -- it's a bit ugly due to the special
> status of &apos; as the only named character entity defined in XHTML 1.0 but
> not in HTML 4.01.
Why do we care about HTML 4.01?
Reverted in r45477 as the special casing seems totally unnecessary...
If we add &apos; to $wgHtmlEntities, the Sanitizer will allow it through normalizeCharReferences(). This could cause inconsistent rendering on old browsers that predate XHTML, or apparently even on some rather modern versions of IE if the page is served with a doctype (or maybe MIME type?) indicating HTML4 rather than XHTML. See e.g. http://cssvault.com/blog/2007/10/17/internet-explorer-apos-feature/ Still, we claim to be serving XHTML, so I suppose this should not be a problem (even if I do believe we're serving it with a "text/html" MIME type). And even on old browsers that don't support it, all that's likely to happen is that it'll be rendered verbatim (as indeed the sanitizer currently forces it to be).
We always serve with an XHTML doctype, so that's not an issue. What browsers won't recognize it? Are we talking like NN4 here, or like IE6?
I've tried searching for a browser compatibility table for &apos;, but haven't found any so far. Anyway, XHTML has existed for almost a decade now, so I suspect only very old browsers would be completely unaware of it. The IE behavior worries me more: one page I found, http://seewhatever.de/blog/?p=114 , says even IE 7 won't recognize &apos; in HTML mode, and seems to suggest that it's the MIME type that makes the difference. Anyway, why do we output named character entities at all? We already have a table of the Unicode code points corresponding to all of them, so it would be trivial to make normalizeEntity() output numeric entities only.
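The "numeric entities only" idea could be sketched like this in Python. It is an illustration under assumptions, not the real normalizeEntity(): to_numeric_refs() is a hypothetical name, the code point table is a small hand-written subset, and escaping unknown names so they render verbatim is an assumed policy.

```python
import re

# Assumption: a few entries of a name -> Unicode code point table.
ENTITY_CODEPOINTS = {"amp": 38, "lt": 60, "gt": 62, "quot": 34, "apos": 39}

NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Rewrite known named references as decimal numeric references."""
    def repl(match):
        codepoint = ENTITY_CODEPOINTS.get(match.group(1))
        if codepoint is None:
            # Unknown name: escape the ampersand so the text shows verbatim.
            return "&amp;" + match.group(1) + ";"
        return "&#%d;" % codepoint
    return NAMED_REF.sub(repl, text)
```

Under this scheme &apos; would be emitted as &#39;, a form that does not depend on the browser knowing the entity name.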
Because that's uglier, maybe? It should be simple enough to test &apos; in various browsers and see which work, anyway.
&apos; is not valid XHTML per se; it's only usable in XHTML because XHTML is supposed to be XML, and XHTML is XML only if served with an XML MIME type. If served as text/html, it's just the version of HTML that the browser supports (i.e. still HTML4 in many cases) sugared with some invalid XMLisms that the browser may or may not support - see http://www.w3.org/TR/xhtml1/#C_16. Mozilla, Opera and WebKit accept &apos; in HTML anyway (maybe because it is valid HTML5), but IE versions 6, 7 and 8 do not.
Yeah, confirmed, IE doesn't accept &apos;, even IE8 with <!doctype html>.
What's the state of this then? cleanupImages in its current incarnation finds no issues...
The images with &apos; in the title (see comment 3) appear to still exist in the image table, so I guess there's still an issue. r45387 would've fixed it, but Brion reverted it and apparently nobody ever got around to either unreverting it or committing the alternative fix Brion suggested.
(In reply to comment #16)
> The images with &apos; in the title (see comment 3) appear to still exist in
> the image table, so I guess there's still an issue. r45387 would've fixed it,
> but Brion reverted it and apparently nobody ever got around to either
> unreverting it or committing the alternative fix Brion suggested.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Sanitizer.php?view=annotate#l75
r89681
So brion did fix it...
So it's in 1.18
(In reply to comment #17)
> (In reply to comment #16)
> > The images with &apos; in the title (see comment 3) appear to still exist in
> > the image table, so I guess there's still an issue. r45387 would've fixed it,
> > but Brion reverted it and apparently nobody ever got around to either
> > unreverting it or committing the alternative fix Brion suggested.
>
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Sanitizer.php?view=annotate#l75
>
> r89681
>
> So brion did fix it...
>
> So it's in 1.18
Does cleanupImages need to be run anywhere? Is there anything else that needs to happen?
*** This bug has been marked as a duplicate of bug 22939 ***