Last modified: 2012-05-11 06:08:41 UTC

Bug 14365 - Entries in the image table with invalid titles
Status: RESOLVED DUPLICATE of bug 22939
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Duplicates: 16056
Depends on:
Blocks: 16660
Reported: 2008-05-31 18:44 UTC by Adrian Lang
Modified: 2012-05-11 06:08 UTC (History)
CC List: 8 users


Comment 1 Brion Vibber 2008-06-02 16:07:43 UTC
maintenance/cleanupImages.php appears to be broken; probably needs to be updated for recent DB load balancer changes. (via cleanupTable.inc via FiveUpgrade.inc)
Comment 2 Platonides 2008-10-25 11:07:15 UTC
*** Bug 16056 has been marked as a duplicate of this bug. ***
Comment 3 Platonides 2008-10-25 11:10:35 UTC
List of invalid titles at commons image table from bug 16056:
*Eaton&apos;s_Ninth_Floor_Restaurant.jpg
*Honest_Ed&apos;s.jpg
*Passing_the_time_at_Jeffrey&apos;s_Bay-_South_Africa.jpg
*Toronto&apos;s_Opera_House_-_In_construction.jpg
*Here&apos;s_looking_at_you_kid-raffi_torres.jpg
*Nice_Côte_d&apos;Azur.jpg
*Ward&apos;s_ferry_line_-_1_12hr_later.jpg

Increasing priority: when trying to work with those images
(so far only with the API), a PHP error is produced:
PHP fatal error in
/usr/local/apache/common-local/php-1.5/includes/filerepo/RepoGroup.php line 94:
Call to a member function getDBkey() on a non-object 

Comment 4 Platonides 2008-10-26 15:26:23 UTC
There are also images with an invalid title containing the &quot; entity:
*Picswiss_BE-90-14_Hotel_&quot;Oeschinensee&quot;_beim_Oeschinensee.jpg
*Picswiss_BE-90-15_Hotel_und_Berghaus_&quot;Oeschinensee&quot;_beim_Oeschinensee.jpg
*Picswiss_BE-94-01_Kirche_von_Würzbrunnen_(Röthenbach)_-_&quot;Gotthelf-Kirche&quo.jpg
*Picswiss_BE-94-07_Kirche_von_Würzbrunnen_(Röthenbach)_-_&quot;Gotthelf-Kirche&quot.jpg
*Picswiss_GR-84-06_Splügen-_Hinterrhein,_Hotel_&quot;Suretta&quot;.jpg
*Picswiss_GR-84-14_Hotel_&quot;Weiss_Kreuz&quot;_in_Splügen.jpg
*Picswiss_GR-84-16_Splügen_mit_Teurihorn_(Talstation_der_&quot;Tambo&quot;-Bahnen).jpg
*Picswiss_GR-84-31_Splügen_mit_dem_Teurihorn_(Talstation_&quot;Tambo&quot;-Bahnen).jpg
*Picswiss_GR-84-32_Ruine_&quot;Zur_Burg&quot;_in_Splügen.jpg

Uploaded August 2007
Comment 5 Ilmari Karonen 2008-12-09 23:24:10 UTC
The fatal error should be fixed by r44370: now those titles are simply skipped.  Actually fixing them requires running maintenance/cleanupImages.php.  Reducing priority back to normal.
Comment 6 Ilmari Karonen 2009-01-04 02:29:37 UTC
The Picswiss files have been fixed now, but the ones with &apos; are still broken, apparently because Sanitizer::decodeCharReferences() doesn't recognize it.  I've committed a fix in r45387 -- it's a bit ugly due to the special status of &apos; as the only named character entity defined in XHTML 1.0 but not in HTML 4.01.
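
To illustrate the failure described here, the following is a minimal Python sketch, not MediaWiki's actual Sanitizer code. The tables HTML4_ENTITIES and XHTML_ENTITIES and the function decode_named_refs are hypothetical names, and the tables are deliberately tiny. It assumes a decoder that leaves any reference missing from its table untouched, which is one way an &apos; could survive in a stored title.

```python
import re

# Hypothetical, deliberately tiny entity tables (NOT MediaWiki's $wgHtmlEntities).
# HTML 4.01 defines no "apos" entity; XHTML 1.0 does.
HTML4_ENTITIES = {"amp": 38, "lt": 60, "gt": 62, "quot": 34}
XHTML_ENTITIES = {**HTML4_ENTITIES, "apos": 39}

# Matches a named character reference such as &quot; or &apos;
NAMED_REF = re.compile(r"&([A-Za-z0-9]+);")


def decode_named_refs(text, table):
    """Decode named references found in `table`; leave unknown ones verbatim."""
    def repl(match):
        code_point = table.get(match.group(1))
        if code_point is None:
            return match.group(0)  # unknown name: keep the raw reference
        return chr(code_point)
    return NAMED_REF.sub(repl, text)


if __name__ == "__main__":
    title = "Honest_Ed&apos;s.jpg"
    print(decode_named_refs(title, HTML4_ENTITIES))  # reference is left as-is
    print(decode_named_refs(title, XHTML_ENTITIES))  # reference is decoded
```

With the HTML 4.01 style table the title keeps its literal &apos;, while a &quot; in the same position would be decoded, which matches the observation that the Picswiss files were fixed and the &apos; files were not.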
Comment 7 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-01-04 18:57:26 UTC
(In reply to comment #6)
> The Picswiss files have been fixed now, but the ones with &apos; are still
> broken, apparently because Sanitizer::decodeCharReferences() doesn't recognize
> it.  I've committed a fix in r45387 -- it's a bit ugly due to the special
> status of &apos; as the only named character entity defined in XHTML 1.0 but
> not in HTML 4.01.

Why do we care about HTML 4.01?
Comment 8 Brion Vibber 2009-01-07 02:33:11 UTC
Reverted in r45477 as the special casing seems totally unnecessary...
Comment 9 Ilmari Karonen 2009-01-07 04:35:44 UTC
If we add &apos; to $wgHtmlEntities, the Sanitizer will allow it through normalizeCharReferences().  This could cause inconsistent rendering on old browsers that predate XHTML, or apparently even on some rather modern versions of IE if the page is served with a doctype (or maybe MIME type?) indicating HTML4 rather than XHTML.  See e.g. http://cssvault.com/blog/2007/10/17/internet-explorer-apos-feature/

Still, we claim to be serving XHTML, so I suppose this should not be a problem (even if I do believe we're serving it with a "text/html" MIME type).  And even on old browsers that don't support it, all that's likely to happen is that it'll be rendered verbatim (as indeed the sanitizer currently forces it to be).
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-01-07 05:05:45 UTC
We always serve with an XHTML doctype, so that's not an issue.  What browsers won't recognize it?  Are we talking like NN4 here, or like IE6?
Comment 11 Ilmari Karonen 2009-01-07 06:03:13 UTC
I've tried searching for a browser compatibility table for &apos;, but haven't found any so far.  Anyway, XHTML has existed for almost a decade now, so I suspect only very old browsers would be completely unaware of it.  The IE behavior worries me more: one page I found, http://seewhatever.de/blog/?p=114 , says even IE 7 won't recognize &apos; in HTML mode, and seems to suggest that it's the MIME type that makes the difference.

Anyway, why do we output named character entities at all?  We already have a table of the Unicode code points corresponding to all of them, so it would be trivial to make normalizeEntity() output numeric entities only.
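
A rough Python sketch of that idea, assuming a name-to-code-point table is available. NAME_TO_CODE_POINT and to_numeric_refs are hypothetical names, the table is truncated to five entries for brevity, and this is not how normalizeEntity() is actually written.

```python
import re

# Hypothetical, truncated name -> Unicode code point table (assumption for illustration).
NAME_TO_CODE_POINT = {"amp": 38, "lt": 60, "gt": 62, "quot": 34, "apos": 39}

# Matches a named character reference such as &amp; or &apos;
NAMED_REF = re.compile(r"&([A-Za-z0-9]+);")


def to_numeric_refs(text):
    """Rewrite known named references as decimal numeric references."""
    def repl(match):
        code_point = NAME_TO_CODE_POINT.get(match.group(1))
        if code_point is None:
            return match.group(0)  # unknown name: left unchanged in this sketch
        return "&#%d;" % code_point
    return NAMED_REF.sub(repl, text)


if __name__ == "__main__":
    print(to_numeric_refs("Nice_C&ocirc;te_d&apos;Azur"))  # only &apos; is in the table
```

Numeric references do not depend on which named entities a given browser knows, so an apostrophe emitted as a numeric reference would sidestep the &apos; compatibility question entirely.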
Comment 12 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-01-07 06:24:49 UTC
Because that's uglier, maybe?  It should be simple enough to test &apos; in various browsers and see which work, anyway.
Comment 13 entlinkt 2010-08-13 10:03:55 UTC
&apos; is not valid XHTML per se; it's only usable in XHTML because XHTML is supposed to be XML, and XHTML is XML only if served with an XML MIME type. If served as text/html, it's just the version of HTML that the browser supports (i.e. still HTML4 in many cases) sugared with some invalid XMLisms that the browser may or may not support - see http://www.w3.org/TR/xhtml1/#C_16.

Mozilla, Opera and WebKit accept &apos; in HTML anyway (maybe because it is valid HTML5), but IE versions 6, 7 and 8 do not.
Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-08-13 17:28:50 UTC
Yeah, confirmed, IE doesn't accept &apos;, even IE8 with <!doctype html>.
Comment 15 Sam Reed (reedy) 2011-07-17 21:12:26 UTC
What's the state of this then?

cleanupImages in its current incarnation finds no issues...
Comment 16 Ilmari Karonen 2011-07-17 21:29:59 UTC
The images with &apos; in the title (see comment 3) appear to still exist in the image table, so I guess there's still an issue.  r45387 would've fixed it, but Brion reverted it and apparently nobody ever got around to either unreverting it or committing the alternative fix Brion suggested.
Comment 17 Sam Reed (reedy) 2011-10-15 21:45:44 UTC
(In reply to comment #16)
> The images with &apos; in the title (see comment 3) appear to still exist in
> the image table, so I guess there's still an issue.  r45387 would've fixed it,
> but Brion reverted it and apparently nobody ever got around to either
> unreverting it or committing the alternative fix Brion suggested.

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Sanitizer.php?view=annotate#l75

r89681

So brion did fix it...

So it's in 1.18
Comment 18 Mark A. Hershberger 2011-10-16 15:23:01 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > The images with &apos; in the title (see comment 3) appear to still exist in
> > the image table, so I guess there's still an issue.  r45387 would've fixed it,
> > but Brion reverted it and apparently nobody ever got around to either
> > unreverting it or committing the alternative fix Brion suggested.
> 
> http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Sanitizer.php?view=annotate#l75
> 
> r89681
> 
> So brion did fix it...
> 
> So it's in 1.18

Does cleanupImages need to be run anywhere?  Is there anything else that needs to happen?
Comment 19 Sam Reed (reedy) 2012-05-10 03:34:15 UTC

*** This bug has been marked as a duplicate of bug 22939 ***
