Last modified: 2014-11-20 09:25:46 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T39665, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 37665 - IPTCTest::testIPTCParseForcedUTFButInvalid failure on PHP with buggy glibc (iconv //IGNORE broken)
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Unit tests
Version: 1.20.x
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 67908 73178
Depends on:
Blocks: 73175
Reported: 2012-06-17 15:35 UTC by Platonides
Modified: 2014-11-20 09:25 UTC (History)
12 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
iconv-test.c (754 bytes, text/plain)
2012-06-17 15:35 UTC, Platonides

Description Platonides 2012-06-17 15:35:16 UTC
Created attachment 10763
iconv-test.c

Our test IPTCTest::testIPTCParseForcedUTFButInvalid verifies that when feeding image metadata marked as UTF-8 but with non-UTF-8 bytes, the bad bytes will be dropped and the sane UTF-8 kept.

This was the behavior of iconv() in PHP < 5.4, as can be tested with:
 var_dump( iconv("UTF-8", "UTF-8//IGNORE", "\xC3\xC3\xC3\xB8") );

The behavior of iconv(3) (with IGNORE) is to provide the good bytes *and* report the error. That can be tested with the attached program.
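
For readers without access to the attachment, the following is a minimal sketch of how that behavior could be observed. It is not the attached iconv-test.c; the helper name `convert_ignore` and its signature are assumptions made for illustration. It calls iconv(3) once with the `UTF-8//IGNORE` target and reports both the number of bytes written and the errno value, so a caller can see whether output and error arrive together. What it reports for invalid input depends on the iconv implementation in use.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the attached iconv-test.c).
 * Converts in_len bytes of `in` from UTF-8 to UTF-8//IGNORE into `out`.
 * Returns the number of bytes written to `out`, or -1 if the converter
 * could not be opened. *err receives errno when iconv() reports an error
 * (for example EILSEQ), or 0 when it does not. */
static long convert_ignore(const char *in, size_t in_len,
                           char *out, size_t out_cap, int *err)
{
    iconv_t cd = iconv_open("UTF-8//IGNORE", "UTF-8");
    if (cd == (iconv_t)-1) {
        *err = errno;
        return -1;
    }

    char *src = (char *)in;   /* iconv() takes char **, but does not modify the input bytes */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    errno = 0;
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    *err = (rc == (size_t)-1) ? errno : 0;

    iconv_close(cd);
    /* Bytes produced so far are kept even when an error was reported. */
    return (long)(out_cap - dst_left);
}
```

Calling `convert_ignore("\xC3\xC3\xC3\xB8", 4, buf, sizeof buf, &err)` and printing both the return value and `err` shows whether a given libc delivers the surviving bytes together with an error indication.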

The fact that, when not using IGNORE, the good bytes were still returned on error was reported as a bug in https://bugs.php.net/52211 and fixed in e3fdf3 by always returning an empty string.

So our parsing of IPTC data is now different (wrong?) on PHP 5.4.

We can:
- Set the empty string as the correct output (remove/change the test)
- Verify UTF-8 correctness ourselves (UtfNormal::cleanUp() seems the appropriate tool; we could then remove the UTF-8 replacement character if a silent skip is really desired).
- Request php iconv() behavior to change back / add a new flag.
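
To make the second option concrete, here is a rough sketch in C of validating UTF-8 ourselves and silently dropping bad bytes. It is only an illustration of the idea, not UtfNormal::cleanUp() and not MediaWiki code; the function names `utf8_seq_len` and `utf8_drop_invalid` are made up for this example. It rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence
 * starting at s, or 0 if the bytes at s do not start one.
 * n is the number of bytes available at s. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                                   /* ASCII */
    else if (s[0] >= 0xC2 && s[0] <= 0xDF) { len = 2; cp = s[0] & 0x1F; }
    else if (s[0] >= 0xE0 && s[0] <= 0xEF) { len = 3; cp = s[0] & 0x0F; }
    else if (s[0] >= 0xF0 && s[0] <= 0xF4) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;                                   /* stray continuation or invalid lead byte */

    if (n < len)
        return 0;                                   /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values above U+10FFFF. */
    if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
        return 0;
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
        return 0;
    return len;
}

/* Hypothetical helper: copies the well-formed sequences of in[0..n) to out
 * and silently skips every other byte. out must hold at least n bytes.
 * Returns the number of bytes written. */
static size_t utf8_drop_invalid(const unsigned char *in, size_t n,
                                unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        size_t len = utf8_seq_len(in + i, n - i);
        if (len == 0) {
            i++;                                    /* drop one bad byte and resynchronize */
            continue;
        }
        memcpy(out + o, in + i, len);
        o += len;
        i += len;
    }
    return o;
}
```

For the input from the test above, "\xC3\xC3\xC3\xB8", the first two 0xC3 bytes are dropped because neither is followed by a continuation byte, and the final pair 0xC3 0xB8 is kept. Emitting U+FFFD instead of skipping would only require writing the three bytes EF BF BD in the `len == 0` branch.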
Comment 1 Bryan Davis 2014-11-09 05:20:05 UTC
*** Bug 67908 has been marked as a duplicate of this bug. ***
Comment 2 Bryan Davis 2014-11-09 05:49:47 UTC
It looks to me like the real problem is described in <https://bugs.php.net/bug.php?id=48147> and the upstream-upstream bug at <https://sourceware.org/bugzilla/show_bug.cgi?id=13541>. Apparently glibc's iconv implementation deviates from the documented API of libiconv. Unfortunately the fix that was suggested to PHP to work around the glibc bug has not been implemented.
Comment 3 Gerrit Notification Bot 2014-11-09 06:40:18 UTC
Change 172101 had a related patch set uploaded by BryanDavis:
Avoid glibc iconv bug by using mb_convert_encoding

https://gerrit.wikimedia.org/r/172101
Comment 4 Bryan Davis 2014-11-09 15:36:53 UTC
*** Bug 73178 has been marked as a duplicate of this bug. ***
