Last modified: 2008-08-19 16:50:53 UTC

Bug 11087 - Image truncated comment chopped UTF-8 character
Status: RESOLVED DUPLICATE of bug 332
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: 1.11.x
Hardware: All All
Importance: Low trivial
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://commons.wikimedia.org/wiki/Ima...
Depends on:
Blocks: unicode

Reported: 2007-08-27 22:48 UTC by Dan Jacobson
Modified: 2008-08-19 16:50 UTC

Description Dan Jacobson 2007-08-27 22:48:51 UTC
Observe, in the "comment" box on the above URL,
how a 3-byte UTF-8 character has been truncated.

Please count characters, not bytes, when determining where to truncate.

Otherwise you produce invalid byte sequences, neither ASCII nor valid UTF-8, which browsers display as the invalid-character symbol.
Comment 1 Dan Jacobson 2007-11-13 00:51:54 UTC
Also, please be sure you don't truncate UTF-8 characters when including snippets, which I often see in, e.g., my
http://commons.wikimedia.org/wiki/Special:Contributions/Jidanni !
Comment 2 Nicolas Dumazet 2008-03-13 23:36:24 UTC
img_description tinyblob NOT NULL

This field is binary and does not store any encoding along with the data. That's why, when the string is truncated to fit in the field (255 bytes), SQL does not check whether doing so breaks the encoding. In my opinion, using a varchar(255), which actually stores the encoding in the field, would solve that problem.
Comment 3 Brion Vibber 2008-03-14 17:17:50 UTC
That's incorrect; VARCHAR would have the same issue.

Note that we do not use MySQL's utterly broken UTF-8 support as it does not actually support UTF-8, but only a limited subset of UTF-8. As a result, we use binary fields for data safety.

As with the related bugs, the correct fix is to apply UTF-8-safe truncation on input data that's destined for short fields.
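
A minimal sketch of what UTF-8-safe truncation could look like, written in Python for illustration. This is not MediaWiki's actual fix: the function name `truncate_utf8` is hypothetical, the input is assumed to be valid UTF-8 already, and the 255-byte limit is taken from the tinyblob size mentioned in comment 2.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut data to at most max_bytes without splitting a UTF-8 character.

    Hypothetical helper; assumes data is valid UTF-8 to begin with.
    """
    if max_bytes <= 0:
        return b""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. Continuation
    # bytes have the form 0b10xxxxxx, so if the cut falls on one, back
    # up until it falls on the first byte of a character instead.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# Usage: trim a long description before storing it in a 255-byte field.
description = ("é" * 200).encode("utf-8")   # 400 bytes, 2 bytes per character
stored = truncate_utf8(description, 255)
print(len(stored))                           # at most 255, ends on a whole character
print(stored.decode("utf-8") == "é" * (len(stored) // 2))
```

The result may be a byte or two shorter than the limit, which is the price of never storing a partial character.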
Comment 4 Marcin Cieślak 2008-08-16 23:53:45 UTC
Isn't this a duplicate of bug 332?
Comment 5 Dan Jacobson 2008-08-19 16:50:53 UTC
I guess so.

*** This bug has been marked as a duplicate of bug 332 ***
