Last modified: 2010-05-15 15:33:23 UTC

Bug 1639 - PostgreSQL database encoding causes problems
Product: MediaWiki
Classification: Unclassified
Component: Installer
Hardware: All
OS: Linux
Importance: Normal priority, normal severity
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: postgres 385
Reported: 2005-03-06 22:05 UTC by Damon Buckwalter
Modified: 2010-05-15 15:33 UTC

Description Damon Buckwalter 2005-03-06 22:05:38 UTC
When installing under PostgreSQL (using directions from ...), if the database
encoding is set to 'UNICODE', problems occur when inserting items into the
"objectcache" table. I believe the problem stems from using the 'text' column
type in PostgreSQL, versus a MEDIUMBLOB in MySQL. Perhaps 'bytea' should be
used instead? It requires some special formatting of the input, but it is a
more analogous type to BLOBs.
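The "special formatting" bytea needs can be illustrated with PostgreSQL's escape format for bytea text values: printable ASCII passes through, a backslash is doubled, and every other octet (NUL, control bytes, bytes above 0x7E) is written as a backslash followed by three octal digits. The sketch below assumes that format; `bytea_escape` and `bytea_unescape` are hypothetical helpers, not MediaWiki code. When the result is pasted into an SQL string literal it still needs the usual quote escaping (and, with `standard_conforming_strings` off, backslash doubling), which is why real code would rather bind parameters or call PHP's `pg_escape_bytea()`.

```python
# Hypothetical helpers (not MediaWiki code) sketching PostgreSQL's bytea
# "escape" text format: printable ASCII passes through, a backslash is
# doubled, and every other octet becomes a backslash plus three octal digits.

def bytea_escape(data: bytes) -> str:
    """Render raw bytes as a bytea escape-format value."""
    parts = []
    for b in data:
        if b == 0x5C:                # backslash -> two backslashes
            parts.append("\\\\")
        elif 0x20 <= b <= 0x7E:      # printable ASCII is kept as-is
            parts.append(chr(b))
        else:                        # NUL, control and high bytes -> \ooo
            parts.append("\\%03o" % b)
    return "".join(parts)


def bytea_unescape(text: str) -> bytes:
    """Inverse of bytea_escape, used here only to check the round trip."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":          # plain printable character
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":    # escaped backslash
            out.append(0x5C)
            i += 2
        else:                        # backslash + three octal digits
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


blob = b"caf\xe9\x00\\"              # not valid UTF-8; contains NUL and a backslash
print(bytea_escape(blob))            # caf\351\000\\
assert bytea_unescape(bytea_escape(blob)) == blob
```

Because every octet has an ASCII-only representation, the escaped value is valid in any database encoding, which is what sidesteps the UTF-8 check that 'text' performs.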

Setting the database encoding to LATIN1 also avoids the problem (the 'text' type
then no longer requires valid UTF-8 strings).
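The difference between the two encodings can be approximated in a few lines. This is a rough sketch, assuming that a UTF8 database (for which 'UNICODE' is an older alias) rejects any byte sequence that is not well-formed UTF-8, that a LATIN1 database accepts every non-NUL byte, and that a 'text' value can never contain a NUL byte in either encoding. `text_column_accepts` is a hypothetical helper, not part of PostgreSQL or MediaWiki.

```python
# Hypothetical helper: a rough approximation of whether PostgreSQL would
# accept a byte string as a 'text' value for a given server encoding.

def text_column_accepts(data: bytes, server_encoding: str) -> bool:
    if b"\x00" in data:              # text values can never contain NUL
        return False
    if server_encoding in ("UTF8", "UNICODE"):   # UNICODE is an alias of UTF8
        try:
            data.decode("utf-8")     # strict decoding rejects invalid sequences
        except UnicodeDecodeError:
            return False
        return True
    if server_encoding == "LATIN1":
        return True                  # every non-NUL byte is a LATIN1 character
    raise ValueError(f"encoding not covered by this sketch: {server_encoding}")


latin1_blob = "café".encode("latin-1")             # b"caf\xe9"
print(text_column_accepts(latin1_blob, "UTF8"))    # False
print(text_column_accepts(latin1_blob, "LATIN1"))  # True
```

The NUL check is also why LATIN1 is only a partial workaround: arbitrary binary cache data can still contain bytes that no 'text' column will store.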
Comment 1 Brian Herlihy 2005-06-08 06:34:14 UTC
If the data is not UTF-8 encoded, using LATIN1 as the client encoding will make
everything run smoothly. However, the data stored in the database will in fact
be the Unicode (UTF-8) translations of the characters you are storing: every
byte above 0x7F is stored as a two-byte UTF-8 sequence. This wastes space and
requires the data to be decoded when fetched and encoded when stored.

You can save yourself a lot of headaches by making the database use LATIN1.
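The overhead described above can be checked directly. This sketch assumes a LATIN1 client talking to a UTF8 database, so that the server converts each value to UTF-8 on the way in and back to LATIN1 on the way out; `utf8_storage_size` is a hypothetical helper that mimics both conversions in Python.

```python
# Hypothetical helper: how many bytes LATIN1 data occupies once it has been
# converted to UTF-8 (client_encoding LATIN1, server encoding UTF8).

def utf8_storage_size(latin1_data: bytes) -> int:
    stored = latin1_data.decode("latin-1").encode("utf-8")   # conversion on INSERT
    # ASCII bytes stay one byte; every byte >= 0x80 becomes two bytes.
    assert len(stored) == len(latin1_data) + sum(b >= 0x80 for b in latin1_data)
    # Converting back on SELECT restores the original bytes.
    assert stored.decode("utf-8").encode("latin-1") == latin1_data
    return len(stored)


print(utf8_storage_size(b"caf\xe9"))                # 5
print(utf8_storage_size(bytes(range(0x80, 0x100)))) # 256
```

In the worst case, data made entirely of high bytes doubles in size, and every read and write pays for the conversion.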
Comment 2 Greg Sabino Mullane 2006-07-17 01:57:11 UTC
This field is currently bytea (which does cost us some contortions), so I am
closing the bug for now.
