Last modified: 2010-05-15 15:33:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T3639, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 1639 - PostgreSQL database encoding causes problems


Summary:	PostgreSQL database encoding causes problems

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Installer
Version:	1.4.x
Hardware:	All Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	postgres 385

Reported:	2005-03-06 22:05 UTC by Damon Buckwalter
Modified:	2010-05-15 15:33 UTC
CC List:	0 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Damon Buckwalter 2005-03-06 22:05:38 UTC

When installing under PostgreSQL (using the directions from
http://ph3.defau.lt/index.php/PostgreSQL_Install), if the database encoding is
set to 'UNICODE', problems occur when inserting items into the "objectcache"
table.  I believe the problem stems from using the 'text' column type in
PostgreSQL where MySQL uses a MEDIUMBLOB.  Perhaps 'bytea' should be used
instead?  It does require some special formatting of the input, but it is a
closer analogue to a BLOB.
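
As a rough illustration of the "special formatting" that bytea input needs,
here is a minimal Python sketch of PostgreSQL's bytea escape text format:
backslashes are doubled and bytes outside printable ASCII become three-digit
octal sequences. The function names are hypothetical and this is not
MediaWiki's code. Embedding the result in an SQL string literal needs further
quoting (single quotes, for instance), which a real client would normally
leave to its database driver.

```python
def bytea_escape(data: bytes) -> str:
    """Render bytes in the bytea "escape" text format (illustrative helper)."""
    out = []
    for b in data:
        if b == 0x5C:
            # A backslash is written as two backslashes.
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:
            # Printable ASCII passes through unchanged.
            out.append(chr(b))
        else:
            # Every other byte becomes a backslash plus three octal digits.
            out.append("\\%03o" % b)
    return "".join(out)


def bytea_unescape(text: str) -> bytes:
    """Inverse of bytea_escape, used here only to check the round trip."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


blob = b"a\x00\xff\\z"
escaped = bytea_escape(blob)
print(escaped)  # a\000\377\\z
assert bytea_unescape(escaped) == blob
```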

Setting the database encoding to LATIN1 also avoids the problem (the 'text' type
then no longer looks for valid UTF-8 strings).
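
The difference between the two encodings can be sketched in a few lines of
Python. This assumes the cached value is arbitrary binary data (a
zlib-compressed string is used as a stand-in; the report does not say what the
failing values contained). It models only the encoding validity check, not any
other restriction PostgreSQL places on 'text' values.

```python
import zlib

# Stand-in for a binary cache value (an assumption for illustration only).
payload = zlib.compress(b"cached page fragment " * 4)


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if every byte sequence in data is valid in the encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The compressed bytes are not a valid UTF-8 sequence, so a UTF-8 check
# rejects them. Latin-1 assigns a character to every byte value, so the
# same check can never fail there.
print(decodes_as(payload, "utf-8"))    # False
print(decodes_as(payload, "latin-1"))  # True
```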

Comment 1 Brian Herlihy 2005-06-08 06:34:14 UTC

If the data is not UTF-8 encoded, then using the LATIN1 client encoding will
make everything run smoothly.  But what is stored in the database will in fact
be the Unicode translations of the characters you are storing.  This means that
the high bytes are stored as 2 or 3 bytes each in Unicode.  This wastes space
and requires the data to be decoded when fetched and encoded when stored.

You can save yourself a lot of headaches by making the database use LATIN1.
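
The size cost can be seen with a small Python sketch that mimics the
conversion: the client's bytes are read as Latin-1 and stored as UTF-8. The
function name is hypothetical. For Latin-1 input specifically, each byte above
0x7F becomes two bytes in UTF-8.

```python
def stored_form(raw: bytes) -> bytes:
    """Mimic a LATIN1 client writing to a UTF-8 database (illustrative only)."""
    return raw.decode("latin-1").encode("utf-8")


raw = bytes([0x41, 0xE9, 0xFF])  # 'A', 'é', 'ÿ' when read as Latin-1
stored = stored_form(raw)

print(len(raw), len(stored))  # 3 5

# Fetching reverses the conversion, so the client sees its original bytes
# again, at the cost of a decode/encode step in each direction.
assert stored.decode("utf-8").encode("latin-1") == raw
```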

Comment 2 Greg Sabino Mullane 2006-07-17 01:57:11 UTC

This field is now bytea (which does cost us some contortions), so I am
closing the bug for now.
