Last modified: 2005-07-23 06:41:24 UTC

Bug 1972 - Serve files as UTF-8
Product: Wikimedia
Classification: Unclassified
Component: Bugzilla
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Reported: 2005-04-25 10:16 UTC by Ævar Arnfjörð Bjarmason
Modified: 2005-07-23 06:41 UTC (History)
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Ævar Arnfjörð Bjarmason 2005-04-25 10:16:54 UTC
It's annoying to have to manually switch settings when viewing attachments.
Comment 1 Ævar Arnfjörð Bjarmason 2005-04-27 21:44:46 UTC
Actually, they aren't being served with any specific character set; changing the
summary to reflect this.

$ printf "GET /attachment.cgi?id=455&action=view HTTP/1.0\nHost:\n\n"|nc 80|head
HTTP/1.1 200 OK
Date: Wed, 27 Apr 2005 21:42:38 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.11
Content-disposition: inline; filename="LanguageCs_1.5.php"
Content-length: 104226
Connection: close
Content-Type: text/plain; name="LanguageCs_1.5.php"

/** Czech (česky)

Regardless, it would be good to explicitly serve them as UTF-8.
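A minimal Python sketch of the check made with the `printf | nc` pipeline above: given a raw Content-Type value, report which charset (if any) it declares. It relies only on the standard library's `email.message.Message.get_content_charset()`, which parses MIME-style header parameters; the helper name `declared_charset` is hypothetical and not part of Bugzilla.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased,
    or None when the header does not declare one (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None (its default failobj) when the
    # header carries no charset parameter.
    return msg.get_content_charset()


# The header from the response above declares no charset at all.
print(declared_charset('text/plain; name="LanguageCs_1.5.php"'))  # prints: None
```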
Comment 2 River Tarnell 2005-04-28 02:37:48 UTC
i'm not clear what you want to do here.

do you want to set charset=UTF-8 for every (text) attachment served?

or, do you want to auto-convert text files to UTF-8 on upload, and then set charset=UTF-8?

if the latter, this should probably be reported as a Bugzilla enhancement request.
Comment 3 Brion Vibber 2005-04-28 02:43:57 UTC
The uploaded patches are already in UTF-8; they're just not being sent with a charset in the Content-type header.

Bug 609 describes the equivalent issue with bugmail.
Comment 4 River Tarnell 2005-04-28 02:45:01 UTC
yes, but you can't assume all files will be UTF-8, so you either send the wrong
encoding with some files, or you need to convert them as needed, or somehow
otherwise detect the encoding to send.
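One cheap form of the detection River describes is to try decoding the attachment as UTF-8 and only advertise that charset when the decode succeeds. The sketch below assumes that rule; the helper name `charset_for_attachment` is hypothetical. It is a heuristic, not proof: a file in a legacy encoding can occasionally also be valid UTF-8.

```python
from typing import Optional


def charset_for_attachment(data: bytes) -> Optional[str]:
    """Pick a charset to advertise for a text attachment (hypothetical helper).

    Bytes that decode cleanly as UTF-8 are labelled UTF-8 (pure ASCII is a
    subset of UTF-8, so it qualifies too). Anything else gets no charset,
    leaving the choice to the browser as before.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "UTF-8"


print(charset_for_attachment("/** Czech (česky)".encode("utf-8")))       # prints: UTF-8
print(charset_for_attachment("/** Czech (česky)".encode("iso-8859-2")))  # prints: None
```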
Comment 5 Ævar Arnfjörð Bjarmason 2005-04-28 02:57:44 UTC
(In reply to comment #2 and comment #4)

I want to set charset=utf-8 for every text attachment served.

Practically speaking the only attachments we get with characters that are not in
ASCII are patches for Language files, and since we'll be going all-UTF-8 in 1.5
these are going to be in UTF-8. There's really no need to make some 100% correct
character set detection system (and AFAIK such a thing isn't even possible);
serving them all as UTF-8 is good enough for our purposes.
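The behaviour requested here (label every text attachment as UTF-8) can be sketched as a small header rewrite. This is an illustration in Python, not Bugzilla's actual code; the helper name `with_utf8_charset` is hypothetical, and it assumes non-text types and headers that already name a charset should be left alone.

```python
from email.message import Message


def with_utf8_charset(content_type: str) -> str:
    """Return a Content-Type value that declares charset=UTF-8 for text/*
    types; other types, and values that already declare a charset, are
    returned unchanged (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_maintype() lowercases the type, so "Text/Plain" counts too.
    if msg.get_content_maintype() != "text" or msg.get_param("charset") is not None:
        return content_type
    return content_type + "; charset=UTF-8"


print(with_utf8_charset('text/plain; name="LanguageCs_1.5.php"'))
# prints: text/plain; name="LanguageCs_1.5.php"; charset=UTF-8
```

Appending the parameter rather than rebuilding the header keeps the existing `name=` parameter exactly as Bugzilla sent it.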
Comment 6 Zigger 2005-07-23 06:41:24 UTC
Resolving as FIXED; this was fixed some time ago. The current Content-Type
response header for the example is:

Content-Type: text/plain; name="LanguageCs_1.5.php"; charset=UTF-8
