Last modified: 2012-05-05 23:59:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37626, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35626 - Extended characters show up as "?" in Gerrit user names
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Git/Gerrit
Version: unspecified
Hardware: All All
Importance: High critical
Target Milestone: ---
Assigned To: Chad H.
URL: https://gerrit.wikimedia.org/r/4040
Keywords: patch, patch-reviewed
Duplicates: 35455
Depends on:
Blocks:
 
Reported: 2012-03-31 15:15 UTC by Beau
Modified: 2012-05-05 23:59 UTC (History)

Attachments
Tell gerrit to use UTF-8 with MySQL (1.11 KB, patch)
2012-04-12 17:44 UTC, Marcin Cieślak

Description Beau 2012-03-31 15:15:32 UTC
There is an account 'User:Szymon Świerkosz' on labsconsole wiki, however gerrit shows it as 'Szymon ?wierkosz'. I have provided the URL for an example page.
Comment 1 Mark A. Hershberger 2012-04-02 17:15:53 UTC
Adjusting bug summary... I assume this is upstream, but don't really know for sure.
Comment 2 Niklas Laxström 2012-04-02 17:17:29 UTC
Probably dupe of the other gerrit unicode bug.
Comment 3 Rob Lanphier 2012-04-02 17:23:51 UTC
This is very likely an upstream problem, but it seems to be specific to user names.  For example, in https://gerrit.wikimedia.org/r/4040 , Szymon's name is shown correctly in the "committer" field, but incorrectly in the "reviewer" and "owner" fields.
Comment 4 MZMcBride 2012-04-02 18:59:12 UTC
(In reply to comment #3)
> This is very likely an upstream problem, but it seems to be specific to user
> names.

What about this issue suggests it's an upstream problem?
Comment 5 Chad H. 2012-04-02 20:17:18 UTC
Well pretty much everything with gerrit is an upstream problem ;-)

Like the other unicode bugs, we can probably work around this though.
Comment 6 Marcin Cieślak 2012-04-11 21:20:16 UTC
Here's an interesting one:

http://code.google.com/p/gerrit/issues/detail?id=1082

They say UTF-8 won't work with MySQL :/
Comment 7 Niklas Laxström 2012-04-12 09:39:28 UTC
MediaWiki works absolutely fine with MySQL and Unicode.

The correct phrasing would be Gerrit does not support Unicode when using MySQL as backend.
Comment 8 Mark A. Hershberger 2012-04-12 14:07:41 UTC
(In reply to comment #7)
> MediaWiki works absolutely fine with MySQL and Unicode.
> 
> The correct phrasing would be Gerrit does not support Unicode when using MySQL
> as backend.

Was about to say, this definitely sounds like a Gerrit problem.
Comment 9 Chad H. 2012-04-12 14:23:43 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > MediaWiki works absolutely fine with MySQL and Unicode.
> > 
> > The correct phrasing would be Gerrit does not support Unicode when using MySQL
> > as backend.
> 
> Was about to say, this definitely sounds like a Gerrit problem.

As I said upstream, Gerrit claiming this doesn't work is just silly. I've already theorized that we can just change the collations and this will work, but I haven't tested yet.

If someone wants to test this theory, we can set you up with access to the gerrit project on labs (which is already running 2.3).
Comment 10 Marcin Cieślak 2012-04-12 15:54:04 UTC
Nope - tested with 2.3-rc0-158-g34ab429 - I have utf8_unicode_ci on all MySQL tables and I get question marks. 

A bit newer Gerrit deployed on PostgreSQL is fine.
Comment 11 Chad H. 2012-04-12 16:24:26 UTC
(In reply to comment #10)
> Nope - tested with 2.3-rc0-158-g34ab429 - I have utf8_unicode_ci on all MySQL
> tables and I get question marks. 
> 

We've got 2.3 final on gerrit-dev on labs so we can test there. Want me to add you? I'm wondering if making the fields binary like we do in MediaWiki would work...but that's a bigger change than just the collations on the tables.

> A bit newer Gerrit deployed on PostgreSQL is fine.

I really don't see us moving to PG or H2, so we need to find a fix. I *refuse* to believe Gerrit that this is unfixable on MySQL.
Comment 12 Marcin Cieślak 2012-04-12 17:44:52 UTC
Created attachment 10411
Tell gerrit to use UTF-8 with MySQL

My MySQL database is in UTF-8, and it seems that Gerrit stores the values properly.

The attached patch forces Gerrit to use UTF-8 when connecting to MySQL.
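
For illustration, the "?" symptom can be reproduced outside Gerrit. The sketch below assumes that the connection (rather than the storage) was squeezing text through a single-byte charset such as latin1; that is an assumption, not something confirmed in this report, and `through_charset` is a made-up helper name.

```python
def through_charset(text: str, charset: str) -> str:
    """Round-trip text through a charset; unmappable characters become '?'."""
    return text.encode(charset, errors="replace").decode(charset)


name = "Szymon Świerkosz"

# "Ś" (U+015A) has no latin-1 code point, so it is replaced by "?".
print(through_charset(name, "latin-1"))

# UTF-8 can represent every character, so the name survives unchanged.
print(through_charset(name, "utf-8"))
```

If the connection charset really is the culprit, the data on disk can be intact while every read through that connection still shows "?".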
Comment 13 Marcin Cieślak 2012-04-12 18:37:51 UTC
^demon, can you try this change in the configuration (assuming we can have tables in UTF-8):

[database]
        type = JDBC
        driver = com.mysql.jdbc.Driver
        url = jdbc:mysql://localhost/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci
        username = gerrit2

"database" and "hostname" entries should be removed. "username" should stay.
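
A quick way to double-check such a connection URL before restarting Gerrit might look like the sketch below. It only inspects the string; the helper name `missing_utf8_params` and the idea of checking exactly these three parameters are assumptions taken from the URL above, not part of Gerrit.

```python
from urllib.parse import parse_qs, urlsplit

# Parameters and values taken from the URL proposed in this comment.
REQUIRED = {
    "characterSetResults": "utf8",
    "characterEncoding": "utf8",
    "connectionCollation": "utf8_unicode_ci",
}


def missing_utf8_params(jdbc_url: str) -> dict:
    """Return the required parameters that are absent or set to another value."""
    # Drop the "jdbc:" prefix so the remainder parses as an ordinary URL.
    plain = jdbc_url[len("jdbc:"):] if jdbc_url.startswith("jdbc:") else jdbc_url
    query = parse_qs(urlsplit(plain).query)
    return {k: v for k, v in REQUIRED.items() if query.get(k, [None])[0] != v}


url = (
    "jdbc:mysql://localhost/reviewdb"
    "?characterSetResults=utf8&characterEncoding=utf8"
    "&connectionCollation=utf8_unicode_ci"
)
print(missing_utf8_params(url))                                # nothing missing
print(missing_utf8_params("jdbc:mysql://localhost/reviewdb"))  # all three missing
```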
Comment 14 Chad H. 2012-04-12 19:30:47 UTC
*** Bug 35455 has been marked as a duplicate of this bug. ***
Comment 15 Niklas Laxström 2012-04-24 08:50:49 UTC
I don't think that a dataloss bug should be Low/Normal.
Comment 16 Chad H. 2012-04-27 20:04:37 UTC
The following tables are definitely affected and need some sort of fix:
  account_external_ids
  accounts
  changes
  patch_comments

These tables aren't currently affected, but could be if we put non-ASCII data into them.
  account_group_names
  account_groups
  approval_categories
  approval_category_values
  change_messages
  tracking_ids
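
As a rough sketch, the statements for converting these tables could be generated as below. This report does not say which statements were actually run, so `ALTER TABLE ... CONVERT TO CHARACTER SET` is an assumption; the charset and collation are the ones discussed in this report. Note that `CONVERT TO` transcodes the stored column data, so it is only safe when the existing bytes are labelled with the right charset, and a backup beforehand is advisable.

```python
# Table names copied from the two lists above.
AFFECTED = ["account_external_ids", "accounts", "changes", "patch_comments"]
AT_RISK = [
    "account_group_names",
    "account_groups",
    "approval_categories",
    "approval_category_values",
    "change_messages",
    "tracking_ids",
]


def convert_statement(table: str, charset: str = "utf8",
                      collation: str = "utf8_unicode_ci") -> str:
    """Build one ALTER TABLE statement (hypothetical helper, prints SQL only)."""
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


for table in AFFECTED + AT_RISK:
    print(convert_statement(table))
```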
Comment 17 Chad H. 2012-05-02 21:04:46 UTC
Ok, collation has been updated on all tables, and https://gerrit.wikimedia.org/r/#change,6439 has been submitted to change the connection url.
Comment 18 Chad H. 2012-05-02 22:52:05 UTC
(In reply to comment #13)
> ^demon, can you try this change in the configuration (assuming we can have
> tables in UTF-8):
> 
> [database]
>         type = JDBC
>         driver = com.mysql.jdbc.Driver
>         url = jdbc:mysql://localhost/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci
>         username = gerrit2
> 
> "database" and "hostname" entries should be removed. "username" should stay.

Ok, I changed the collation/charset on all the tables, and we updated the connection string. The database is now showing the correct data (yay!), but we're still not getting the right data to the UI.

See the owner on https://gerrit.wikimedia.org/r/#change,6388 which is an improvement although still not correct.
Comment 19 Marcin Cieślak 2012-05-03 06:06:41 UTC
Looks "better" now.

I would then try connecting to MySQL via JDBC directly and see if it's okay.
You can try https://gerrit-review.googlesource.com/#/c/34670/ to play live with data obtained from the SQL database via Gerrit's ORM or play directly.

You can try this code:

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60187/focus=60206

to check what JDBC really sees.

I hope you didn't end up with double-encoded UTF-8 in the database (quite easy to do with MySQL, harder to recover from), so that Ś is stored not as 0xC5 0x9A but as 0xC3 0x85 0xC2 0x9A instead.
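
The byte sequences above can be reproduced with a small sketch. It uses Python's "latin-1" codec (ISO-8859-1) as the misapplied charset; MySQL's own latin1 is close to but not identical with it, so bytes seen in a real database may differ for some characters. The helper names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """UTF-8 bytes misread as latin-1 and then encoded as UTF-8 a second time."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(raw: bytes) -> str:
    """Undo one round of double encoding."""
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


good = "Ś".encode("utf-8")
bad = double_encode("Ś")

print(good.hex())   # c59a
print(bad.hex())    # c385c29a
print(repair(bad))  # Ś
```

Checking the hex of a known non-ASCII value this way shows which of the two forms is actually stored.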
Comment 20 Marcin Cieślak 2012-05-03 06:15:05 UTC
Some data from my MySQL instance:


$ mysql -u root -p reviewdb --default-character-set=utf8
Enter password: 
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1212
Server version: 5.0.92 FreeBSD port: mysql-server-5.0.92

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s
--------------
mysql  Ver 14.12 Distrib 5.0.92, for portbld-freebsd8.2 (amd64) using  5.2

Connection id:		1212
Current database:	reviewdb
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.0.92 FreeBSD port: mysql-server-5.0.92
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/tmp/mysql.sock
Uptime:			21 days 8 hours 24 min 17 sec

Threads: 6  Questions: 339557  Slow queries: 0  Opens: 85  Flush tables: 1  Open tables: 64  Queries per second avg: 0.184
--------------

mysql> show full columns from accounts;
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
| Field                                  | Type         | Collation       | Null | Key | Default           | Extra | Privileges                      | Comment |
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
| registered_on                          | timestamp    | NULL            | NO   |     | CURRENT_TIMESTAMP |       | select,insert,update,references |         | 
| full_name                              | varchar(255) | utf8_bin        | YES  | MUL | NULL              |       | select,insert,update,references |         | 
| preferred_email                        | varchar(255) | utf8_bin        | YES  | MUL | NULL              |       | select,insert,update,references |         | 
| contact_filed_on                       | timestamp    | NULL            | YES  |     | NULL              |       | select,insert,update,references |         | 
| maximum_page_size                      | smallint(6)  | NULL            | NO   |     | 0                 |       | select,insert,update,references |         | 
| show_site_header                       | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| use_flash_clipboard                    | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| download_url                           | varchar(20)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         | 
| download_command                       | varchar(20)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         | 
| copy_self_on_email                     | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| date_format                            | varchar(10)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         | 
| time_format                            | varchar(10)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         | 
| display_patch_sets_in_reverse_order    | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| display_person_name_in_review_category | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| inactive                               | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         | 
| account_id                             | int(11)      | NULL            | NO   | PRI | 0                 |       | select,insert,update,references |         | 
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
16 rows in set (0.02 sec)

mysql> select full_name from accounts where preferred_email like 'saper%' \G
*************************** 1. row ***************************
full_name: Marcin Cieślak
1 row in set (0.00 sec)
Comment 21 Marcin Cieślak 2012-05-03 08:09:46 UTC
Additionally, here's the output of my sane MySQL Gerrit instance via the Gerrit Inspector feature (patching your gerrit with https://gerrit-review.googlesource.com/#/c/34670/ should be mostly harmless :):

(lots of startup messages on Gerrit console)
"jettyserver" is "com.google.gerrit.pgm.http.jetty.JettyServer@1fdac8a5"
"db" is "com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$25@1c8aeedc"

Welcome to the Gerrit Inspector
Enter help() to see the above again, EOF to quit and stop Gerrit
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) 
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0 running for Gerrit 2.4-rc0-78-g8ed6c15
>>> for z in db.accounts().iterateAllEntities():
...     print z.fullName
... 
Marcin Cieślak
Marcin Cieslak (via gmail)
>>>
Comment 22 Beau 2012-05-03 08:39:50 UTC
Um... I am unable to log in to gerrit right now.

Application Error
Server Error
Cannot assign user name
Comment 23 Chad H. 2012-05-03 17:40:17 UTC
Ok, everything should be squared away now. Usernames[0], cover comments[1] and inline comments[2] are now showing up properly. We also tested IRC, which works. E-mail notifs are working.

Only thing left to test is new user creation and login. Then we can mark this fixed.

[0] https://gerrit.wikimedia.org/r/#change,6008
[1] https://gerrit.wikimedia.org/r/#change,3962 (last comment)
[2] https://gerrit.wikimedia.org/r/#patch,sidebyside,3962,4,RELEASE-NOTES-1.20
Comment 24 Beau 2012-05-03 17:45:18 UTC
I can confirm logging in - works.
Comment 25 Sumana Harihareswara 2012-05-03 21:10:06 UTC
I've now created a user account via https://labsconsole.wikimedia.org/wiki/Special:CreateAccount for Paweł Sadowski and am waiting for Paweł to confirm that login for Labs & Gerrit works.
Comment 26 Chad H. 2012-05-03 23:44:06 UTC
I went ahead and made myself a testing account so I can use it in the future. It worked:

https://gerrit.wikimedia.org/r/#dashboard,240

Marking this FIXED.
Comment 27 Marcin Cieślak 2012-05-05 20:37:25 UTC
As of now, the IRC bot says:


Lastlog:
04:42 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1;  - https://gerrit.wikimedia.org/r/6345
04:49 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1;  - https://gerrit.wikimedia.org/r/6340
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] 
                   (master) - https://gerrit.wikimedia.org/r/6388
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the 
                   html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:41 < gerrit-wm> New review: Szymon ?wierkosz; "Nothing changed between Patch Set 1 and Patch Set 2. It is one of my another failed attempts at usin..." 
                   [mediawiki/extensions/ProofreadPage] (master) C: 1;  - https://gerrit.wikimedia.org/r/6003
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] 
                   (master) - https://gerrit.wikimedia.org/r/6388
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the 
                   html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:08 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/core] (master) C: 0;  - https://gerrit.wikimedia.org/r/6596


Fortunately, the HTML output seems fine - but something might have changed (is it because of 2.3)?

Can you have a look at 2.3 database again? Maybe it's just some interface to the IRC bot?
Comment 28 Marcin Cieślak 2012-05-05 20:45:37 UTC
Did a simple test:

Added UTF-8 comment to:

https://gerrit.wikimedia.org/r/#/c/3289/

results:


$ ssh wikimedia gerrit stream-events
{"type":"comment-added","change":{"project":"test/mediawiki/core","branch":"master","topic":"master","id":"Icdc8f7e26c4cba920eda69a042702b8358797554","number":"3289","subject":"Testing git review...","owner":{"name":"IAlex","email":"ialex.wiki@gmail.com"},"url":"https://gerrit.wikimedia.org/r/3289"},"patchSet":{"number":"1","revision":"e5e3aafbce66df1b0a1094be7aa62c34a617c181","ref":"refs/changes/89/3289/1","uploader":{"name":"IAlex","email":"ialex.wiki@gmail.com"},"createdOn":1332230770},"author":{"name":"saper","email":"saper@saper.info"},"comment":"ąćęłńóśźć comment utf-8"}

But:

20:44 < gerrit-wm> New review: saper; "????????? comment utf-8" [test/mediawiki/core] (master) - https://gerrit.wikimedia.org/r/3289
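
The difference between the two outputs can be mimicked with the sketch below. It takes a trimmed-down copy of the event above and pushes the comment through an ASCII-only conversion; whether the IRC bot (or something between Gerrit and the bot) really does this is an assumption, and `to_ascii` is a made-up helper.

```python
import json

# Trimmed copy of the stream-events line above (most fields omitted).
event = json.loads(
    '{"type": "comment-added", "author": {"name": "saper"}, '
    '"comment": "ąćęłńóśźć comment utf-8"}'
)


def to_ascii(text: str) -> str:
    """Simulate a lossy ASCII-only hop: non-ASCII characters become '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


print(event["comment"])            # intact, as in stream-events
print(to_ascii(event["comment"]))  # ????????? comment utf-8
```

A latin-1 hop would have kept the "ó", so a fully question-marked line looks more like an ASCII-only conversion, though that is a guess from the output alone.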
Comment 29 Chad H. 2012-05-05 23:59:47 UTC
(In reply to comment #27)
> Fortunately, the HTML output seems fine - but something might have changed (is
> it because of 2.3)?
> 
> Can you have a look at 2.3 database again? Maybe it's just some interface to
> the IRC bot?

Could this be bug 36487?
