Last modified: 2013-11-06 14:50:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T12352, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 10352 - Use Unicode quotes, ellipses, dashes, hyphens instead of ASCII ones in messages
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: 1.21.x
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: utf8
Depends on:
Blocks:
 
Reported: 2007-06-24 09:30 UTC by Hendrik Maryns
Modified: 2013-11-06 14:50 UTC (History)
4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
replaces ’, ... and " (51.87 KB, patch)
2007-06-24 09:30 UTC, Hendrik Maryns

Description Hendrik Maryns 2007-06-24 09:30:46 UTC
Created attachment 3819
replaces ’, ... and "

I work on the Dutch translation of the wiki software.  In looking at MessagesEn.php, I noticed that single quotes, if they need to be in the string, are handled very inconsistently: sometimes they are quoted: '\'', sometimes the string is put into double quotes: "'".  This makes it confusing, and it is unelegant.  There is an easy solution though: use the unicode quote sign: ’.  This can be used anywhere, since it has no meaning for PHP: '’', "’".
I have replaced all relevant occurrences of ' in MessagesEn.php; see the patch.  (As a side effect, all ' in comments are replaced by ’ too.)  Notice that sometimes ‘ is the correct alternative: where it is an opening quote.  See http://www.unicode.org/charts/PDF/U2000.pdf, code points U+2018, U+2019 and U+2026.
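A short check of why the curly quote "has no meaning for PHP", assuming MessagesEn.php is saved as UTF-8: the three characters from the chart are encoded as three-byte sequences whose bytes are all 0x80 or above, so none of them can be mistaken for the ASCII apostrophe, double quote or backslash that PHP string literals react to. A minimal sketch in Python:

```python
import unicodedata

# ASCII bytes that are significant inside PHP string literals.
PHP_SPECIAL = {ord("'"), ord('"'), ord("\\")}

for ch in ("\u2018", "\u2019", "\u2026"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {encoded.hex(' ')}")
    # UTF-8 multi-byte sequences only use bytes >= 0x80, so they can
    # never collide with an ASCII string delimiter or escape character.
    assert not PHP_SPECIAL.intersection(encoded)
```

This prints e2 80 98, e2 80 99 and e2 80 a6 for LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK and HORIZONTAL ELLIPSIS respectively.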

While we’re at it, I might as well suggest another improvement: use a real ellipsis (…) instead of three dots.  That is also in the patch.
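The attached patch is not reproduced here, so the following is only a hypothetical sketch of the kind of substitution described, not the patch's actual logic: an ASCII apostrophe with a word character on both sides (an elision or possessive such as "don't") becomes U+2019, and a run of exactly three periods becomes U+2026. Choosing between ‘ and ’ for quotes that open or close a phrase needs more context and is left out.

```python
import re

# Hypothetical heuristic: an apostrophe flanked by word characters is an
# elision or possessive ("don't", "user's"), so U+2019 is the right glyph.
APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")
# Exactly three periods, not part of a longer run of dots.
THREE_DOTS = re.compile(r"(?<!\.)\.\.\.(?!\.)")


def typographize(message: str) -> str:
    """Return message with in-word apostrophes and '...' replaced."""
    message = APOSTROPHE.sub("\u2019", message)
    return THREE_DOTS.sub("\u2026", message)
```

For example, `typographize("You don't have permission...")` yields "You don’t have permission…". Because the lookarounds require a word character on both sides of the apostrophe, doubled apostrophes used for wikitext italics (''text'') are left alone.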

If you’re worried about a11y (accessibility), see the corresponding bug in Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=373623.

Oh, and by the way, even more exotic Unicode symbols are used already, ← for example (in: 'previousrevision'      => '←Older revision',).

Once this is done, one could think of making the use of ' and " more consistent (I’d say: always use the second, since ' is needed in wiki markup from time to time).  I replaced " by ' where the double quotes were only used to allow ' characters that are no longer there.

Of course, I will have overlooked some occurrences in the patch, but it is a first start.
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-24 22:15:01 UTC
There are usability issues with using hard-to-input characters in messages that are supposed to be easily customizable to meet end-users' needs (which makes this rather different from Firefox).  Besides, undifferentiated quotes are standard on the Internet especially, with curly quotes rare.  The state of MessagesEn.php is certainly not a factor worthy of consideration.  I would be inclined to resolve LATER, when input methods are superior or it's common to not use ASCII apostrophes/quotation marks.

(As for ellipses, in some common fonts literal ellipses look very ugly compared to three literal periods.  Possibly I'm just thinking of fixed-width fonts, but I don't think so.)
Comment 2 Hendrik Maryns 2007-06-28 18:09:25 UTC
Doesn’t that contradict the use of those arrows ‘←’?

‘ and ’ aren’t that hard to input: on a qwerty US-int it is simply Right Alt+9 and 0.  Ellipsis is another matter.

What does it matter that ‘undifferentiated quotes are standard on the Internet especially’?  Of course they are, since the curly quotes are rather new and almost nobody knows that they even exist.  But is that an argument against using them?  With small type the difference is barely visible, and certainly no-one will get confused when seeing them.

Ellipsis indeed looks ugly in fixed-width fonts, but that’s obviously *because* it is fixed width.  It looks rather good in my default FF fonts.

Damn, is it really enough for one guy to shoot this off?  Pity.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-28 18:31:13 UTC
(In reply to comment #2)
> Doesn’t that contradict the use of those arrows ‘←’?

No, because those only occur in a couple of places and aren't going to be necessary in routine message edits to ensure visual consistency.

> ‘ and ’ aren’t that hard to input: on a qwerty US-int it is simply Right
> Alt+9 and 0.

Nope.  It varies widely depending on operating system (and window manager, if applicable).  In GNOME, on Ubuntu, that doesn't work: I need Ctrl-Shift-u2018/2019.  On Windows you'd generally use Alt-0145/0146, typed with numbers from the numpad only, with Num Lock on.  I recall Macs have something like what you describe, although I've never used Macs.
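The numbers in these key sequences come from different encodings, which is part of why they are hard to remember. The GNOME sequence takes the hexadecimal Unicode code point, while Windows Alt codes typed with a leading zero index the ANSI code page (Windows-1252 on Western systems), where bytes 145 and 146 are the curly single quotes. Without the leading zero, Windows generally uses the OEM code page instead (CP437 on US systems), where the same numbers give æ and Æ. A quick check with Python's built-in codecs:

```python
# GNOME/IBus: Ctrl+Shift+U followed by the hexadecimal code point.
for hex_digits in ("2018", "2019"):
    print(f"Ctrl+Shift+U {hex_digits} ->", chr(int(hex_digits, 16)))

# Windows Alt codes with a leading zero index the ANSI code page,
# Windows-1252 on Western systems.
for code in (145, 146):
    print(f"Alt+0{code} ->", bytes([code]).decode("cp1252"))

# Without the leading zero the OEM code page is used, CP437 on US
# systems, which maps the same numbers to different characters.
for code in (145, 146):
    print(f"Alt+{code} ->", bytes([code]).decode("cp437"))
```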

> What does it matter that ‘undifferentiated quotes are standard on the
> Internet especially’?  Of course they are, since the curly quotes are rather
> new and almost nobody knows that they even exist.  But is that an argument
> against using them?

It's an argument that they're unnecessary.  If there were no drawbacks, we may as well use them even if they're unnecessary, since they look nice.  But as I outlined, there are drawbacks, even if you don't think they're substantial in the face of good typography.

> With small type the difference is barely visible, and
> certainly no-one will get confused when seeing them.

Even harder to keep consistency, then, and even less of an advantage to using them.

> Ellipsis indeed looks ugly in fixed-width fonts, but that’s obviously
> *because* it is fixed width.  It looks rather good in my default FF fonts.

Mine too.  Maybe I was imagining something.

> Damn, is it really enough for one guy to shoot this off?  Pity.

Two guys: I expressed my reservations, and lead developer Brion Vibber resolved the bug INVALID.  Web typography and monitor resolutions may eventually improve to the point that this would be nice, but not now.
Comment 4 Hendrik Maryns 2007-06-28 18:38:17 UTC
(In reply to comment #3)
> > Damn, is it really enough for one guy to shoot this off?  Pity.
> 
> Two guys: I expressed my reservations, and lead developer Brion Vibber resolved
> the bug INVALID.  Web typography and monitor resolutions may eventually improve
> to the point that this would be nice, but not now.

Sorry, didn’t mean to be rude there.  Well, OK then.  I hope it won’t take too long; I like things to be correct...  Cheers.
Comment 5 Diederik van Liere 2011-11-29 22:28:24 UTC
Is this something that we should reconsider?
Comment 6 MZMcBride 2011-11-29 22:37:56 UTC
(In reply to comment #5)
> Is this something that we should reconsider?

I don't think so. The bug's resolution could be changed to "wontfix" if "later" is bothering you.
Comment 7 Bartosz Dziewoński 2012-11-13 18:24:12 UTC
Reopening from LATER and adjusting summary.

(In reply to comment #3)
> Web typography and monitor resolutions may eventually improve
> to the point that this would be nice, but not now.

Five years have passed, it’s 2012 now. I’m pretty sure this would now be acceptable from the accessibility point of view, and I don’t understand the concerns about inputting difficulties — yes, there are some (not much has changed in this respect), but why would anyone other than the people creating the interface need to input parts of it?

So, do we want to use correct typography in English language messages (and encourage using it in other languages’ ones)? I think that currently the general Unicode‐compatibility of browsers and OSes is good enough to do this.

(The original report also mentioned changes to comments in code, but I think this could be a bad idea — code should be very easily greppable, is still semi‐often displayed in Unicode‐crippled environments like Windows’ cmd.exe, and lacks explicit encoding information, unlike HTML pages.)
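The greppability concern is easy to demonstrate: U+2010 HYPHEN, used in this comment in words like "Unicode‐compatibility", is a different character from the ASCII HYPHEN-MINUS typed on a keyboard, so a plain substring search with the keyboard hyphen does not find it. A minimal Python illustration:

```python
import unicodedata

ascii_query = "Unicode-compatibility"        # typed with U+002D HYPHEN-MINUS
typeset_text = "Unicode\u2010compatibility"  # written with U+2010 HYPHEN

print(unicodedata.name("-"))        # HYPHEN-MINUS
print(unicodedata.name("\u2010"))   # HYPHEN
print(ascii_query in typeset_text)  # False: the two hyphens differ

# A search that should match both has to normalize first, here by
# mapping the typographic hyphen back to ASCII before comparing.
normalized = typeset_text.replace("\u2010", "-")
print(ascii_query in normalized)    # True
```

An explicit replacement is used because U+2010 has no compatibility decomposition, so Unicode NFKC normalization alone would leave it unchanged.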

(I took special care to include several Unicode characters in this comment. Can you spot them all?)
Comment 8 Nemo 2013-11-06 14:50:12 UTC
I'm missing a rationale/use case here.

(In reply to comment #0)
> handled very inconsistently: sometimes they are quoted:
> '\'', sometimes the string is put into double quotes: "'".  This makes it
> confusing, and it is unelegant.  There is an easy solution though: use the
> unicode quote sign: ’.

There is an even easier solution: use "" around the string containing the message if there is a ' inside it, and '' if it contains "...
Dashes and hyphens have just been added to the summary and I've no idea why.
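The quoting rule suggested in this comment can be sketched as a small generator of PHP string literals, written in Python for illustration; `php_literal` is a hypothetical helper, not part of MediaWiki. It follows PHP's quoting rules: inside single quotes only \' and \\ are escapes, while inside double quotes backslash escapes and $ interpolation are active, so \ and $ are escaped there to keep the text literal. Messages containing both kinds of quote fall back to escaping the apostrophe.

```python
def php_literal(message: str) -> str:
    """Return a PHP string literal for message (hypothetical helper).

    Uses double quotes if the message contains ' but no ", and single
    quotes otherwise, so quote escaping is only needed when both occur.
    """
    if "'" in message and '"' not in message:
        # Double-quoted PHP strings interpret backslashes and $, so both
        # are escaped to keep the message text literal.
        body = message.replace("\\", "\\\\").replace("$", "\\$")
        return '"' + body + '"'
    # Single-quoted: only backslash and the apostrophe need escaping.
    # This branch also covers messages containing both kinds of quote.
    body = message.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + body + "'"
```

For instance, `php_literal("Older revision")` gives 'Older revision', while `php_literal('Say "hi"')` keeps single quotes and needs no escaping.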
