Last modified: 2011-03-13 18:06:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T3513, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 1513 - Convert ASCII quotes to Unicode directional quotes, ellipses
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Lowest enhancement with 1 vote
Target Milestone: ---
Assigned To: Nathan Hamblen
Keywords: patch
Depends on:
Blocks: unicode
Reported: 2005-02-11 14:21 UTC by Nathan Hamblen
Modified: 2011-03-13 18:06 UTC (History)

New parser function to convert various strings to UTF-8 entities (2.41 KB, patch)
2005-02-11 14:32 UTC, Nathan Hamblen

Description Nathan Hamblen 2005-02-11 14:21:07 UTC
Although differentiated opening and closing quotes look better, most people
can’t type them easily or at all. In keeping with the aim of fast editing, it
would be best to allow editors to type in standard quotes but display
distinctive quotes. Same goes for the ellipses…
Comment 1 Nathan Hamblen 2005-02-11 14:32:01 UTC
Created attachment 284
New parser function to convert various strings to UTF-8 entities

This patch uses regular expressions to determine the correct quote or ellipsis
to use. It leaves alone everything in preformatted sections. I've tested it in
all the quote situations I can come up with and tweaked the regexp till it
worked. There may be more; it would be best to have this pointed at a live
backup for a while.
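
For readers without the attachment, here is a minimal sketch of the general technique, written in Python rather than the PHP of the patch. It is not the attached code: the names PRE_BLOCK, convert_plain and smarten are hypothetical, and the rule "a quote at the start of the text or after whitespace or an opening bracket is an opening quote" is an assumed heuristic. It only skips <pre> sections and ignores all other markup (wiki italics, HTML attributes), which a real parser hook would have to handle.

```python
import re

# Hypothetical sketch, not the attached patch.
# Matches a whole <pre>...</pre> section so it can be left untouched.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)

# A quote is treated as "opening" when nothing precedes it, or when the
# preceding character is whitespace or an opening bracket (assumed rule).
OPEN_DOUBLE = re.compile(r'(?<![^\s(\[{])"')
# For single quotes, a preceding opening double quote also counts.
OPEN_SINGLE = re.compile(r"(?<![^\s(\[{\u201c])'")


def convert_plain(text: str) -> str:
    """Convert ASCII quotes and '...' in text that contains no <pre> section."""
    text = text.replace("...", "\u2026")   # horizontal ellipsis
    text = OPEN_DOUBLE.sub("\u201c", text)  # left double quotation mark
    text = text.replace('"', "\u201d")      # the rest are closing quotes
    text = OPEN_SINGLE.sub("\u2018", text)  # left single quotation mark
    text = text.replace("'", "\u2019")      # closing quotes and apostrophes
    return text


def smarten(text: str) -> str:
    """Apply convert_plain everywhere except inside <pre> sections."""
    # With one capturing group, split() returns the text outside <pre> at
    # even indices and the <pre> sections themselves at odd indices.
    parts = PRE_BLOCK.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = convert_plain(parts[i])
    return "".join(parts)


print(smarten('She said "hello" and left...'))
print(smarten('<pre>"raw"...</pre> "ok"'))
```

The first call prints the sentence with directional quotes and a single ellipsis character; the second leaves the <pre> section as typed and converts only the text after it.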

NOTE: I moved the em dash code into this function, so if a fix for bug #1485 is
checked into HEAD it will need to be undone. (I'd be happy to update this patch
to delete the other if that happens.)
Comment 2 Alexander Sigachov 2005-02-13 20:12:59 UTC
Please see section "Quote signs in several languages" in

Also, about dashes: in Russian, for example, there is no en dash; only the em dash is used.
Comment 3 Nathan Hamblen 2005-03-02 16:36:44 UTC
(In reply to comment #2)

Thanks for that link, Alexander. I would have never guessed that ”this format”
was standard in Swedish. sv.wikipedia would have a legitimate gripe if their
ASCII quotes were converted to English-style opening and closing quotes. 

I also checked Romanian, whose Wikipedia seems to use "these quotes" even though
they resemble none of the standard or alternative quotes in the table. In that
case, it's hard to call the quote conversion incorrect when the original was
also incorrect. 

Ideally, languages whose quotation marks are very different from the ASCII ones
would not use the ASCII marks at all. Russian (I glanced over the page in
Russian on Russian language) seems to use UTF-8 codes. In that case, there is no
issue; the conversion routine will not touch them.

Of course, there's still the problem with Swedish and other languages that use
quoting schemes that are close to, but not exactly like, English. For them it
would be necessary to disable the conversion, or if someone wants to do it,
provide alternate conversion.
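
One way this could look, as a rough sketch only: a per-language table in which a language with no entry gets no conversion at all. The table name QUOTE_STYLES, the function convert_quotes and the use of language codes as keys are assumptions made for illustration; the Swedish entry follows the ”this format” example above.

```python
import re

# Hypothetical per-language table: (opening, closing) double quotes.
# A language with no entry is left alone, i.e. conversion is disabled.
QUOTE_STYLES = {
    "en": ("\u201c", "\u201d"),  # English: left and right double quotes
    "sv": ("\u201d", "\u201d"),  # Swedish: the same mark on both sides
}

# Assumed heuristic: a quote at the start of the text, or after whitespace
# or an opening bracket, is an opening quote.
OPENING = re.compile(r'(?<![^\s(\[{])"')


def convert_quotes(text: str, lang: str) -> str:
    """Convert ASCII double quotes using the style configured for lang."""
    style = QUOTE_STYLES.get(lang)
    if style is None:
        return text  # no style configured: do not touch the text
    open_q, close_q = style
    text = OPENING.sub(open_q, text)
    return text.replace('"', close_q)


for lang in ("en", "sv", "xx"):
    print(lang, convert_quotes('a "quoted" word', lang))
```

The "xx" line comes back unchanged, which is the "avoid making things worse" behaviour: a wiki only gets the conversion once someone has supplied a style for it.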

Would that be satisfactory? The principle I'm pitching is that we don't have to
provide the convenience function for every language, but we do have to avoid
making things worse for them.
Comment 4 Brion Vibber 2005-03-09 07:48:30 UTC
I'm inclined to close this as WONTFIX. It's not possible to get it right automatically 
in all cases, and wrong "smart" quotes are much more annoying than straight 
quotes (which are always "right" even if they're not as pretty as you might 
sometimes like).
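
To illustrate the kind of case meant here, a small sketch with a deliberately naive converter. The function naive_smart_quotes and its rule (a quote after whitespace opens, any other quote closes) are assumptions for the example, not the behaviour of the attached patch.

```python
import re

# Assumed naive rule: a quote at the start of the text, or after whitespace
# or an opening bracket, opens; every other quote closes.
OPEN_SINGLE = re.compile(r"(?<![^\s(\[{])'")
OPEN_DOUBLE = re.compile(r'(?<![^\s(\[{])"')


def naive_smart_quotes(text: str) -> str:
    text = OPEN_SINGLE.sub("\u2018", text)
    text = text.replace("'", "\u2019")
    text = OPEN_DOUBLE.sub("\u201c", text)
    return text.replace('"', "\u201d")


# An apostrophe marking an omission at the start of a word is read as an
# opening quote, although an apostrophe is wanted.
print(naive_smart_quotes("the '90s"))

# Feet and inches are turned into closing quotes.
print(naive_smart_quotes("6' 2\""))
```

Both outputs are plausible-looking but wrong, and telling these cases apart from real quotations needs knowledge of the sentence, not just of the neighbouring characters.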
Comment 5 Nathan Hamblen 2005-03-09 13:08:44 UTC
(In reply to comment #4)

I think that would be premature. This is only a proposed enhancement for a
future version of the software; why not let it be? Besides, it's not for you or
me to say what is typographically "correct"; it's up to everyone using
Wikimedia. I suggest we see how things go with bug 1485. If it's a success, I'll
ask the users if they want something similar — but only 95% accurate — for quote
marks and ellipses.
