Last modified: 2011-03-13 18:06:47 UTC
Although differentiated opening and closing quotes look better, most people
can't type them easily or at all. In keeping with the aim of fast editing, it
would be best to allow editors to type in standard quotes but display
distinctive quotes. Same goes for the ellipses…
Created attachment 284
New parser function to convert various strings to UTF-8 entities
This patch uses two regular expressions to determine the correct quote or
ellipsis to use. It leaves everything in preformatted sections alone. I've tested it in
all the quote situations I can come up with and tweaked the regexp till it
worked. There may be more; it would be best to have this pointed at a live
backup for a while.
NOTE: I moved the em dash code into this function, so if a fix for bug #1485 is
checked into HEAD it will need to be undone. (I'd be happy to update this patch
to delete the other if that happens.)
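For readers without access to the attachment, the approach described above can be sketched roughly as follows. This is not the code from attachment 284: it is a minimal Python illustration, and the names `PRE_BLOCK`, `OPENING_QUOTE`, `ELLIPSIS`, and `smarten` are hypothetical. It assumes preformatted text is marked with `<pre>` tags, handles only English-style double quotes, and ignores apostrophes and single quotes.

```python
import re

# Hypothetical sketch, not the patch itself.
# Capturing group so re.split() keeps the <pre> blocks in the result.
PRE_BLOCK = re.compile(r"(<pre>.*?</pre>)", re.DOTALL | re.IGNORECASE)
# A straight double quote at the start of the text, or right after
# whitespace or an opening bracket, is assumed to open a quotation.
OPENING_QUOTE = re.compile(r'(?:^|(?<=[\s(\[{]))"')
# Three consecutive ASCII periods.
ELLIPSIS = re.compile(r"\.\.\.")


def smarten(text):
    """Convert ASCII quotes and '...' outside <pre> blocks."""
    parts = PRE_BLOCK.split(text)
    # Even indices are ordinary text; odd indices are <pre> blocks.
    for i in range(0, len(parts), 2):
        chunk = OPENING_QUOTE.sub("\u201c", parts[i])
        # Any straight double quote still left is treated as closing.
        chunk = chunk.replace('"', "\u201d")
        parts[i] = ELLIPSIS.sub("\u2026", chunk)
    return "".join(parts)
```

Quotation marks that are already typed as non-ASCII characters, such as « and », pass through unchanged, since the patterns match only the ASCII double quote.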
Please see section "Quote signs in several languages" in
Also, regarding dashes: Russian, for example, has no en dash; only the em dash is used.
(In reply to comment #2)
Thanks for that link, Alexander. I would have never guessed that ”this format”
was standard in Swedish. sv.wikipedia would have a legitimate gripe if their
ASCII quotes were converted to English-style opening and closing quotes.
I also checked Romanian, whose Wikipedia seems to use "these quotes" even though
they resemble none of the standard or alternative quotes in the table. In that
case, it's hard to call the quote conversion incorrect when the original was
Ideally, languages whose quotation marks are very different from the ASCII ones
would not use the ASCII marks at all. Russian (I glanced over the page in
Russian on Russian language) seems to use UTF-8 codes. In that case, there is no
issue; the conversion routine will not touch them.
Of course, there's still the problem with Swedish and other languages that use
quoting schemes that are close to, but not exactly like, English. For them it
would be necessary to disable the conversion or, if someone wants to do it, to
provide an alternate conversion.
Would that be satisfactory? The principle I'm pitching is that we don't have to
provide the convenience function for every language, but we do have to avoid
making things worse for them.
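One way the "disable or provide an alternate conversion" idea could look is a per-language lookup, sketched below in Python. Everything here is an assumption for illustration: the table `QUOTE_STYLES`, the pattern `PAIRED_QUOTES`, and the function `convert_quotes` are hypothetical names, and the only language data used is the Swedish ”this format” convention mentioned above. Any language not listed is left untouched, which follows the principle of not making things worse.

```python
import re

# Hypothetical table: language code -> (opening mark, closing mark).
# Languages that are not listed get no conversion at all.
QUOTE_STYLES = {
    "en": ("\u201c", "\u201d"),  # English: different opening and closing marks
    "sv": ("\u201d", "\u201d"),  # Swedish: the same mark on both sides
}

# A pair of ASCII double quotes on one line, with no quote in between.
PAIRED_QUOTES = re.compile(r'"([^"\n]*)"')


def convert_quotes(text, lang):
    """Convert paired ASCII quotes for lang, or return text unchanged."""
    style = QUOTE_STYLES.get(lang)
    if style is None:
        # Conversion is disabled for this language.
        return text
    opening, closing = style
    return PAIRED_QUOTES.sub(
        lambda m: opening + m.group(1) + closing, text
    )
```

With this layout, turning the feature on for a new language is a one-line addition to the table, and a wiki that objects to the conversion simply stays out of it.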
I'm inclined to close this as WONTFIX. It's not possible to get it right automatically
in all cases, and wrong "smart" quotes are much more annoying than straight
quotes (which are always "right" even if they're not as pretty as you might
(In reply to comment #4)
I think that would be premature. This is only a proposed enhancement for a
future version of the software; why not let it be? Besides, it's not for you or
me to say what is typographically "correct"; that's up to everyone using
Wikimedia. I suggest we see how things go with bug 1485. If it's a success, I'll
ask the users if they want something similar — but only 95% accurate — for quote
marks and ellipses.