Last modified: 2014-11-17 09:47:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T6901, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 4901 - lang and hreflang attributes for interwiki links
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Low enhancement with 9 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://de.wikipedia.org/wiki/Benutzer...
Keywords: accessibility, patch, patch-reviewed
Duplicates: 5887 9690 13867 24741
Depends on: 20646
Blocks: 367 semantic-html

Reported: 2006-02-07 02:45 UTC by Michael Zajac
Modified: 2014-11-17 09:47 UTC
CC List: 19 users

Attachments
Provisional patch (2.58 KB, patch)
2011-05-17 13:10 UTC, Brion Vibber
More limited patch, only adds hreflang & lang to sidebar interwiki links (550 bytes, patch)
2011-11-30 23:11 UTC, Brion Vibber
Recording of iOS 5.0.1 VoiceOver reading es, fr, ja, zh sidebar links (before) (1.80 MB, audio/x-wav)
2011-11-30 23:13 UTC, Brion Vibber
Recording of iOS 5.0.1 VoiceOver reading es, fr, ja, zh sidebar links (after) (1.86 MB, audio/x-wav)
2011-11-30 23:14 UTC, Brion Vibber

Description Michael Zajac 2006-02-07 02:45:57 UTC
Links to other-language Wikipedias should have an hreflang attribute, 
indicating the language of the link target.  Furthermore, the standard 
interwiki margin links should have a lang and xml:lang attribute, indicating 
that the link text is in another language.

In both cases, the content of the attributes should be the same language 
code as in the Wikipedia's name.  Example: links to the German Wikipedia 
(de.wikipedia.org) should have hreflang="de", and the interwiki links 
should look like 
<a hreflang="de" lang="de" xml:lang="de" href="http://de.wikipedia.org/...">...</a>.
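
A minimal sketch of the requested markup, written in Python purely for illustration. The function name and the example target URL are hypothetical; this is not MediaWiki code, and it only shows the three attributes carrying the same language code.

```python
from html import escape


def interlanguage_link(lang_code: str, href: str, text: str) -> str:
    """Build an <a> element whose hreflang, lang and xml:lang all carry lang_code."""
    code = escape(lang_code, quote=True)
    return (
        f'<a hreflang="{code}" lang="{code}" xml:lang="{code}" '
        f'href="{escape(href, quote=True)}">{escape(text)}</a>'
    )


# Hypothetical target page on the German Wikipedia.
print(interlanguage_link("de", "http://de.wikipedia.org/wiki/Hauptseite", "Deutsch"))
# prints: <a hreflang="de" lang="de" xml:lang="de" href="http://de.wikipedia.org/wiki/Hauptseite">Deutsch</a>
```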

This feature would aid current or future user agents, including screen 
readers for the disabled.  It could also be used for research projects, and 
to allow style sheets to style other-language links based on these 
attributes.

This may be a duplicate of Bug 1433, but I couldn't tell because the 
examples there don't seem to work.
Comment 1 Brion Vibber 2006-02-07 02:49:53 UTC
1433 seems to be adding <link> elements to the header, which is a separate task from 
marking the inline <a href> links with language markers.

For interwiki links in general we don't actually know whether they're going to be 
languages or not (eg 'MeatBall' and 'Google' aren't languages); though the 
assumption is generally made that local interwikis are language links. May or may 
not be wise to add a language marker tag on the interwiki table.
Comment 2 Brion Vibber 2006-05-09 20:51:09 UTC
*** Bug 5887 has been marked as a duplicate of this bug. ***
Comment 3 Platonides 2006-05-10 15:08:20 UTC
Why not make it a new parameter on the interwiki table? If iw_lang is not null,
add lang="<iw_lang stored code>" to the link.
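
A rough sketch of that idea, in Python for illustration only. The iw_lang value is the hypothetical new column proposed here; the function name and example URLs are made up, and nothing below reflects the actual interwiki table schema.

```python
from html import escape
from typing import Optional


def render_interwiki_link(url: str, text: str, iw_lang: Optional[str]) -> str:
    """Render an interwiki link, adding lang only when a language code is stored."""
    attrs = f'href="{escape(url, quote=True)}"'
    if iw_lang is not None:
        # Prefix is known to be a language: mark the link accordingly.
        attrs += f' lang="{escape(iw_lang, quote=True)}"'
    return f"<a {attrs}>{escape(text)}</a>"


# A language prefix gets the attribute; a non-language prefix (iw_lang is None) does not.
german = render_interwiki_link("http://de.wikipedia.org/wiki/Beispiel", "Deutsch", "de")
other = render_interwiki_link("http://example.org/wiki/FrontPage", "MeatBall", None)
```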

BTW: [[meta:Interwiki_table]], instead of giving information, misinforms people
with presumptions.
Comment 4 Brion Vibber 2007-04-26 19:09:53 UTC
*** Bug 9690 has been marked as a duplicate of this bug. ***
Comment 5 Raimond Spekking 2009-01-07 08:53:01 UTC
*** Bug 13867 has been marked as a duplicate of this bug. ***
Comment 6 Michael Zajac 2009-01-19 17:38:49 UTC
Identifying language changes is a Priority 1 checkpoint in WCAG 1.0 (http://www.w3.org/TR/WCAG10/wai-pageauth.html#tech-identify-changes), and an AA guideline in WCAG 2.0.  This bug adversely affects accessibility, and blocks Bug 367 Markup accessibility issues (tracking).
Comment 7 The Evil IP address 2010-07-14 18:56:41 UTC
I think that hreflang should really be added to the interlanguage links that appear within the sidebar, since it is certain that this is the language of the link target.

But the usual inline interwiki links, whether made by adding a colon before an interlanguage link or by linking to another project, should not have the lang attribute, since they may be piped, and then it would be wrong to mark them as a foreign language. Also, Brion makes a good point above about prefixes such as "google", which are not languages.
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-14 19:15:19 UTC
hreflang is unlikely to be useful.  Nothing I know of uses it, and I don't think we should add it just because we can.

Adding lang="" to interwiki links is more reasonable, although I'm not sure how necessary it is.  Do we have any actual complaints from users, or is it just theoretical WCAG stuff?  Is anyone with a screen reader really going to sit and listen to the list of languages anyway?  If so, how much does it help the screen reader if the language is specified explicitly?  Maybe we should add all these attributes just because WCAG says so, but I'd be happier if we had more concrete data than that.

If we do add lang="", we have to be careful, because several of our language codes are nonstandard.  Also, certainly we shouldn't add a duplicate xml:lang="" as well, that's just a waste of space.
Comment 9 Chad H. 2010-08-10 16:38:25 UTC
*** Bug 24741 has been marked as a duplicate of this bug. ***
Comment 10 Andy Mabbett 2011-05-16 11:34:47 UTC
HREFLANG combined with rel="alternate" would allow parsers to determine, for any article, the existence and location of equivalent articles in other languages.

Meta headers with the same properties should also be used for the same reason.
Comment 11 Andy Mabbett 2011-05-16 11:35:37 UTC
HREFLANG combined with rel="alternate" would allow parsers to determine, for any article, the existence and location of equivalent articles in other languages.

Meta headers with the same properties should also be used for the same reason.
Comment 12 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-17 00:02:48 UTC
That's hypothetical.  If we add metadata to every page just because theoretically someone could use it, we'd be adding practically unlimited quantities of metadata.  We shouldn't be adding this stuff to every page unless we know of specific users who actually need it for some real-world purpose.
Comment 13 Michael Zajac 2011-05-17 02:52:27 UTC
I see that Wikipedia now has lang and xml:lang attributes on these links. The latter does seem pointless, since the root HTML element only has the former.
Comment 14 Bawolff (Brian Wolff) 2011-05-17 04:51:39 UTC
(In reply to comment #13)
> I see that Wikipedia now has lang and xml:lang attributes on these links. The
> latter does seem pointless, since the root HTML element only has the former.

Are you looking at a different Wikipedia than I am? We're talking about interlanguage links in the sidebar, right? I can't seem to see it. (Also, I grepped for xml:lang in trunk; it's not used in the HTML output anywhere as far as I can tell.)

I could see the lang attribute potentially being useful to screen readers needing to switch rules when pronouncing (it'd be nice if someone with an actual screen reader could confirm that), but hreflang seems really useless, unless someone can name an actual existing application that does something with it.
Comment 15 Andy Mabbett 2011-05-17 11:57:54 UTC
I disagree fundamentally with Aryeh; metadata should be provided as a matter of course, so that it's there when others look for it. 

We shouldn't wait to be asked for two reasons - many potential users of it will not ask, they'll simply move on; and if they do ask, how long will it take us to implement it?

In any case the question is moot, because there are already services wanting to use it, and finding us lacking in not having it: 

http://shkspr.mobi/blog/index.php/2011/02/qr-codes-for-museums/
Comment 16 Derk-Jan Hartman 2011-05-17 12:25:09 UTC
> In any case the question is moot, because there are already services wanting to
> use it, and finding us lacking in not having it: 
> 
> http://shkspr.mobi/blog/index.php/2011/02/qr-codes-for-museums/

@Andy I don't see why we would need hreflang for any of what you linked to. I mean, the iPhone Articles app allows you perfectly well to switch between the interwiki language variants of an article; it doesn't need hreflang for that, and neither should a QR service.
Comment 17 Derk-Jan Hartman 2011-05-17 12:25:54 UTC
For everyone who cannot keep up:

- lang: denotes the language of the text wrapped in the HTML element. It is usually only used if the language of the text doesn't match the rest of the text in the page.
- xml:lang: the XML variant of lang. Since we are switching from XHTML 1 to HTML5, xml:lang is less important; adding both is probably a bit much.
- hreflang: denotes the language of the page that the link points to.
- alternate: denotes that the link is towards the same content as the current page, but in an 'alternate' version.

So <a href="http://de.wikipedia.org/" hreflang="de" lang="nl">Hoofdpagina van de Duitse Wikipedia</a> is valid. It is Dutch-language text in an English-language page that links to something written in German. But you cannot use alternate here, because that German main page is not the German version of the page that we are currently visiting, bug ticket 4901 on Bugzilla.
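
To make the distinction concrete, here is a small Python sketch that reads the example link above and reports the two languages separately. The class name and dictionary keys are invented for this illustration; only the standard-library html.parser module is used.

```python
from html.parser import HTMLParser


class LinkLanguageParser(HTMLParser):
    """Collect, for each <a>, the language of its text and of its target."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            self.links.append({
                "href": a.get("href"),
                "text_lang": a.get("lang"),        # language of the link text
                "target_lang": a.get("hreflang"),  # language of the linked page
                "rel": a.get("rel"),               # would hold "alternate" if set
            })


parser = LinkLanguageParser()
parser.feed(
    '<a href="http://de.wikipedia.org/" hreflang="de" lang="nl">'
    "Hoofdpagina van de Duitse Wikipedia</a>"
)
link = parser.links[0]
print(link["text_lang"], link["target_lang"], link["rel"])  # prints: nl de None
```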
Comment 18 Derk-Jan Hartman 2011-05-17 12:35:45 UTC
(In reply to comment #13)
> I see that Wikipedia now has lang and xml:lang attributes on these links. The
> latter does seem pointless, since the root HTML element only has the former.

We have lang and xml:lang on inline interwiki links, it seems, but not on the interlanguage sidebar links. This can be observed in the source code of the English Main Page. The interwiki links in the sidebar are 'plain', but the interwiki links at the bottom, under the section "Wikipedia languages", have lang attributes.
Comment 19 Brion Vibber 2011-05-17 12:41:57 UTC
The links you're seeing on http://en.wikipedia.org/wiki/Main_Page are not created automatically -- someone has explicitly placed a <span> inside the link.
Comment 20 Brion Vibber 2011-05-17 12:49:29 UTC
Anyway long story short:

* Adding a 'lang' attribute on interlanguage sidebar links should be easy to do and not a problem, and will help screen readers & browser font selection. The text of the link is the native name of the language and therefore in that language.

* Adding a 'lang' attribute on inline interlanguage links (inline interwiki links which are prefixes we also interpret as magic sidebar language links) is technically feasible, however may be problematic as we don't know whether the *text of the link* is in any particular language.

* Adding cloned 'xml:lang' attributes anywhere is IMO pointless, everybody parses in HTML mode anyway. :)

* Adding an 'hreflang' attribute on interlanguage sidebar links and on inline interlanguage links is technically feasible and doesn't hurt. Slight increase in output size, but not too bad I think. There could be some users for whom it would be welcome.

Current draft HTML 5 spec for hreflang attribute: http://www.w3.org/TR/html5/links.html#attr-hyperlink-hreflang


Note that the bug 20646 dependency isn't actually required for this bug to proceed; we already do interlanguage sidebar selection & language name picking based on the interwiki prefix, so this would initially need to cover only those cases.
Comment 21 Brion Vibber 2011-05-17 13:10:58 UTC
Created attachment 8545
Provisional patch

Provisional patch adds:
* hreflang on inline interwiki links whose prefixes match language names
* lang & hreflang on sidebar interlanguage links in SkinTemplate-based skins (tested Monobook and Vector)

The older skins (Standard, CologneBlue etc) still need to be poked. Might be other funkiness.
Comment 22 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-17 14:16:56 UTC
(In reply to comment #20)
> * Adding an 'hreflang' attribute on interlanguage sidebar links and on inline
> interlanguage links is technically feasible and doesn't hurt. Slight increase
> in output size, but not too bad I think. There could be some users for whom it
> would be welcome.

Sure, but the same could be said of a ton of things.  We could put rel=something on loads of our links (http://microformats.org/wiki/existing-rel-values#formats), and start putting RDFa and microdata all over the place.  hreflang by itself wouldn't take up much space, but adding every piece of metadata of similar possible utility would make our output unreasonably long.  For sanity's sake, we need to draw a line as to what kind of metadata we're going to output, and it's got to be more restrictive than "someone might be able to use it".

> Current draft HTML 5 spec for hreflang attribute:
> http://www.w3.org/TR/html5/links.html#attr-hyperlink-hreflang

You don't want to use the /TR version of W3C specs -- that's usually months outdated.  Use the dev.w3.org version instead:

http://dev.w3.org/html5/spec/links.html#attr-hyperlink-hreflang

(Currently that's also labeled Working Draft, but only because a new Working Draft is awaiting publication.  Usually it's labeled Editor's Draft.)

Alternatively, use the WHATWG version so you don't have to worry about the W3C's crazy versioning scheme:

http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#attr-hyperlink-hreflang

The WHATWG version also includes extra features that aren't included in the W3C version because of feature freeze, plus a handful of minor differences.
Comment 23 Michael Zajac 2011-05-17 14:43:52 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > I see that Wikipedia now has lang and xml:lang attributes on these links. The
> > latter does seem pointless, since the root HTML element only has the former.
> 
> Are you looking at a different Wikipedia than I am? We're talking about
> interlanguage links in the sidebar right? 

Oops; I was looking at the HTML source of the “Wikipedia languages” section of the main page. 

But this does indicate that there is demand for this from WP editors.
Comment 24 Andy Mabbett 2011-05-17 14:53:10 UTC
@Derk-Jan 

The iPhone app may well work; the service to which I linked does, also. But it's necessary to write code which screen-scrapes, or understands how Wikipedia uses interwiki links and its URL structure (which fails where article titles are not equivalent (Mona Lisa vs La Gioconda, for example)), rather than simply being able to recognise links to alternate (which in British English would be "alternative"; "alternate" meaning something else) versions of a page in other languages through the semantic metadata thoughtfully provided in HTML specifications.

The HTML 4 spec defines "alternate" as "[Designating] substitute versions for the document in which the link occurs." While translation is mentioned as one example, there is no requirement that they be literal or word-for-word translations.

@Aryeh.

Reductio ad absurdum. Please don't.

That's not to say we shouldn't use more rel-values (like "previous" & "next" in Wikipedia navboxes) and other metadata. A while ago, when I downloaded the en-Wiki article on Barack Obama it was 1814 KB; the microformat in it comprised just 110 characters of the emitted HTML code (~0.005% of the full download). That's not as many as in the preceding sentence.
Comment 25 Derk-Jan Hartman 2011-05-17 15:02:54 UTC
@Andy, we have that, it's called the API. http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Main%20Page&redirects=&llurl
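
A sketch of how a client might assemble that request, in Python. The parameter names are copied from the URL above; the function name is hypothetical, no request is actually sent, and newer API versions may spell some parameters differently.

```python
from urllib.parse import quote, urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def langlinks_query_url(title: str) -> str:
    """Build the langlinks query URL for one page title (no network access)."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "redirects": "",  # flag parameters are sent with an empty value
        "llurl": "",
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the comment.
    return API_ENDPOINT + "?" + urlencode(params, quote_via=quote)


url = langlinks_query_url("Main Page")
print(url)
```

The response would then list the interlanguage links for the page, which is the information a consumer would otherwise have to scrape from the sidebar.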
Comment 26 Derk-Jan Hartman 2011-05-17 15:09:49 UTC
Hmm, I was googling a bit and found this piece of info, which has me worried:

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=189077

"Note: rel="alternate" hreflang="x" is for sites that have only their template translated. It isn't appropriate for multilingual sites that completely translate the content of each page. More information about multi-regional and multilingual sites."

Which specifically would be exactly what we do on Wikipedia.
So whether or not it's a good idea standards-wise, that remark indicates that it might not necessarily be a good idea for the Google page ranking of our other-language pages.
Comment 27 Michael Zajac 2011-05-17 15:48:28 UTC
(In reply to comment #26)

Rel=alternate is NOT appropriate in Wikipedia, because it implies that the target is a translation of the same writing. The WP language links point to independent writing, which has been selected as the nearest other-language equivalent; occasionally it is a translation, but often it varies greatly in scope and content.

But it might be appropriate in a wiki with a different authoring model (e.g., maybe on Wikisource, which translates original documents), so shouldn't the software accommodate this as an admin-configurable setting?
Comment 28 Derk-Jan Hartman 2011-05-17 18:37:50 UTC
Not according to the Google comment. They specifically say that alternate in combination with hreflang should only be used where the 'interface' (template in their words) language differs, but the content is the same (so the content should not be translated).
Comment 29 Michael Zajac 2011-05-17 20:50:07 UTC
(In reply to comment #28)
> Not according to the Google comment.

According to the HTML5 spec, “If the alternate keyword is used with the hreflang attribute, and that attribute's value differs from the root element's language, it indicates that the referenced document is a translation.” So like I said, rel=alternate is for alternate translations of a document in, e.g., Wikisource, but not for different documents, e.g., different-language Wikipedia articles.

  http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#rel-alternate

That Google note is confusing, and I think you are reading “is for sites that have only their template translated” as “is *only* for sites that have only their template translated.” It is about one scenario, and doesn't exclude others.
Comment 30 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-17 22:34:21 UTC
(In reply to comment #24)
> Reductio ad absurdum. Please don't.

Reductio ad absurdum is a legitimate argument technique, often used in mathematical proofs.  What I'm saying is if we're going to add metadata like this to every page, we have to establish a bar higher than "someone might theoretically want to use it", because if we only require that, there's no limit to what we could add.  I suggest a standard along the lines of "a nontrivial percentage of users will benefit from it".  If the percentage of users who would benefit is negligible, then it can be provided in some fashion that doesn't require adding bytes to every page.

Another possible standard might be "we'll add any metadata that has dedicated HTML attributes/elements, but not RDFa/microdata/rel values/etc. without real use-cases".  That would allow hreflang but not an unlimited amount of stuff.  I don't see the logic in it, though.

> That's not to say we shouldn't use more rel-values (like "previous" & "next" in
> Wikipedia navboxes) and other metadata. A while ago, when I downloaded the
> en-Wiki article on Barack Obama it was 1814 KB; the microformat in it comprised
> just 110 characters of the emitted HTML code (~0.005% of the full download).
> That's not as many as in the preceding sentence.

hreflang on every interlanguage link for Barack Obama would add over 160*13 bytes, or about 2 KB, if I count right.  Granted, that's only about a quarter of a percent, so it's not exactly a huge issue.  I'd be more concerned about stuff in the <head>, because that will delay loading of the article text.
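
The byte estimate can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the 160 links and 13 bytes come from the sentence above, while using the 1814 KB download size quoted from comment 24 as the denominator is an assumption, and the "quarter of a percent" figure may rest on a different page size.

```python
LINKS = 160                           # interlanguage links, per the estimate above
BYTES_PER_ATTR = len('hreflang="de"') # 13 bytes, not counting the leading space
PAGE_KB = 1814                        # assumed denominator: size quoted from comment 24

added_bytes = LINKS * BYTES_PER_ATTR  # 2080 bytes
added_kb = added_bytes / 1024         # about 2 KB
share_percent = added_kb / PAGE_KB * 100

print(f"{added_bytes} bytes = {added_kb:.2f} KB = {share_percent:.2f}% of {PAGE_KB} KB")
```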
Comment 31 Bawolff (Brian Wolff) 2011-05-18 01:15:07 UTC
>I suggest a standard along the lines of "a nontrivial percentage of users will 
>benefit from it"

What about a trivial percentage? Is there a single user (who actually exists, not potentially could exist) that would benefit from hreflang attribute? Even if the cost of doing this is very small, if there's not a single user I'd call it bloat.

The Google note in comment 26 is the only remotely related use case that I see listed here (the QR codes post is asking for content negotiation with Accept-Language headers, which is rather different), but even the Google note is talking about <link> style links, whereas this bug is talking about inline links in the sidebar. And from my reading, the Google note is looking more for a way to translate the interface, aka ?uselang=de, than the actual content.
Comment 32 Michael Zajac 2011-05-18 05:47:31 UTC
(In reply to comment #30)
> if we're going to add metadata like
> this to every page, we have to establish a bar higher than "someone might
> theoretically want to use it",

Lang attributes aren't metadata. They are part of the document structure required for universal access. What number of blind people wanting to read Wikipedia, for example, would you consider a “nontrivial percentage?” I'm sure the comment wasn't meant to trivialize people who require assistive technology by reducing their importance to a statistic, but that's what it amounts to.

Identifying natural language changes in a document is a priority 1 checkpoint in WCAG 1.0, is required by the WCAG Samurai Errata, and is required for Level AA conformance in WCAG 2.0. A project that claims “openness” should attempt to meet basic accessibility standards.


References:

WCAG 1.0 Guideline 4. Clarify natural language use
http://www.w3.org/TR/WCAG10/wai-pageauth.html#gl-abbreviated-and-foreign

WCAG 1.0 HTML Technique 2.1 Identifying changes in language
http://www.w3.org/TR/WCAG10-HTML-TECHS/#changes-in-lang

WCAG Samurai Errata for WCAG 1.0 Guideline 4. Clarify natural-language usage
http://wcagsamurai.org/errata/errata.html#GL4

WCAG 2.0 Guideline 3.1 Readable: Make text content readable and understandable.
http://www.w3.org/TR/WCAG20/#meaning

How to Meet WCAG 2.0 Guideline 3.1.2 Language of Parts
http://www.w3.org/WAI/WCAG20/quickref/#qr-meaning-other-lang-id

Techniques for WCAG 2.0 H58: Using language attributes to identify changes in the human language
http://www.w3.org/TR/2010/NOTE-WCAG20-TECHS-20101014/H58
Comment 33 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-18 22:39:05 UTC
(In reply to comment #32)
> Lang attributes aren't metadata.

For the sake of argument, please interpret my use of "metadata" to mean "information that's served in the webpage but doesn't noticeably affect the behavior of regular browsers".

> What number of blind people wanting to read
> Wikipedia, for example, would you consider a “nontrivial percentage?”

Oh, pretty much any percentage.  0.1% would be fine, maybe even less.  So long as it's not theoretical.  What benefit does hreflang have to blind users, in practice?

> Identifying natural language changes in a document is a priority 1 checkpoint
> in WCAG 1.0, is required by the WCAG Samurai Errata, and is required for Level
> AA conformance in WCAG 2.0.

None of your references appear to have anything to do with hreflang, as far as I can tell.  Am I mistaken?  They all seem to be about lang.  I think comment 8 was pretty clear that I'm not objecting to adding lang, although I'd be happier if we had data about actual screenreaders that benefit (since WCAG et al. can be pretty ivory-tower sometimes).
Comment 34 Michael Zajac 2011-05-19 03:15:51 UTC
(In reply to comment #33)

Fair enough, Aryeh. I was arguing for using lang, which the standards require, not for hreflang, which they do not (although I don't see any reason not to use HTML markup such as hreflang to tag known page elements, when each page already has about 30 kB of non-content markup, scripts, and style sheets).

> For the sake of argument, please interpret my use of "metadata" to mean
> "information that's served in the webpage but doesn't noticeably affect the
> behavior of regular browsers".

I don't accept that there's a useful concept of regular browser. People using screen readers use them to operate (“regular”) desktop browsers. Is the Google search indexer a regular browser, or is Google Language Tools? Or Mobile Safari on an iPhone used by a blind person? What about the screenscrapers used by the countless sites that syndicate Wikimedia sites?

> None of your references appear to have anything to do with hreflang, as far as
> I can tell.  Am I mistaken?  They all seem to be about lang.  

Right.

> I'd be happier
> if we had data about actual screenreaders that benefit (since WCAG et al. can
> be pretty ivory-tower sometimes).

We're not experts, and we don't have a detailed comprehensive survey of accessibility software and hardware capabilities, nor are we ever likely to. The best we can do is follow accepted, unchallenged standards. If you do have information that disputes them, I would be interested. But grumbling about them is unhelpful.
Comment 35 Bawolff (Brian Wolff) 2011-05-19 03:45:24 UTC
In the interest of fairness, googling suggests that generated CSS content is a potential use case for hreflang. You can use CSS to put ((fr)) after every French link or something like that. It's a kind of far-fetched, semi-theoretical use case, but it's more concrete than anything said so far.
Comment 36 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-19 18:22:13 UTC
(In reply to comment #34)
> I don't accept that there's a useful concept of regular browser.

The concept of a regular browser is essential, because the ones who create and maintain site content, and MediaWiki developers and sysadmins, are overwhelmingly people who use regular browsers almost exclusively.  Any feature whose purpose is to change how pages work in regular browsers is easy for all of us to reason about.  If something doesn't affect regular browsers, it's far more likely to be used improperly, because whoever's using it probably won't see the effect.  Thus we need to be more careful when deploying it, to take care that it's actually useful.

> People using
> screen readers use them to operate (“regular”) desktop browsers. Is the Google
> search indexer a regular browser, or is Google Language Tools? Or Mobile Safari
> on an iPhone used by a blind person? What about the screenscrapers used by the
> countless sites that syndicate Wikimedia sites?

None of these are "regular" in the sense I mean.  That doesn't mean they're unimportant, but it means that it's much more important that we be careful about targeting features at them, because it's hard for almost all of us to directly evaluate the effects.  Look at how much snake oil is marketed at SEO, or how often people provide alt text that's useless or even worse than useless (e.g., unpronounceable numeric filenames that screen readers will read out at length).  Adding features that you haven't tested on the theory that they might be useful and probably won't hurt is not a great strategy for software development.

> We're not experts, and we don't have a detailed comprehensive survey of
> accessibility software and hardware capabilities, nor are we ever likely to.
> The best we can do is follow accepted, unchallenged standards. If you do have
> information that disputes them, I would be interested. But grumbling about them
> is unhelpful.

I've argued in the HTMLWG with some of the people who write these standards.  My (admittedly biased) take is that they know a lot about accessibility, but often don't fully appreciate the requirements of authors, browser implementers, or non-disabled users.  A huge percentage of disputes about HTML5 that had to be adjudicated by Working Group survey, on the order of half, are about accessibility:

http://www.w3.org/html/wg/#events

That's because it's the browser implementers who edit the HTML5 standard, and the accessibility people are regularly unable to convince them that the requirements they want are reasonable, so the dispute essentially has to be arbitrated.  Specs like WCAG are written by a11y experts with relatively little involvement of non-a11y-focused web experts, and IMO should be viewed with a healthy dose of skepticism as a result.  (Which is not the same as saying they should be disregarded.)

Even aside from that, many standards have a tendency to be impractical or theoretical.  Lots of stuff in HTML 4.01 didn't match what any browsers did or intended to do, for instance.  The XHTML line of standards after 1.0 was basically never implemented at all.  Trusting that what standards say makes sense is maybe a reasonable first guess, but you definitely shouldn't assume it very strongly.
Comment 37 Michael Zajac 2011-05-19 20:04:57 UTC
(In reply to comment #36)

> The concept of a regular browser is essential, [...]

Then please define “regular browser,” at least broadly. 

Does this include MSIE 9? Safari 4? MSIE 6? Netscape 4? Mobile Safari? BlackBerry mobile browser? What about the yet-unreleased MSIE 10, Firefox 5, Safari 6, etc? 

Does it include supporting readers who navigate web pages using the keyboard? Using a touch device? Using a text-only browser? A braille display?

(And why would you not include Google's indexer? I'd bet that more people access more Wikipedia articles through it than any other way.)

I think it's a mistake to categorize “regular” use of a website as that involving a narrowly stereotyped range of technology, especially if we try to define assistive technologies as irregular. This risks specifically ghettoizing a disadvantaged group of people. Design and technical features for accessibility have the potential to improve a whole range of uses of a website.

> because the ones who create and
> maintain site content, and MediaWiki developers and sysadmins, are
> overwhelmingly people who use regular browsers almost exclusively

[Citation needed]

> If something doesn't affect regular browsers, it's far
> more likely to be used improperly, [...]

Yeah, we can't test everything in every browser, not even in every “regular” one. That's why we fall back on standards. 

> Even aside from that, many standards have a tendency to be impractical or
> theoretical.  Lots of stuff in HTML 4.01 didn't match what any browsers did or
> intended to do, for instance.  The XHTML line of standards after 1.0 was
> basically never implemented at all.  Trusting that what standards say makes
> sense is maybe a reasonable first guess, but you definitely shouldn't assume it
> very strongly.

Those aren't the standards we are using. We are using current, practical, accepted standards. 

But when the topic is accessibility, i.e. the availability of free and open information to a minority who may have disadvantaged access to it, we should be leading and liberal in adopting solutions that may reduce friction without causing appreciable harm.

We should definitely use lang attributes here, which are standardized, recommended, and supported. 

I see no strong argument against using hreflang attributes, but I don't think they're absolutely necessary. I support adding a feature that might improve the experience with assistive technology, but I'm okay with moving that to a separate bug listing if it will give us consensus here.
Comment 38 Derk-Jan Hartman 2011-05-19 22:25:57 UTC
I sort of have to agree with Aryeh here. I mean, we could implement aural stylesheets, because they're somewhere in a standard and on the face of it look very useful. But the reality is, NOTHING in the whole world looks at aural stylesheets, least of all screenreaders, and no support is even remotely expected in the future. As such it's an exercise in pointlessness that adds bytes for the 400 million unique visitors that we have to serve on a monthly basis.

Not everything that is in a standard is a good idea in practice. XHTML proved that, and that was even WIDELY supported.

I have not seen a good use case for hreflang just yet, other than the Google SEO technique, which is not applicable to interlanguage links. (lang is a different case, I fully support lang).
Comment 39 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-05-20 18:44:09 UTC
As usual, I can't resist arguing just a bit more . . .

(In reply to comment #37)
> Then please define “regular browser,” at least broadly. 
> 
> Does this include MSIE 9? Safari 4? MSIE 6? Netscape 4? Mobile Safari?
> BlackBerry mobile browser? What about the yet-unreleased MSIE 10, Firefox 5,
> Safari 6, etc? 

Yes.  Those all display web pages in basically the same way, give or take some details.  Mobile browsers are a bit different, but they're mainstream enough that they fall into the same category -- a large percentage of people use them regularly.  Thus we can easily make informed decisions about how various features will affect them.  Just try it out.

Search engine spiders and screenreaders work in fundamentally different ways from the browsers we use, and we don't have experience with how they work.  ("We" here means you, me, and the overwhelming majority of Wikipedians/MediaWiki developers/etc.)  We cannot easily know offhand what effects our decisions will have on such UAs, so we have to be much more cautious in making decisions based on those effects.  Reasoning and evidence need to be spelled out more explicitly than for regular browsers.

> Does it include supporting readers who navigate web pages using the keyboard?

Exclusively?  No, because almost no one does that.  (Which, again, is not to say that those people are unimportant, but that other people can't easily evaluate the impact that changes will have on them.)

> Using a touch device?

Yes, many people use smartphones regularly these days.

> Using a text-only browser? A braille display?

Nope.

> (And why would you not include Google's indexer? I'd bet that more people
> access more Wikipedia articles through it than any other way.)

We use the search results.  That doesn't give us more than the most minimal insight into how the indexer works, beyond really basic stuff like "it mostly doesn't do JavaScript".

> I think it's a mistake to categorize “regular” use of a website as that
> involving a narrowly stereotyped range of technology, especially if we try to
> define assistive technologies as irregular. This risks specifically ghettoizing
> a disadvantaged group of people. Design and technical features for
> accessibility have the potential to improve a whole range of uses of a website.

I think it's a mistake to try adding features that you haven't tested and don't intend to test, that no one who reviews your code will test either, and that you'll probably never receive any real-world feedback on, solely based on something you read, where you are in no position to personally evaluate the validity of the source (e.g., have not been provided with info on how exactly specific real-world UAs will handle the feature).  The most likely outcome is that the feature is useless, and in some cases it could be harmful.

(See bug 15491 for an example of someone making incorrect claims about the accessibility impact of a particular bug, based on theoretical reasoning rather than actual testing.)

> > because the ones who create and
> > maintain site content, and MediaWiki developers and sysadmins, are
> > overwhelmingly people who use regular browsers almost exclusively
> 
> [Citation needed]

It doesn't need a citation, because it's a tautology -- that's my definition of "regular browser".  Do you argue that more than a small minority (say, 5%) of the people I mentioned use screen readers regularly, or are personally familiar with how search engine spiders work (as in writing them and not just reading the results)?

> Those aren't the standards we are using. We are using current, practical,
> accepted standards.

You assume WCAG is practical.  It's certainly not as useless as XHTML2, but it's not such a clearly good standard that I wouldn't prefer real-world info on how screen readers use the lang attribute.  If you're aiming for usability improvements, give me an actual user study over a standard any day of the week.

Interestingly, this is Google result #5 for "WCAG":

http://www.alistapart.com/articles/tohellwithwcag2

But FWIW, a quick Google search turns up some data that suggests that specific major screen readers really do use lang, so I'm fine with adding it (although I never had serious objections to start with):

http://reference.sitepoint.com/html/core-attributes/lang
http://developer.yahoo.com/blogs/ydn/posts/2008/03/yahoo_search_re/

I'd never have objected if that was the original reasoning given for the change, instead of WCAG.  Real-world data is always more useful than standards, if your goal is to improve user experience instead of adhering to the standards per se.

> I see no strong argument against using hreflang attributes, but I don't think
> they're absolutely necessary. I support adding a feature that might improve the
> experience with assistive technology, but I'm okay with moving that to a
> separate bug listing if it will give us consensus here.

We don't really need consensus.  If Brion wants to add it, he can do so, and I'm not going to object further.  I still don't think we should support it without a clear reason, but it's not up to me to decide.
Comment 40 Sumana Harihareswara 2011-11-25 19:55:00 UTC
Marking patch as reviewed, since there's been so much discussion since Brion's initial patch and since trunk's moved on since he posted it.
Comment 41 Brion Vibber 2011-11-30 16:16:50 UTC
*** Bug 32725 has been marked as a duplicate of this bug. ***
Comment 42 Michael Zajac 2011-11-30 18:05:50 UTC
FYI, the lang attribute for elements on the page is supported by VoiceOver, which comes with iOS and works in Safari. It loads appropriate voices when available, and reads various-language text properly. I'm not a regular user, but successfully tested the demo on this page.

  http://www.456bereastreet.com/archive/201004/using_the_lang_attribute_makes_a_difference/
Comment 43 Brion Vibber 2011-11-30 18:28:29 UTC
(In reply to comment #42)
> FYI, the lang attribute for elements on the page is supported by VoiceOver,
> which comes with iOS and works in Safari. It loads appropriate voices when
> available, and reads various-language text properly. I'm not a regular user,
> but successfully tested the demo on this page.
> 
>  
> http://www.456bereastreet.com/archive/201004/using_the_lang_attribute_makes_a_difference/

I can confirm this on my iPod Touch (iOS 5.0.1), and it is *awesome*. :)

This also works on many of the language links on https://www.wikipedia.org/ which have lang="" attributes to specify the language of their contents. The ones in our sidebar still don't, so they get funny accents ("ess-pa-NOL" instead of "ess-panyol").
Comment 44 Michael Zajac 2011-11-30 19:22:47 UTC
> This also works on many of the language links on https://www.wikipedia.org/
> which have lang="" attributes to specify the language of their contents. The
> ones in our sidebar still don't, so they get funny accents ("ess-pa-NOL"
> instead of "ess-panyol").

Except names in non-Latin writing systems are read out as “unpronounceable.” Adding lang attributes would enable a number of these, including Chinese, Greek, Hindi, Indonesian, Korean, Russian, and Thai. List of supported languages: http://support.apple.com/kb/HT3562

It's remarkable, but right and proper, that this is now mainstream technology, in hundreds of millions of devices in people's hands. We support MathML for Professor Fizzbunkle. Hooray. Let's try to support reading text and following links for the one in 40 with impaired vision.
Comment 45 Brion Vibber 2011-11-30 22:04:14 UTC
(In reply to comment #44)
> > This also works on many of the language links on https://www.wikipedia.org/
> > which have lang="" attributes to specify the language of their contents. The
> > ones in our sidebar still don't, so they get funny accents ("ess-pa-NOL"
> > instead of "ess-panyol").
> 
> Except names in non-Latin writing systems are read out as “unpronounceable.”
> Adding lang attributes would enable a number of these, including Chinese,
> Greek, Hindi, Indonesian, Korean, Russian, and Thai. List of supported
> languages: http://support.apple.com/kb/HT3562

At least Chinese and Korean do read out, as long as they have the lang set... hence wanting to set them in other places.

> It's remarkable, but right and proper, that this is now mainstream technology,
> in hundreds of millions of devices in people's hands. We support MathML for
> Professor Fizzbunkle. Hooray. Let's try to support reading text and following
> links for the one in 40 with impaired vision.
Comment 46 Brion Vibber 2011-11-30 23:11:03 UTC
Created attachment 9584 [details]
More limited patch, only adds hreflang & lang to sidebar interwiki links
Comment 47 Brion Vibber 2011-11-30 23:13:15 UTC
Created attachment 9585 [details]
Recording of iOS 5.0.1 VoiceOver reading es, fr, ja, zh sidebar links (before)

Readout from iOS's VoiceOver screen reader with es, fr, ja and zh sidebar links -- without the patch applied.

Reads as:
"In other languages. Heading level 5."
"ess-pa-nol. link."
"fran-sayz. link."
"unpronounceable. link."
"unpronounceable. link."
Comment 48 Brion Vibber 2011-11-30 23:14:47 UTC
Created attachment 9586 [details]
Recording of iOS 5.0.1 VoiceOver reading es, fr, ja, zh sidebar links (after)

Readout from iOS's VoiceOver screen reader with es, fr, ja and zh sidebar links
-- with the patch applied.

Reads as:
"In other languages. Heading level 5."
"Español."
"Français. link."
"Nihongo. link."
"Zhongwen. link."
Comment 49 Brion Vibber 2011-11-30 23:17:56 UTC
Short patch above applied on trunk in r104778 (fixes the sidebar links).
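
For illustration only, here is a minimal Python sketch of the kind of markup the sidebar change produces. It is not the attached patch (which is against MediaWiki's PHP code); the function name `sidebar_language_link` and the example URLs are hypothetical, and it assumes the wiki's language code is already a valid language tag.

```python
from html import escape


def sidebar_language_link(lang_code: str, url: str, autonym: str) -> str:
    """Build one interlanguage sidebar link (hypothetical helper).

    hreflang describes the language of the link target;
    lang describes the language of the link text itself.
    """
    code = escape(lang_code, quote=True)
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'hreflang="{code}" lang="{code}">{escape(autonym)}</a>'
    )


# Example data: the four languages used in the VoiceOver recordings.
# The URLs are placeholders, not taken from the patch.
EXAMPLE_LINKS = [
    ("es", "https://es.wikipedia.org/wiki/Ejemplo", "Español"),
    ("fr", "https://fr.wikipedia.org/wiki/Exemple", "Français"),
    ("ja", "https://ja.wikipedia.org/wiki/Example", "日本語"),
    ("zh", "https://zh.wikipedia.org/wiki/Example", "中文"),
]

if __name__ == "__main__":
    for code, url, name in EXAMPLE_LINKS:
        print(sidebar_language_link(code, url, name))
```

Both attributes carry the same code here because each sidebar link's text is the autonym of the language it points to; for ordinary in-text interwiki links the link text may be in a different language, so only hreflang would be safe to set.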
Comment 50 Brion Vibber 2011-11-30 23:27:57 UTC
REL1_18 in r104781, 1.18wmf1 in r104783.

I'm leaving the bug open for the moment since hreflang still needs to be added to other links (and, where you know the link text isn't overridden, you could potentially try a lang as well, but that's tougher).

Also of note: Roan pointed out on IRC that the lang attribute will be useful in combination with WebFonts, since we can switch in the appropriate font for a language, if needed, based on the presence of the lang attribute.
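
As a rough sketch of that idea (not how WebFonts actually does it), a font could be chosen by looking up the element's lang value and falling back to shorter prefixes of the tag. The table `FONT_BY_LANG`, the font names, and the function `font_for` are all hypothetical.

```python
from typing import Optional

# Hypothetical mapping from lowercase language tags to font family names.
FONT_BY_LANG = {
    "ja": "ExampleJapaneseFont",
    "zh-hant": "ExampleTraditionalChineseFont",
    "zh": "ExampleChineseFont",
}


def font_for(lang_tag: str, default: Optional[str] = None) -> Optional[str]:
    """Pick a font for a lang attribute value.

    Tries the full tag first, then progressively shorter prefixes,
    e.g. "zh-Hant-TW" -> "zh-hant-tw", "zh-hant", "zh".
    """
    parts = lang_tag.lower().split("-")
    while parts:
        key = "-".join(parts)
        if key in FONT_BY_LANG:
            return FONT_BY_LANG[key]
        parts.pop()  # drop the most specific subtag and retry
    return default


if __name__ == "__main__":
    for tag in ("ja", "zh-Hant-TW", "zh-CN", "en"):
        print(tag, "->", font_for(tag))
```

Elements without a lang attribute would simply keep the page's default font, which is why marking up the sidebar links matters for this use as well.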
