Last modified: 2014-01-28 05:53:03 UTC
If any header or subheader is given as == content ==, Firefox 1.5.0.7 draws a semi-complete dashed box next to it. Repro: create a page with the following text: ==content== then preview or save, and observe the result.
I don't see anything. Does it happen if you log out? Does it happen at the URL I just added to this bug?
That's because you capitalized the word "Content". It must be all lower case.
The heading generates an anchor with name=id=content, which collides with the id=content div. :(
Ouch. That's nasty. The only solution I can see would be to move all header id's to stuff like #h-content instead of #content. (You could also special-case the few bad id's, but that will a) lead to confusion and b) be hard to maintain.)
*** Bug 7662 has been marked as a duplicate of this bug. ***
(In reply to comment #4)
> Ouch. That's nasty. The only solution I can see would be to move all header
> id's to stuff like #h-content instead of #content. (You could also special-case
> the few bad id's, but that will a) lead to confusion and b) be hard to maintain.)

Better solution: prefix all interface id's with "mw-" and then ban that from non-interface id's. Should be pretty simple to fix, although it will unfortunately be slightly disruptive.
Even if the aforementioned solutions are applied, someone could just as easily edit/create a page with the following:
==content==
<span id="content">text</span>
and the same problem would exist. Also, if you don't allow user-supplied ids/anchor names (or ids/anchor names derived from user-supplied content) to have the prefix "mw-", how would you deal with the following:
==mw-content==
Let's not forget templates. If a page includes a template, it's possible that both pages use the same id/anchor name, even though within each page individually the ids/anchor names are unique. And I've found a similar problem with extensions that generate their own ids/anchor names, like Cite (see bug #11625).
One thing I've noticed is that if a tag is created with an id that contains disallowed characters, the parser is smart enough to single out the id and swap the invalid characters for valid ones. What if the parser kept a running list of all the ids and anchor names already in use? When it replaces the invalid id/anchor name characters, it can check against the list to make sure the id/anchor name in question is not already in use. Duplicates would be resolved the same way headers with the same text are resolved.
The only issue I can see at the moment is when extensions create links to destination anchors that are yet to be rendered. Let's take Cite for example. Given the following:
I like cheese<ref>It's true!</ref>. ... <references/>
when the "ref" tag gets rendered, a link must be created to a destination anchor that doesn't exist yet, so two things have to happen: (a) an id/anchor name must be created on the spot, so it can be linked to the footnote (even though the footnote itself has not been created yet), and (b) all other destination anchors must be prevented from using the generated id/anchor name, without preventing the "references" tag from using it, too.
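For illustration, the running list proposed above could look roughly like the following. This is a minimal Python sketch, not MediaWiki code: the class `IdRegistry`, its methods `claim` and `reserve`, and the `owner` argument are all hypothetical names. It resolves duplicates with a `_2`, `_3`, ... suffix, as is done for repeated headings, and lets an extension such as Cite reserve an id before the element that carries it has been rendered.

```python
class IdRegistry:
    """Hypothetical running list of ids/anchor names used on one page."""

    def __init__(self):
        self._used = set()      # ids already emitted
        self._reserved = {}     # id -> owner that may emit it later

    def _first_free(self, wanted):
        # Try "wanted", then "wanted_2", "wanted_3", ... until one is free.
        candidate, n = wanted, 1
        while candidate in self._used or candidate in self._reserved:
            n += 1
            candidate = f"{wanted}_{n}"
        return candidate

    def reserve(self, wanted, owner):
        """Pick a free id now for an anchor that `owner` will emit later."""
        candidate = self._first_free(wanted)
        self._reserved[candidate] = owner
        return candidate

    def claim(self, wanted, owner=None):
        """Emit an id; only the reserving owner may collect a reserved id."""
        if owner is not None and self._reserved.get(wanted) == owner:
            del self._reserved[wanted]
            self._used.add(wanted)
            return wanted
        candidate = self._first_free(wanted)
        self._used.add(candidate)
        return candidate


registry = IdRegistry()
first = registry.claim("content")                    # interface div
second = registry.claim("content")                   # ==content== heading
note = registry.reserve("cite_note-1", owner="cite") # <ref> is rendered
clash = registry.claim("cite_note-1")                # user-supplied id
footnote = registry.claim(note, owner="cite")        # <references/> is rendered
```

In this sketch the user-supplied `cite_note-1` is renamed, while the later claim by the reserving owner still gets the id that the earlier link points to.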
*** Bug 11625 has been marked as a duplicate of this bug. ***
(In reply to comment #7)
> What if the parser kept a running list of all the ids and anchor names already
> in use? When it replaces the invalid id/anchor name characters, it can check
> against the list to make sure the id/anchor name in question is not already in
> use. Duplicates would be resolved the same way headers with the same text are
> resolved.

Something broadly like that is, of course, the only way to fix this bug. To begin with, though, much of the interface isn't run through the Sanitizer, so we'd have to manually (!) keep track of every single one of the hundreds of id's used in the software, which tend not to follow any rhyme or reason. It's still doable, certainly.
Sounds like it might be a tedious task, but not necessarily a difficult one. The worst-case scenario is that all the ids and anchor names outside the actual article body are hard-coded into the list. A better option is to have the surrounding HTML completely assembled before the article body is, and pass it into a method that extracts every id and anchor name and adds it to the list.
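The extraction method described above could be sketched like this in Python, using only the standard library's `html.parser`. The names `IdCollector` and `collect_ids` are hypothetical, and the sketch assumes the surrounding HTML is available as one string; it is not how MediaWiki assembles its skins.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute and every <a name="..."> anchor name."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when absent.
        for name, value in attrs:
            if not value:
                continue
            if name == "id" or (name == "name" and tag == "a"):
                self.ids.add(value)


def collect_ids(html):
    """Return the set of ids/anchor names found in an HTML string."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


skin_html = (
    '<div id="content"><a name="top"></a>'
    '<input name="search"><div id="footer"></div></div>'
)
reserved = collect_ids(skin_html)
```

Here `name` on the `input` element is ignored, because only `name` on `a` elements acts as an anchor name.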
Patches are appreciated.
*** Bug 13926 has been marked as a duplicate of this bug. ***
*** Bug 17650 has been marked as a duplicate of this bug. ***
*** Bug 21440 has been marked as a duplicate of this bug. ***
*** Bug 21856 has been marked as a duplicate of this bug. ***
Because a heading can start with a non-ASCII letter, an invalid id is created that starts with a period. According to the XHTML 1.0 specification, an id has to start with [A-Za-z]. Digits and some other characters (e.g. the period) are only allowed in the following positions.
== Überschrift ==
creates
<span class="mw-headline" id=".C3.9Cberschrift">Überschrift</span>
So a prefix on the id should solve this problem, because mw-.C3.9Cberschrift would be a valid id.
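To make the example above concrete, here is a small Python sketch. The function names are hypothetical, and `heading_to_id` only approximates the escaping seen in the output above (percent-encode the UTF-8 bytes, then replace "%" with "."); it is not MediaWiki's actual code. The pattern is the one the comment describes: a first character in [A-Za-z], then letters, digits, ".", "_", ":" or "-".

```python
import re
from urllib.parse import quote

# Id pattern as described in the comment above (an assumption of this sketch).
XHTML1_ID = re.compile(r"[A-Za-z][A-Za-z0-9._:-]*")


def heading_to_id(text):
    """Approximate the escaping: spaces -> "_", percent-encode, "%" -> "."."""
    encoded = quote(text.strip().replace(" ", "_"), safe="")
    return encoded.replace("%", ".")


def is_valid_xhtml1_id(value):
    """True if the whole string matches the pattern above."""
    return XHTML1_ID.fullmatch(value) is not None


raw_id = heading_to_id("Überschrift")
prefixed_id = "mw-" + raw_id
```

With these definitions, `raw_id` fails the check because it starts with a period, while `prefixed_id` passes.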
MediaWiki no longer outputs XHTML1 by default, but HTML5. id's in HTML5 can be any nonempty string that doesn't contain whitespace: http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-id-attribute
(In reply to comment #17)
> MediaWiki no longer outputs XHTML1 by default, but HTML5. id's in HTML5 can be
> any nonempty string that doesn't contain whitespace:
>
> http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-id-attribute
>

But it still can (and on WMF wikis does) output XHTML1, so the solution must take that DTD into account.
*** Bug 22587 has been marked as a duplicate of this bug. ***
*** Bug 24285 has been marked as a duplicate of this bug. ***
Can't we do it here the way we do it with duplicate sections? For example,
== Heading == bla bla... == Heading == bla bla...
becomes
id="Heading" bla bla bla... id="Heading_2" bla bla bla...
In this case, == content == should simply become id="content_2".
Basically, yes. What we have to do is make a list of all the id's used by the software and blacklist them for section titles and other user-provided id's. This is feasible to maintain if we adopt a strict policy of prefixing all software-generated id's with "mw-", which we often do already, although we're not very strict about it. Then we can just blacklist the "mw-" prefix, in addition to a hopefully-not-expanding list of legacy unprefixed id's. We can't feasibly check the list of interface id's used on the current page on the fly, while parsing. This works for things the parser generates, but parser output can't depend on UI output. The same cached parser output is stuck into a variety of skins, plus no skin at all (action=raw, API output, etc.). So we need to get a list of all id's used anywhere in the software and ban them in all pages.
Both sound needed (the interface prefix "mw-", and upcounting in the headings). By upcounting I mean what The Evil IP address mentioned above, so that "mw-content" would be treated like a duplicate heading. The following
== something == == something == == content == == mw-content ==
would then become
id="something" id="something_2" id="content_2" id="mw-content_2"
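The combination discussed in the last few comments (a banned "mw-" prefix, a fixed list of legacy unprefixed interface id's, and upcounting) can be sketched in Python as follows. The names `LEGACY_INTERFACE_IDS`, `is_reserved` and `assign_heading_ids` are hypothetical, and the contents of the legacy list are made up for the example. A reserved name is simply treated as if it had already been used once. Note that the result "mw-content_2" still carries the prefix, so this relies on the assumption that the software itself never generates id's with such a counter suffix.

```python
# Hypothetical legacy list; the real set of unprefixed interface id's
# would have to be collected from the software.
LEGACY_INTERFACE_IDS = {"content", "footer"}
INTERFACE_PREFIX = "mw-"


def is_reserved(heading_id):
    """True if the id belongs to the interface and is banned for headings."""
    return (heading_id in LEGACY_INTERFACE_IDS
            or heading_id.startswith(INTERFACE_PREFIX))


def assign_heading_ids(headings):
    """Assign ids in page order, upcounting duplicates and reserved names."""
    used = set()
    result = []
    for wanted in headings:
        candidate, n = wanted, 1
        blocked = is_reserved(wanted)  # counts as one earlier use
        while blocked or candidate in used:
            n += 1
            candidate = f"{wanted}_{n}"
            blocked = False            # suffixed forms only clash if used
        used.add(candidate)
        result.append(candidate)
    return result


ids = assign_heading_ids(["something", "something", "content", "mw-content"])
```

Because the check needs only the fixed blacklist and the headings of the page itself, it does not depend on which skin the cached parser output ends up in.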
*** Bug 29049 has been marked as a duplicate of this bug. ***
We also have the problem that with section editing, we get ids in previews which differ from the ids in the full page. That is at least bewildering, and at worst may lead to wrong ids being copied and used elsewhere. Editing a page closer to the beginning may lead to ids further down being renumbered. References to ids from elsewhere, e.g. via links having a fragment identifier, should ideally not break in such cases.
In bug 29049, it has been suggested that editors be warned when a page is saved with duplicate id values, and that duplicates simply be accepted on a second save, as is done for empty "Summary" fields. Maybe even a toggle in Special:Preferences, similar to the one for the handling of empty "Summary" fields, might be considered for the id= value checking.
A warning on Save does not seem like the right approach. The ID problem is an internal, technical shortcoming of MediaWiki. Exposing this to non-technical editors would just be confusing to them.
*** Bug 29480 has been marked as a duplicate of this bug. ***
(In reply to comment #18)
> (In reply to comment #17)
> > MediaWiki no longer outputs XHTML1 by default, but HTML5. id's in HTML5 can be
> > any nonempty string that doesn't contain whitespace:
> >
> > http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-id-attribute
> >
>
> But still can (and on WMF wikis does) output XHTML1, so the solution must count
> with that DTD.

WMF only uses XHTML because of some bots and scripts that haven't updated yet. Eventually WMF WILL be using html5. And as this is a pure validation thing (browsers are not going to care if you use an XHTML doctype but actually follow html5's rules) we don't care about XHTML rules.