Last modified: 2014-02-18 12:00:32 UTC
Currently, anchor ids are created four different ways at the five different places they are used. As a test case, try "_ +.3A%3A]]":
TOC (Parser.php): "__.2B.3A.253A.5D.5D"
Link (Title.php): "_.3A:.5D.5D"
redirectToFragment (Article.php): "_.3A:.5D.5D"
History (Linker.php): "_.2B.3A.253A"
Anchorencode (CoreParserFunctions.php): "__.2B:.253A.5D.5D"
See [[User:Amalthea/test10]] for a demonstration.
This regularly breaks the link from history/contributions/RC to the section, makes it hard or impossible to duplicate the functionality in tools (NAVPOP just now), and can break normal section links.
I presume this could easily be fixed by all using the same static function from Title::escapeFragmentForURL, without any additional superfluous logic (in particular stripping "[[", "[[:", "]]" in Linker.php). The only thing that will still be necessary is ensuring unique ids in the TOC, of course. This can still make those links point to unintended sections, but in a controlled way.
See also Bug 17857 and Bug 2831.
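To illustrate the "one shared function" idea, here is a minimal Python sketch, not MediaWiki's PHP. The function name escape_fragment and its exact steps are assumptions reconstructed from the ids quoted in the report (spaces to underscores, percent-encoding, ":" kept readable, "%" written as "."). For the test case it yields the same id as the TOC row above.

```python
from urllib.parse import quote


def escape_fragment(text: str) -> str:
    """Hypothetical shared anchor encoder (an assumption, not MediaWiki code)."""
    # Spaces become underscores, as in the ids quoted in the report.
    text = text.replace(" ", "_")
    # Percent-encode everything except letters, digits and "_.-~".
    encoded = quote(text, safe="")
    # Keep colons readable, then write the remaining "%" escapes as "." escapes.
    return encoded.replace("%3A", ":").replace("%", ".")


if __name__ == "__main__":
    # Test case from the report; prints __.2B.3A.253A.5D.5D
    print(escape_fragment("_ +.3A%3A]]"))
```

If every call site went through one function like this, the five places could no longer disagree on the encoding step itself; only the input each one passes in could still differ.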
To clarify, the test case above is contrived, but this is a very common problem. The section links in the history of [[:en:Wikipedia:Requests for page protection]] for example *never* work.
Similarly, on Commons, where many section headers are generated by templates or messages, for localization, the automatic section links in the page histories also do not work, e.g. http://commons.wikimedia.org/wiki/User_talk:Gerald_Troy?action=history
There are some other inconsistencies with anchors:
1. The anchor added in http://de.wikipedia.org/w/index.php?diff=prev&oldid=73068456 has an invisible character. The autocomment skips it, but the TOC and the anchor of that section keep it, so the autocomment link does not link to the section.
2. When editing a section whose headline has double spaces between two words, after saving you are not taken back to the section, because the anchor added to the URL has two underscores, unlike the anchor of the section/TOC.
3. bug 22784
Thanks.
At least partially solved in r68272, by correcting the autosummaries. This should fix at least items 1 and 2 from Umherirrender's comment, and I think it also deals with the most important problems of Bug 2831.
As of r68343: Umherirrender's problems are fixed. {{anchorencode}} works the same as the redirect from edit-section to article; both use Parser::guessSectionNameFromWikiText. The /* autocomments */ that we auto-generate are also the same, except that non-linking [[ and ]] are removed. The /* comments */ that are provided by the user do not have the HTML stripped from them, but all [[ and ]] are removed. I think the best way forward here is to try to pass user-generated /* comments */ through Parser::stripSectionName on save, and to remove the [[ and ]] removal from Linker::formatAutocomments.
[[#links]] are still just wrong: they are urldecoded() before handling, and do not have whitespace normalised (nor do they have HTML or links stripped, but they can't contain those anyway). I think the way forward is to add the whitespace normalization to Title::escapeFragmentForURL. I am not so sure about stopping the urldecode() of the anchor, though that could also be done. It may cause problems for interwikis that aren't wikis.
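A rough Python sketch of what whitespace normalisation in the fragment encoder could look like. The names normalize_whitespace and escape_fragment are hypothetical, and the rule used here (collapse any run of spaces and underscores into a single underscore) is an assumption, not the actual Title::escapeFragmentForURL code.

```python
import re
from urllib.parse import quote


def normalize_whitespace(text: str) -> str:
    # Assumed rule: any run of spaces and/or underscores becomes one underscore.
    return re.sub(r"[ _]+", "_", text)


def escape_fragment(text: str, normalize: bool = True) -> str:
    """Hypothetical encoder; 'normalize' toggles the proposed extra step."""
    text = normalize_whitespace(text) if normalize else text.replace(" ", "_")
    # Percent-encode, keep colons readable, write "%" escapes as "." escapes.
    return quote(text, safe="").replace("%3A", ":").replace("%", ".")


if __name__ == "__main__":
    heading = "Two  words"  # double space between the words
    print(escape_fragment(heading, normalize=False))  # Two__words
    print(escape_fragment(heading, normalize=True))   # Two_words
```

Without the extra step the double space turns into two underscores, which is the mismatch described earlier for section headlines with double spaces.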
There's no way for anything outside the parser to match Parser.php's behavior here -- it generates the link after parsing, so you'd need to parse, and you can't. It's not obvious how to avoid this -- if the parser generates the ids before preprocessing, it will miss any that come from templates, but if it generates them after, you can have stupid stuff like {{CURRENTTIME}}. You could also have *really* stupid stuff like """ == {{foo}} == """ where {{foo}} expands to """ text == == more text """. Hard to say what to do in these cases. Ideally, the parser and non-parser code should agree on section ids at least when the headings don't contain any curly braces/parser functions, which is the common case.
Re comment 6: this bug isn't about template expansion (that's bug 5019), just about problems with the various encoding functions. As far as I'm aware, the only change that now needs to be made is to stop the urldecoding() of the #-fragment by the parser. Whether that change should actually be made, I'm less confident, but I think so. (As %-encoding is not used there, it seems weird to un-%-encode it.)
I don't see how "encoding" is logically separate from "stripping wikitext" when all these functions fundamentally operate on wikitext input. But anyway, improvements are good, even if this is really a sub-bug of bug 5019. Dunno why percent-decoding is done. Probably best to dig around in the history to see if it was added due to an actual bug or just some random decision. Either percent-decode in all these cases or none. I'd think none is better than all here, but who knows what might have come up to cause someone to do the decoding there.
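A small Python sketch of why "all or none" matters for percent-decoding. The helper names are hypothetical, and the dot-encoding step is an assumption reconstructed from the ids quoted in the original report; unquote_plus stands in for PHP's urldecode() here.

```python
from urllib.parse import quote, unquote_plus


def dot_encode(text: str) -> str:
    # Assumed encoding: spaces to underscores, percent-encode,
    # keep colons readable, write "%" escapes as "." escapes.
    encoded = quote(text.replace(" ", "_"), safe="")
    return encoded.replace("%3A", ":").replace("%", ".")


def escape_with_decode(fragment: str) -> str:
    # Path that percent-decodes the fragment first (like the [[#links]] case).
    return dot_encode(unquote_plus(fragment))


def escape_without_decode(fragment: str) -> str:
    # Path that treats "%" as a literal character.
    return dot_encode(fragment)


if __name__ == "__main__":
    print(escape_with_decode("A%3AB"))     # A:B
    print(escape_without_decode("A%3AB"))  # A.253AB
```

The same input gives two different ids depending on whether the decode step runs, so any code path that decodes will disagree with any code path that does not.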
*** Bug 24412 has been marked as a duplicate of this bug. ***
*** Bug 36333 has been marked as a duplicate of this bug. ***
Change 113943 had a related patch set uploaded by Burthsceh: fix escaping fragment of Title https://gerrit.wikimedia.org/r/113943