Last modified: 2014-02-18 12:00:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20431, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18431 - Anchor links are created based on different methods causing broken links
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: Future release
Assigned To: Nobody - You can work on this!
Duplicates: 24412 36333
Depends on:
Blocks:
Reported: 2009-04-11 23:19 UTC by Amalthea
Modified: 2014-02-18 12:00 UTC
CC List: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Amalthea 2009-04-11 23:19:43 UTC
Currently, anchor ids are created four different ways at the five different places they are used. As a test case, try "_ +.3A%3A]]"

TOC (Parser.php):                       "__.2B.3A.253A.5D.5D"
Link (Title.php):                       "_.3A:.5D.5D"
redirectToFragment (Article.php):       "_.3A:.5D.5D"
History (Linker.php):                   "_.2B.3A.253A"
Anchorencode (CoreParserFunctions.php): "__.2B:.253A.5D.5D"

See [[User:Amalthea/test10]] for a demonstration
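
To make the divergence concrete, here is a minimal Python sketch. It is not MediaWiki's PHP, and the function names toc_style and link_style are made up for illustration. It assumes only two differences between the pipelines: whether the input is percent-decoded first, and whether runs of spaces/underscores are collapsed while ":" is left unescaped. Under those assumptions it reproduces the TOC and Link strings listed above.

```python
import re
from urllib.parse import quote, unquote_plus

TEST_CASE = "_ +.3A%3A]]"

def toc_style(text: str) -> str:
    # Hypothetical pipeline A: spaces become underscores, everything else
    # is percent-encoded, and "%" is then written as ".".
    return quote(text.replace(" ", "_"), safe="").replace("%", ".")

def link_style(text: str) -> str:
    # Hypothetical pipeline B: percent-decode first ("+" counts as a space),
    # collapse runs of spaces/underscores, keep ":" readable, then encode.
    decoded = unquote_plus(text)
    collapsed = re.sub(r"[ _]+", "_", decoded)
    return quote(collapsed, safe=":").replace("%", ".")

print(toc_style(TEST_CASE))   # __.2B.3A.253A.5D.5D
print(link_style(TEST_CASE))  # _.3A:.5D.5D
```

Two small ordering differences are enough to make the same heading text yield two ids that no longer match.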

This regularly breaks the link from history/contributions/RC to the section, makes it hard or impossible to duplicate the functionality in tools (NAVPOP just now), and can break normal section links.

I presume this could easily be fixed by having all five places use the same static function, Title::escapeFragmentForURL, without any additional superfluous logic (in particular the stripping of "[[", "[[:", and "]]" in Linker.php).

The only thing that will still be necessary is ensuring unique ids in the TOC of course. This can still make those links point to unintended sections, but in a controlled way.
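
A rough Python sketch of that remaining step, assuming a single shared encoder already exists. Here encode_fragment is a hypothetical stand-in for it, and the "_2", "_3" suffix scheme is only an assumption for illustration:

```python
from urllib.parse import quote

def encode_fragment(text: str) -> str:
    # Stand-in for the one shared encoder (an assumption, not MediaWiki code).
    return quote(text.replace(" ", "_"), safe=":").replace("%", ".")

def toc_ids(headings: list[str]) -> list[str]:
    # Append _2, _3, ... to repeated ids so every TOC entry is unique.
    seen: dict[str, int] = {}
    ids = []
    for heading in headings:
        base = encode_fragment(heading)
        seen[base] = seen.get(base, 0) + 1
        ids.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return ids

print(toc_ids(["Request", "Request", "Other topic", "Request"]))
# ['Request', 'Request_2', 'Other_topic', 'Request_3']
```

A link built from the heading text alone would land on the first "Request" section, which is the controlled mismatch described above. This naive scheme does not handle a heading that is literally named "Request_2".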


See also Bug 17857 and Bug 2831.
Comment 1 Amalthea 2009-05-07 10:10:54 UTC
To clarify, the test case above is contrived, but this is a very common problem. The section links in the history of [[:en:Wikipedia:Requests for page protection]] for example *never* work.
Comment 2 Amalthea 2010-02-18 09:17:11 UTC
Similarly, on Commons, where many section headers are generated by templates or messages, for localization, the automatic section links in the page histories also do not work, e.g.
http://commons.wikimedia.org/wiki/User_talk:Gerald_Troy?action=history
Comment 3 Umherirrender 2010-06-18 19:36:18 UTC
There are some other inconsistencies with anchors:

1. The anchor added in http://de.wikipedia.org/w/index.php?diff=prev&oldid=73068456 has an invisible character. The autocomment skips it, but the TOC and the anchor of that section have it, so the autocomment link does not link to the section.

2. When editing a section whose headline has a double space between two words, you are not taken back to the section after saving, because the anchor added to the URL has two underscores, which does not match the anchor of the section/TOC.

3. bug 22784

Thanks.
Comment 4 Derk-Jan Hartman 2010-06-19 13:43:09 UTC
At least partially solved in r68272, by correcting the autosummaries.

This should fix at least items 1 and 2 from Umherirrender, and I think it also deals with the most important problems of Bug 2831.
Comment 5 Conrad Irwin 2010-06-21 01:55:43 UTC
As of r68343:

Umherirrender's problems are fixed.

{{anchorencode}} works the same as the redirect from edit-section to article. Both use Parser::guessSectionNameFromWikiText.

/* autocomments */ that we auto-generate are also the same, except that non-linking [[ and ]] are removed. /* comments */ that are provided by the user do not have the HTML stripped from them, but all [[ and ]] are removed. I think the best way forward here is to try and pass user-generated /* comments */ through Parser::stripSectionName on save, and remove the [[ and ]] removal from Linker::formatAutocomments

[[#links]] are still just wrong, they are urldecoded() before handling, and do not have whitespace normalised (nor do they have HTML or links stripped, but they can't contain those anyway).

I think the way forward is to add the whitespace normalization to Title::escapeFragmentForURL. I am not so sure about stopping the urldecode() of the anchor, though that could also be done; it may cause problems for interwikis that aren't wikis.
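
For illustration only, a Python sketch of what such a whitespace normalization step could look like. The function name is hypothetical, and the actual rules in Title::escapeFragmentForURL may differ:

```python
import re

def normalize_fragment_whitespace(fragment: str) -> str:
    # Collapse runs of whitespace to one space, trim, then use underscores.
    return re.sub(r"\s+", " ", fragment).strip().replace(" ", "_")

print(normalize_fragment_whitespace("Two  words"))  # Two_words
print(normalize_fragment_whitespace("Two words"))   # Two_words
```

With a step like this applied everywhere, a headline typed with a double space and the same headline with a single space would share one anchor.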
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-01 21:40:38 UTC
There's no way for anything outside the parser to match Parser.php's behavior here -- it generates the link after parsing, so you'd need to parse, and you can't.  It's not obvious how to avoid this -- if the parser generates the id's before preprocessing, it will miss any that come from templates, but if it generates them after, you can have stupid stuff like {{CURRENTTIME}}.  You could also have *really* stupid stuff like

"""
== {{foo}} ==
"""

where {{foo}} expands to

"""
text ==
== more text
"""

Hard to say what to do in these cases.  Ideally we should have the parser and non-parser code agreeing on section id's at least if they don't have any curly braces/parser functions in them, which is the common case.
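
The effect can be imitated with a toy Python sketch. The heading regex and the one-entry template table are simplifications invented for this example, not how the real parser works:

```python
import re

# Toy heading matcher: a line of the form "== text ==".
HEADING = re.compile(r"^==[ \t]*(.*?)[ \t]*==[ \t]*$", re.MULTILINE)

# Hypothetical template table: {{foo}} expands to text spanning two lines.
TEMPLATES = {"foo": "text ==\n== more text"}

def expand(wikitext: str) -> str:
    # Replace each {{name}} with its expansion, leaving unknown names alone.
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: TEMPLATES.get(m.group(1), m.group(0)),
                  wikitext)

source = "== {{foo}} =="
print(HEADING.findall(source))          # ['{{foo}}']
print(HEADING.findall(expand(source)))  # ['text', 'more text']
```

Code that only sees the unexpanded source finds one heading called "{{foo}}", while code running after expansion finds two different ones, so neither side can guess the other's ids.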
Comment 7 Conrad Irwin 2010-07-02 00:13:49 UTC
Re comment 6: this bug isn't about template expansion (that's bug 5019), just about problems with the various encoding functions.

As far as I'm aware, the only change that now needs to be made is to stop the urldecoding() of the #-fragment by the parser. Whether that change actually wants to be made, I'm less confident, but I think so. (As %-encoding is not used there, it seems weird to un-%-encode it.)
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-02 00:50:46 UTC
I don't see how "encoding" is logically separate from "stripping wikitext" when all these functions fundamentally operate on wikitext input.  But anyway, improvements are good, even if this is really a sub-bug of bug 5019.

Dunno why percent-decoding is done.  Probably best to dig around in the history to see if it was added due to an actual bug or just some random decision.  Either percent-decode in all these cases or none.  I'd think none is better than all here, but who knows what might have come up to cause someone to do the decoding there.
Comment 9 Bawolff (Brian Wolff) 2010-07-17 16:07:40 UTC
*** Bug 24412 has been marked as a duplicate of this bug. ***
Comment 10 Bartosz Dziewoński 2013-12-07 15:06:22 UTC
*** Bug 36333 has been marked as a duplicate of this bug. ***
Comment 11 Gerrit Notification Bot 2014-02-18 12:00:29 UTC
Change 113943 had a related patch set uploaded by Burthsceh:
fix escaping fragment of Title

https://gerrit.wikimedia.org/r/113943
