Last modified: 2011-03-13 18:06:52 UTC
MediaWiki doesn't allow < or > in page titles. Perhaps it has something to do with shell interpretation. But I don't see why this should be a technical limitation. File names in Unix can have < and >, after all. You just have to be very careful to quote them properly.
Quoting RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html): 'The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text.' They could be escaped to %3C and %3E, though.
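As a rough illustration of the %-escaping mentioned above, here is a minimal Python sketch. It is not MediaWiki code; the helper name escape_title_for_url is made up for this example, and it simply uses the standard library's urllib.parse.quote with no extra "safe" characters.

```python
from urllib.parse import quote, unquote


def escape_title_for_url(title: str) -> str:
    """Percent-escape a page title for use in a URL path segment.

    Hypothetical helper for illustration only. safe="" means that
    characters such as "/", "<" and ">" are all escaped.
    """
    return quote(title, safe="")


print(escape_title_for_url("<"))    # %3C
print(escape_title_for_url("a>b"))  # a%3Eb
# Decoding restores the original title.
print(unquote(escape_title_for_url("a<b>c")))  # a<b>c
```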
The same paragraph of RFC 1738 lists ^ as also being unsafe, yet we have [[^]] as a redirect to [[circumflex]]. (Also [["]] redirects to [[quotation mark]], [[%]] to [[percentage]], [[\]] to [[backslash]], [[~]] to [[tilde]], and [[`]] to [[grave accent]].) The only characters listed as unsafe in that RFC that we don't allow in page titles are <, >, [, ], {, }, and |. The |, [, and ] are excluded because of wiki-syntax limitations. <, >, {, and }, though, should be allowed in page titles, possibly via %-escaping.
Well, it seems sensible to disallow '{' and '}' for the same reason as '[' and ']' - how would you include a page called "}"? Sure, we could make the user escape them by hand, but that's arguably more ugly than just making them choose a different name, and an invitation for bugs to come and nest in our code. '<' and '>', meanwhile, have the potential to generate malformed output which includes unfiltered HTML tags. Obviously, this is perfectly avoidable, and wouldn't require any mangling by users, but we would have to be very careful to get this right, and the benefits (slightly nicer titles) may not outweigh the risks, and the effort required to avoid them. Just my €0.02, of course...
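To make the "unfiltered HTML tags" risk concrete, here is a small Python sketch. It only illustrates the general problem; it is not how MediaWiki renders titles, and both function names are invented for the example.

```python
import html


def render_heading_unsafe(title: str) -> str:
    # Interpolates the title verbatim: a title containing < or >
    # ends up as live markup in the output.
    return "<h1>" + title + "</h1>"


def render_heading(title: str) -> str:
    # Escapes the title first, so < and > are emitted as &lt; and &gt;.
    return "<h1>" + html.escape(title) + "</h1>"


title = "<script>alert(1)</script>"
print(render_heading_unsafe(title))  # <h1><script>alert(1)</script></h1>
print(render_heading(title))         # <h1>&lt;script&gt;alert(1)&lt;/script&gt;</h1>
```

The point of the comment above is that every place a title is output would need the second form, not just one.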
{ and } are markup used in links and cannot ever be part of page titles for this reason. < and > are disallowed for safety.
Created attachment 825: Trivial edit to Title.php:legalChars
I see no reason why [[}]] shouldn't link to the page called }. Similarly, [[a]b]] should link to the page called a]b. I don't know about the safety of < and > (not having audited the entire code!), but I would think that &, ;, and ! are just as dangerous.
It appears that Parser.php is written in such a way that it can handle all those characters ([]{}<>) without modification. All that is necessary is to edit Title.php:legalChars in the obvious way (see the patch). Then all of the following wiki code works as you'd expect:
[[]]] links to ]
[[a]b]] links to a]b
[[a]]]] links to a]]
[[}}]] links to }}
{{}}}} includes Template:}}
[[>]] links to >
And yes, having a redirect on Wikipedia from [ and ] to Bracket would be useful.
If you think about this for half a second you'll see why that doesn't work:
[[This is a]] long page title which is still in the link [[and why not?]]
&, ;, and ! are not dangerous in any way. ; and ! have no special meaning at all, and & is merely annoying if output incorrectly (invalid (X)HTML or unexpected character entity).
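The ambiguity in that example can be shown with two regular expressions in Python. This is only a sketch of the two possible readings; it does not reproduce how Parser.php actually tokenizes links.

```python
import re

text = "[[This is a]] long page title which is still in the link [[and why not?]]"

# Reading 1: "]" cannot be part of a title, so each link ends at the
# first "]]" (non-greedy match). This yields two separate links.
shortest = re.findall(r"\[\[(.*?)\]\]", text)

# Reading 2: "]" may be part of a title, so the first link could run
# all the way to the last "]]" (greedy match). This yields one long link.
longest = re.findall(r"\[\[(.*)\]\]", text)

print(shortest)  # ['This is a', 'and why not?']
print(longest)   # ['This is a]] long page title which is still in the link [[and why not?']
```

Once ] is legal in titles, nothing in the text itself says which reading was intended.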
(In reply to comment #5)
> following Wiki code works as you'd expect
>
> [[]]] links to ]
> [[a]b]] links to a]b
> [[a]]]] links to a]]
> [[}}]] links to }}
> {{}}}} includes Template:}}
> [[>]] links to >
But *are* these always the expected behaviours?
* What about using a template or template parameter to determine what title to use (e.g. "[[Wikiquote:{{PAGENAME}}|{{PAGENAME}}]]" or "[[{{{1}}}{{{month}}} 1{{{4}}}|1]]" or "{{SeptemberCalendar{{CURRENTYEAR}}}}"; all real examples)?
* And what about images with links in their caption? - e.g. "[[Image:Foo.jpeg|thumb|this is a [[photo]] of [[foo]]]]"; since your patch also allows "[" in titles, this syntax is extremely ambiguous.
* Or even just a mix of links and punctuation, like "[See [[foo]]]"
While some of these things appear to still work with your patch, because of the order things are processed in the existing code, making them *reliably* do so would be a nightmare.
> But *are* these always the expected behaviours?
> * What about using a template or template parameter to determine what title to use (e.g. "[[Wikiquote:{{PAGENAME}}|{{PAGENAME}}]]" or "[[{{{1}}}{{{month}}} 1{{{4}}}|1]]" or "{{SeptemberCalendar{{CURRENTYEAR}}}}"; all real examples)?
> * And what about images with links in their caption? - e.g. "[[Image:Foo.jpeg|thumb|this is a [[photo]] of [[foo]]]]"; since your patch also allows "[" in titles, this syntax is extremely ambiguous.
> * Or even just a mix of links and punctuation, like "[See [[foo]]]"
Thanks for the constructive comments, Rowan! That's a good point, "[See [[foo]]]" no longer works the same way. (Your other examples do still work.)
Note that MediaWiki is currently a bit inconsistent. "[See [[foo]]]" displays as "[See <a>foo</a>]", whereas "[See [[foo|bar]]]" displays as "[See <a>bar]</a>". Also, the alt text of "[[Image:Barnstar.png|[[foo|bar]]] hey [[foo|bar]]]]]" is "bar] hey bar"---the second "[[foo|bar]]]" is interpreted differently. The patch exacerbates this inconsistency.
Perhaps Parser.php should be changed so that links end at the ''beginning'' of the first string of two or more ], instead of at the end. Then "[See [[foo|bar]]]" would display as "[See <a>bar</a>]".
I understand now what Brion was referring to above about < and > being unsafe. Check out
1. http://en.wikipedia.org/wiki/< (which doesn't exist, but is moderately broken)
2. http://en.wikipedia.org/wiki/Special:Movepage/User:Dbenbenn/%26lt%3B (the "Move page:" field displays wrong)
Thus, even without < and > in titles, it's important to escape characters correctly.
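The suggestion in the comment above (end a link at the beginning, not the end, of the first run of two or more ]) can be sketched in Python as follows. This models only those two rules for a single piped link. It is not Parser.php, it does not reproduce the unpiped "[See [[foo]]]" behaviour described above, and the names split_link and render are invented for the example.

```python
import re


def split_link(text: str, end_at: str):
    """Split text around its first [[...]] link.

    end_at="end":       the closing "]]" is the LAST two brackets of the
                        first run of two or more "]".
    end_at="beginning": the closing "]]" is the FIRST two brackets of
                        that run (the proposed rule).
    Returns (before, link_body, after).
    """
    start = text.index("[[")
    run = re.search(r"\]{2,}", text[start + 2:])
    if run is None:
        raise ValueError("no closing ]] found")
    run_start = start + 2 + run.start()
    run_end = start + 2 + run.end()
    body_end = run_start if end_at == "beginning" else run_end - 2
    return text[:start], text[start + 2:body_end], text[body_end + 2:]


def render(text: str, end_at: str) -> str:
    before, body, after = split_link(text, end_at)
    label = body.split("|", 1)[-1]  # text after the pipe, if any
    return f"{before}<a>{label}</a>{after}"


print(render("[See [[foo|bar]]]", "end"))        # [See <a>bar]</a>
print(render("[See [[foo|bar]]]", "beginning"))  # [See <a>bar</a>]
```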
> 1. http://en.wikipedia.org/wiki/< (which doesn't exist, but is moderately broken)
Oops, the link above wasn't parsed correctly. Try http://en.wikipedia.org/wiki/%26lt%3B instead.
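For reference, this is what the corrected URL decodes to, and why a title like that is easy to display wrongly. The Python sketch below uses only standard-library calls; the variable names are arbitrary, and it says nothing about which code path in MediaWiki was actually at fault.

```python
import html
from urllib.parse import unquote

# The path segment from the URL above decodes to the four characters "&lt;".
title = unquote("%26lt%3B")
print(title)  # &lt;

# Emitted into HTML without escaping, a browser would show "<",
# which is what html.unescape simulates here.
print(html.unescape(title))  # <

# Escaped correctly, the "&" becomes "&amp;" and the browser shows "&lt;".
print(html.escape(title))  # &amp;lt;
```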
(In reply to comment #7)
> because of the order things are processed in the existing code,
> making them *reliably* do so would be a nightmare.
Perhaps Rowan is right. For example, currently "[[a [[test]]" links to "test", whereas with the patch it would link to "a [[test". It's somewhat evil to break existing pages.
Fortunately, it isn't necessary! You can link to "A" with [[a]] or [[a]]. Similarly, to link to [ with the current parser syntax, you'd expect to use "[[[]]". That doesn't actually work. The reason is that the notions of "characters that can go within a wiki link" and "characters that can be in a page title" are conflated---they're both defined by Title.php:legalChars. If we separate the two concepts, then we can allow page titles with [, ], {, }, and |, without having to modify the parser at all. (And once the safety issues are worked out, we can allow < and > too.)
By the way, see bug 3243 for a list of at least 14 places where & isn't correctly HTML-sanitized.
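The separation proposed in the comment above might look roughly like this in Python. Everything here is hypothetical: the two sets contain only the characters named in this thread, the function names are invented, and the real Title.php:legalChars is an allow-list rather than these deny-lists.

```python
# Characters that may not appear literally inside [[...]] in wikitext
# (assumption: just the ones discussed in this thread).
LINK_TEXT_FORBIDDEN = set("[]{}|<>")

# Characters that may not appear in a stored page title at all
# (assumption: only < and >, pending the safety work mentioned above).
TITLE_FORBIDDEN = set("<>")


def is_legal_title(title: str) -> bool:
    return not (set(title) & TITLE_FORBIDDEN)


def is_legal_link_text(target: str) -> bool:
    return not (set(target) & LINK_TEXT_FORBIDDEN)


# "a]b" could exist as a page, but could not be written literally in a link,
# so the parser's behaviour for existing wikitext would be unchanged.
print(is_legal_title("a]b"))      # True
print(is_legal_link_text("a]b"))  # False
```

How such a page would then be linked to (some escaped form inside the link) is left open here, as it is in the comment.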
(In reply to comment #10)
> Fortunately, it isn't necessary! You can link to "A" with [[a]] or
> [[a]]. Similarly, to link to [ with the current parser syntax,
> you'd expect to use "[[[]]".
Well, to link to an article about '[' now, you could type "[[left bracket]]" - so what would we have gained? OK, the page might look a bit nicer when you get there (although a heading of just '[' might look weird anyway, so you'd redirect to something more verbose; in which case, it amounts to being able to type [[[]] and get redirected to [[left bracket]] anyway!), but this kind of change is frankly a lot of headaches for a very small improvement in the actual software.
As for your comments about existing inconsistencies, some of those may well be considered bugs - the "parser", so called, is widely considered extremely ugly, and is "designed" (i.e. hacked together) to work mostly as expected, most of the time. And the fact that such inconsistencies *already* exist should demonstrate just how much trouble would be unleashed by making it any *more* complicated - you'd have to be pretty sure the benefits outweighed the risks!
(In reply to comment #11)
> Well, to link to an article about '[' now, you could type "[[left bracket]]" - so what would we have gained?
I don't expect one would ever want to link to [. But it would be a useful redirect for the go/search box, for anyone who didn't know it was called "bracket". But that specific page isn't really the issue, anyway. How about a music album that uses [ and ] in the title? Or a book title with { and }? Do you want to personally guarantee that no one will ever have a legitimate use for any of these characters?
> And the fact that such inconsistencies *already* exist should demonstrate just how much trouble would be unleashed by making it any *more* complicated - you'd have to be pretty sure the benefits outweighed the risks!
That's why it's so lucky that this bug can be fixed (I think) without touching the parser at all!
A related issue (which I won't bother listing as a separate bug, since it will merely be resolved to "wontfix" regardless) involves % in page titles. For example, [[%2542]] doesn't produce a link when parsed. (Presumably it should link to the page entitled "%42", since MediaWiki strangely supports [[percent-encoding]] in wiki links.) Note that the URL http://en.wikipedia.org/wiki/%2542 ---the percent-encoded URL for "%42"---returns [[Bad title]].
Such titles are forbidden because they can't be round-tripped -- when written in wikitext the chars are decoded and the original page becomes inaccessible.
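The round-trip problem can be demonstrated with a few lines of Python. This sketch assumes link targets are percent-decoded exactly once, as the comments above describe; link_target_to_title is a made-up name, not a MediaWiki function.

```python
from urllib.parse import unquote


def link_target_to_title(target: str) -> str:
    """Hypothetical model of resolving a wikitext link target:
    percent-escapes in the target are decoded once."""
    return unquote(target)


# Suppose a page were actually titled "%42". Writing its title in a link
# resolves to a different page, so the original is unreachable by name.
print(link_target_to_title("%42"))    # B

# Reaching "%42" needs the doubly escaped form, which then decodes to a
# title that itself looks like an escape sequence.
print(link_target_to_title("%2542"))  # %42
```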
Thanks for the explanation; that kind of makes sense. Note that [[''foo'']] can't be round-tripped, either---to get the parser to link to that page, you have to use something like [[<nowiki>''foo''</nowiki>]]. It seems to me that if people really need a page named %42 (album title, perhaps?), they'll be willing to learn how to link to it. The cleanest solution would be if we could turn off percent-encoding in wiki syntax. (I know, I know, we can't do that because people insist on copying URLs instead of page titles into wiki text.) Alternatively, perhaps [[<nowiki>%2542</nowiki>]] should work like the ''foo'' example above.