Last modified: 2011-03-13 18:06:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T4908, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 2908 - Can't have <, >, {, or } in page titles
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: 1.5.x
Hardware: All All
Importance: Lowest minor
Assigned To: Nobody - You can work on this!
Reported: 2005-07-19 17:23 UTC by David Benbennick
Modified: 2011-03-13 18:06 UTC



Attachments
Trivial edit to Title.php:legalChars (603 bytes, patch)
2005-08-22 22:00 UTC, David Benbennick

Description David Benbennick 2005-07-19 17:23:00 UTC
MediaWiki doesn't allow < or > in page titles.  Perhaps it has 
something to do with shell interpretation.  But I don't see why this 
should be a technical limitation.  File names in Unix can have < and 
>, after all.  You just have to be very careful to quote them
properly.
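To illustrate the quoting point, here is a minimal Python sketch using the standard library's `shlex.quote`, which wraps a string containing shell metacharacters such as `<` and `>` in single quotes so the shell treats it as one literal file name. The file name is a made-up example.

```python
import shlex

# A hypothetical file name containing shell redirection characters.
name = "a<b>c"

# Unquoted, the shell would parse '<' and '>' as redirections;
# shlex.quote turns the whole name into a single literal word.
quoted = shlex.quote(name)
print(f"touch {quoted}")  # touch 'a<b>c'
```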
Comment 1 Antoine "hashar" Musso (WMF) 2005-08-18 11:36:19 UTC
Quoting RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html):

The characters "<" and ">" are unsafe because they are used
as the delimiters around URLs in free text;

They could be escaped to %3C and %3E though.
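As a quick check of that escaping, Python's `urllib.parse.quote` percent-encodes both characters and `unquote` reverses it. This only shows URL-level encoding; it says nothing about how MediaWiki would handle such titles internally.

```python
from urllib.parse import quote, unquote

for ch in "<>":
    encoded = quote(ch)            # '<' -> '%3C', '>' -> '%3E'
    assert unquote(encoded) == ch  # decoding restores the original character
    print(ch, encoded)
```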
Comment 2 David Benbennick 2005-08-18 12:35:21 UTC
The same paragraph of rfc1738 lists ^ as also being unsafe, yet we have [[^]] as
a redirect to [[circumflex]].  (Also [["]] redirects to [[quotation mark]],
[[%]] to [[percentage]], [[\]] to [[backslash]], [[~]] to [[tilde]], and [[`]]
to [[grave accent]].)

The only characters listed as unsafe in that RFC that we don't allow in page
titles are <, >, [, ], {, }, and |.  The |, [, and ] are because of wiki-syntax
limitations.  <, >, {, and }, though, should be allowed in page titles, possibly
via %-escaping.
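The comparison above is a set difference. A small Python sketch makes it explicit. The forbidden set is an assumption: it holds the characters named in this thread plus `#`, which is included because it separates a page name from a section anchor in wiki links.

```python
# Characters RFC 1738 (section 2.2) classifies as unsafe in URLs.
RFC1738_UNSAFE = set(' <>"#%{}|\\^~[]`')

# Assumption: characters forbidden in page titles, limited to those named
# in this thread plus '#', which introduces a section anchor in links.
TITLE_FORBIDDEN = set("<>[]{}|#")

still_allowed = RFC1738_UNSAFE - TITLE_FORBIDDEN
forbidden = RFC1738_UNSAFE & TITLE_FORBIDDEN

print("unsafe but allowed:", "".join(sorted(still_allowed)))
print("unsafe and forbidden:", "".join(sorted(forbidden)))
```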
Comment 3 Rowan Collins [IMSoP] 2005-08-18 16:09:53 UTC
Well, it seems sensible to disallow '{' and '}' for the same reason as '[' and
']' - how would you include a page called "}"? Sure, we could make the user
escape them by hand, but that's arguably more ugly than just making them choose
a different name, and an invitation for bugs to come and nest in our code.

'<' and '>', meanwhile, have the potential to generate malformed output which
includes unfiltered HTML tags. Obviously, this is perfectly avoidable, and
wouldn't require any mangling by users, but we would have to be very careful to
get this right, and the benefits (slightly nicer titles) may not outweigh the
risks, and the effort required to avoid them. 

Just my €0.02, of course...
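The risk described here is the usual one of interpolating untrusted text into HTML. A minimal Python sketch, using the standard library's `html.escape` rather than any MediaWiki function, of what "getting it right" means for one output point; the title is hypothetical, and every other place a title is printed would need the same treatment.

```python
import html

# A hypothetical page title that happens to look like an HTML tag.
title = "<script>"

# Escaping before interpolation keeps the title as text rather than markup.
safe = html.escape(title)
heading = f"<h1>{safe}</h1>"
print(heading)  # <h1>&lt;script&gt;</h1>
```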
Comment 4 Brion Vibber 2005-08-19 09:15:34 UTC
{ and } are markup used in links and cannot ever be part of page titles for this 
reason.

< and > are disallowed for safety.
Comment 5 David Benbennick 2005-08-22 22:00:08 UTC
Created attachment 825
Trivial edit to Title.php:legalChars

I see no reason why [[}]] shouldn't link to the page called }.  Similarly,
[[a]b]] should link to the page called a]b.

I don't know about the safety of < and > (not having audited the entire code!),
but I would think that &, ;, and ! are just as dangerous.

It appears that Parser.php is written in such a way that it can handle all
those characters ([]{}<>) without modification.  All that is necessary is to
edit Title.php:legalChars in the obvious way (see the patch).  Then all of the
following Wiki code works as you'd expect:

[[]]] links to ]
[[a]b]] links to a]b
[[a]]]] links to a]]
[[}}]] links to }}
{{}}}} includes Template:}}
[[>]] links to >

And yes, having a redirect on Wikipedia from [ and ] to Bracket would be
useful.
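The attached patch is not reproduced on this page. As a hypothetical Python sketch of the idea, a title is valid if every character falls in a legal-character class, and the patch amounts to widening that class with `[]{}<>`. The `LEGAL` string is an approximation, not the actual value from Title.php, and it treats bytes 0x80-0xFF as Latin-1 code points for simplicity; `is_legal_title` is an illustrative name.

```python
import re

# Hypothetical approximation of a legal-title character class; NOT copied
# from Title.php. Bytes 0x80-0xFF are treated as Latin-1 code points here.
LEGAL = r" %!\"$&'()*,\-./0-9:;=?@A-Z\\^_`a-z~\x80-\xFF+"

# The patch idea: add the bracket, brace and angle-bracket characters.
PATCHED = LEGAL + r"<>\[\]{}"

def is_legal_title(title: str, legal_class: str) -> bool:
    """True if every character of title is in the given character class."""
    return re.fullmatch(f"[{legal_class}]+", title) is not None

for t in ["Main Page", "}", "a]b", ">"]:
    print(f"{t!r}: current={is_legal_title(t, LEGAL)}, "
          f"patched={is_legal_title(t, PATCHED)}")
```

Note that widening the class only changes which titles validate; as the following comments point out, it does not decide where a link ends.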
Comment 6 Brion Vibber 2005-08-22 23:18:48 UTC
If you think about this for half a second you'll see why that doesn't work:
[[This is a]] long page title which is still in the link [[and why not?]]

&, ;, and ! are not dangerous in any way. ; and ! have no special meaning at all, and & 
is merely annoying if output incorrectly (invalid (X)HTML or unexpected character 
entity).
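Brion's counterexample comes down to where a link closes once `]` may appear inside a title. A toy Python sketch with two regular expressions (neither is MediaWiki's actual link pattern) shows that no single closing rule satisfies both existing pages and the expectations listed in comment 5.

```python
import re

# Two hypothetical closing rules for a parser in which ']' is a legal title character.
GREEDY = re.compile(r"\[\[(.+)\]\]")   # link ends at the LAST ']]' it can reach
LAZY = re.compile(r"\[\[(.+?)\]\]")    # link ends at the FIRST ']]' that works

brion = "[[This is a]] long page title which is still in the link [[and why not?]]"
print(GREEDY.findall(brion))  # one enormous link
print(LAZY.findall(brion))    # ['This is a', 'and why not?']

# The lazy rule keeps Brion's example working but contradicts comment 5's
# expectation that [[a]]]] links to a]]; only the greedy rule gives that.
print(GREEDY.findall("[[a]]]]"), LAZY.findall("[[a]]]]"))  # ['a]]'] ['a']
```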
Comment 7 Rowan Collins [IMSoP] 2005-08-22 23:32:54 UTC
(In reply to comment #5)
> following Wiki code works as you'd expect
> 
> [[]]] links to ]
> [[a]b]] links to a]b
> [[a]]]] links to a]]
> [[}}]] links to }}
> {{}}}} includes Template:}}
> [[>]] links to >

But *are* these always the expected behaviours? 
* What about using a template or template parameter to determine what title to
use (e.g. "[[Wikiquote:{{PAGENAME}}|{{PAGENAME}}]]" or "[[{{{1}}}{{{month}}}
1{{{4}}}|1]]" or "{{SeptemberCalendar{{CURRENTYEAR}}}}"; all real examples)?
* And what about images with links in their caption? - e.g.
"[[Image:Foo.jpeg|thumb|this is a [[photo]] of [[foo]]]]"; since your patch also
allows "[" in titles, this syntax is extremely ambiguous.
* Or even just a mix of links and punctuation, like "[See [[foo]]]"

While some of these things appear to still work with your patch, because of the
order things are processed in the existing code, making them *reliably* do so
would be a nightmare.
Comment 8 David Benbennick 2005-08-23 17:03:46 UTC
> But *are* these always the expected behaviours? 
> * What about using a template or template parameter to determine what title to
> use (e.g. "[[Wikiquote:{{PAGENAME}}|{{PAGENAME}}]]" or "[[{{{1}}}{{{month}}}
> 1{{{4}}}|1]]" or "{{SeptemberCalendar{{CURRENTYEAR}}}}"; all real examples)?
> * And what about images with links in their caption? - e.g.
> "[[Image:Foo.jpeg|thumb|this is a [[photo]] of [[foo]]]]"; since your patch also
> allows "[" in titles, this syntax is extremely ambiguous.
> * Or even just a mix of links and punctuation, like "[See [[foo]]]"

Thanks for the constructive comments, Rowan!  That's a good point, "[See
[[foo]]]" no longer works the same way.  (Your other examples do still work.)

Note that MediaWiki is currently a bit inconsistent.  "[See [[foo]]]" displays
as "[See <a>foo</a>]", whereas "[See [[foo|bar]]]" displays as "[See
<a>bar]</a>".  Also, the alt text of "[[Image:Barnstar.png|[[foo|bar]]] hey
[[foo|bar]]]]]" is "bar] hey bar"---the second "[[foo|bar]]]" is interpreted
differently.  

The patch exacerbates this inconsistency.  Perhaps Parser.php should be changed
so that links end at the ''beginning'' of the first string of two or more ],
instead of at the end.  Then "[See [[foo|bar]]]" would display as "[See
<a>bar</a>]".
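The proposed change can be sketched as a toy Python function that renders only the first link and models only the two closing rules named above. It is not how Parser.php works, and the name `render_first_link` is hypothetical.

```python
import re

CLOSING_RUN = re.compile(r"\]{2,}")

def render_first_link(text: str, rule: str) -> str:
    """Render the first [[target|label]] in text as <a>label</a>.

    rule="end":   the link closes at the last two ']' of a run of ']'.
    rule="begin": the link closes at the first two ']' of the run.
    """
    start = text.find("[[")
    if start == -1:
        return text
    run = CLOSING_RUN.search(text, start + 2)
    if run is None:
        return text  # unclosed link: leave the text alone
    if rule == "end":
        inner_end, after = run.end() - 2, run.end()
    else:
        inner_end, after = run.start(), run.start() + 2
    label = text[start + 2:inner_end].split("|")[-1]
    return f"{text[:start]}<a>{label}</a>{text[after:]}"

example = "[See [[foo|bar]]]"
print(render_first_link(example, "end"))    # [See <a>bar]</a>
print(render_first_link(example, "begin"))  # [See <a>bar</a>]
```

Under the "begin" rule, "[See [[foo]]]" also renders as "[See <a>foo</a>]", so both of the examples above would behave the same way.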

I understand now what Brion was referring to above about < and > being unsafe.
Check out:

1. http://en.wikipedia.org/wiki/&lt;  (which doesn't exist, but is moderately
broken)

2. http://en.wikipedia.org/wiki/Special:Movepage/User:Dbenbenn/%26lt%3B  (the
"Move page:" field displays wrong)

Thus, even without < and > in titles, it's important to escape characters correctly.
Comment 9 David Benbennick 2005-08-23 17:06:21 UTC
> 1. http://en.wikipedia.org/wiki/&lt;  (which doesn't exist, but is moderately
> broken)

Oops, the link above wasn't parsed correctly.  Try
http://en.wikipedia.org/wiki/%26lt%3B instead.
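Both broken cases mix up two encodings of the same title. A short Python sketch with the standard library (not MediaWiki's own escaping functions) shows what each context should receive for the literal four-character title "&lt;": percent-encoding in the URL path, and HTML escaping in a form field such as "Move page:".

```python
import html
from urllib.parse import quote, unquote

title = "&lt;"  # the literal four-character title, not the '<' character

url = "http://en.wikipedia.org/wiki/" + quote(title)
print(url)  # http://en.wikipedia.org/wiki/%26lt%3B

# In an HTML form field the title must be HTML-escaped, so the browser
# shows the literal text &lt; instead of decoding it to '<'.
print(html.escape(title))  # &amp;lt;

# Each encoding round-trips back to the original title.
assert unquote(quote(title)) == title
assert html.unescape(html.escape(title)) == title
```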
Comment 10 David Benbennick 2005-08-24 14:18:22 UTC
(In reply to comment #7)
> because of the order things are processed in the existing code,
> making them *reliably* do so would be a nightmare.

Perhaps Rowan is right.  For example, currently "[[a [[test]]" links 
to "test", whereas with the patch it would link to "a [[test".  It's 
somewhat evil to break existing pages.

Fortunately, it isn't necessary!  You can link to "A" with [[a]] or 
[[&#97;]].  Similarly, to link to [ with the current parser syntax, 
you'd expect to use "[[&#91;]]".

That doesn't actually work.  The reason is that the notions 
of "characters that can go within a wiki link" and "characters that 
can be in a page title" are conflated---they're both defined by 
Title.php:legalChars.  If we separate the two concepts, then we can 
allow page titles with [, ], {, }, and |, without having to modify 
the parser at all.  (And once the safety issues are worked out, we 
can allow < and > too.)

By the way, see bug 3243 for a list of at least 14 places where & 
isn't correctly HTML-sanitized.
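A minimal Python sketch of that separation, assuming two hypothetical rule sets: one for what the link syntax accepts literally, and one for what a decoded title may contain. Neither set is taken from Title.php. Character references are decoded with the standard `html.unescape` between the two checks, which is what would let "[[&#91;]]" reach a page titled "[". The function name `resolve_link_target` is illustrative.

```python
import html
import re
from typing import Optional

# Hypothetical: characters that may appear literally between [[ and ]].
# Brackets, braces, pipes and angle brackets stay excluded, so finding
# where a link starts and ends works exactly as before.
LINK_TEXT = re.compile(r"[^\[\]{}|<>]+")

# Hypothetical: characters still forbidden in a stored title after decoding
# (under the proposal, only '<' and '>', pending a safety review).
TITLE_FORBIDDEN = set("<>")

def resolve_link_target(raw: str) -> Optional[str]:
    """Return the title that [[raw]] would point to, or None if invalid.

    Section anchors ('#') are not handled in this sketch.
    """
    if not LINK_TEXT.fullmatch(raw):
        return None  # rejected by the wikitext link syntax
    title = html.unescape(raw)  # '&#91;' -> '[', '&#97;' -> 'a'
    if not title or any(c in TITLE_FORBIDDEN for c in title):
        return None  # rejected as a page title
    # Assumes first-letter capitalization, as on English Wikipedia.
    return title[0].upper() + title[1:]

for raw in ["a", "&#97;", "&#91;", "a]b", "&#60;"]:
    print(repr(raw), "->", resolve_link_target(raw))
```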
Comment 11 Rowan Collins [IMSoP] 2005-08-26 22:12:39 UTC
(In reply to comment #10)
> Fortunately, it isn't necessary!  You can link to "A" with [[a]] or 
> [[&#97;]].  Similarly, to link to [ with the current parser syntax, 
> you'd expect to use "[[&#91;]]".

Well, to link to an article about '[' now, you could type "[[left bracket]]" -
so what would we have gained? OK, the page might look a bit nicer when you get
there (although a heading of just '[' might look weird anyway, so you'd redirect
to something more verbose; in which case, it amounts to being able to type
[[&#91;]] and get redirected to [[left bracket]] anyway!), but this kind of
change is frankly a lot of headaches for a very small improvement in the actual
software.

As for your comments about existing inconsistencies, some of those may well be
considered bugs - the "parser", so called, is widely considered extremely ugly,
and is "designed" (i.e. hacked together) to work mostly as expected, most of the
time. And the fact that such inconsistencies *already* exist should demonstrate
just how much trouble would be unleashed by making it any *more* complicated -
you'd have to be pretty sure the benefits outweighed the risks!
Comment 12 David Benbennick 2005-08-26 22:40:07 UTC
(In reply to comment #11)
> Well, to link to an article about '[' now, you could type "[[left bracket]]" -
> so what would we have gained?

I don't expect one would ever want to link to [.  But it would be a useful 
redirect for the go/search box, for anyone who didn't know it was 
called "bracket".

But that specific page isn't really the issue, anyway.  How about a music 
album that uses [ and ] in the title?  Or a book title with { and }?  Do 
you want to personally guarantee that no one will ever have a legitimate 
use for any of these characters?

> And the fact that such inconsistencies *already* exist should demonstrate
> just how much trouble would be unleashed by making it any *more* complicated -
> you'd have to be pretty sure the benefits outweighed the risks!

That's why it's so lucky that this bug can be fixed (I think) without 
touching the parser at all!
Comment 13 David Benbennick 2005-09-24 01:56:33 UTC
A related issue (which I won't bother listing as a separate bug, since it will
merely be resolved to "wontfix" regardless) involves % in page titles.  For
example, [[%2542]] doesn't produce a link when parsed.  (Presumably it should
link to the page entitled "%42", since MediaWiki strangely supports
[[percent-encoding]] in wiki links.)  Note that the URL

http://en.wikipedia.org/wiki/%2542

---the percent-encoded URL for "%42"---returns [[Bad title]].
Comment 14 Brion Vibber 2005-09-24 01:58:52 UTC
Such titles are forbidden because they can't be round-tripped -- when written in 
wikitext the chars are decoded and the original page becomes inaccessible.
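The round-trip problem can be modeled in a few lines of Python, assuming (hypothetically) that link text is percent-decoded once with `urllib.parse.unquote` and that a decoded title still containing a %XX sequence is rejected, as the "Bad title" result in comment 13 suggests. These are stand-ins for MediaWiki's behaviour, not its code; `link_target` and `round_trips` are illustrative names.

```python
import re
from typing import Optional
from urllib.parse import quote, unquote

# Assumption based on comment 13: a title that still contains a %XX escape
# after decoding is rejected as a bad title.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")

def link_target(inner: str) -> Optional[str]:
    """Hypothetical: the title that [[inner]] points to, or None."""
    title = unquote(inner)  # wiki links accept percent-encoding
    if PERCENT_ESCAPE.search(title):
        return None
    return title

def round_trips(title: str) -> bool:
    """True if writing [[title]] in wikitext leads back to the same page."""
    return link_target(title) == title

print(quote("%42"))          # %2542, the URL path for a page named %42
print(link_target("%42"))    # B, so [[%42]] never reaches a page named %42
print(link_target("%2542"))  # None: decodes to %42, which is rejected
print(round_trips("%42"), round_trips("Tilde"))  # False True
```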
Comment 15 David Benbennick 2005-09-24 02:13:12 UTC
Thanks for the explanation; that kind of makes sense.  Note that [[''foo'']]
can't be round-tripped, either---to get the parser to link to that page, you
have to use something like [[<nowiki>''foo''</nowiki>]].

It seems to me that if people really need a page named %42 (album title,
perhaps?), they'll be willing to learn how to link to it.

The cleanest solution would be if we could turn off percent-encoding in wiki
syntax. (I know, I know, we can't do that because people insist on copying URLs
instead of page titles into wiki text.)  Alternatively, perhaps
[[<nowiki>%2542</nowiki>]] should work like the ''foo'' example above.
