Last modified: 2010-07-04 17:19:36 UTC

Bug 12998 - Weaken DISPLAYTITLE restrictions
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Brion Vibber
Duplicates: 496 13545 14226
Depends on:
Blocks: 13639
Reported: 2008-02-12 17:58 UTC by Kalan
Modified: 2010-07-04 17:19 UTC
CC List: 12 users


Proposed patch v1 (6.55 KB, patch)
2008-10-13 23:36 UTC, rememberthedot
Proposed patch v2 (7.37 KB, patch)
2008-12-05 07:22 UTC, rememberthedot
Proposed patch v3 (7.01 KB, patch)
2008-12-06 05:44 UTC, rememberthedot
Proposed patch v4 (6.76 KB, patch)
2008-12-11 23:10 UTC, rememberthedot
Proposed patch v5 (5.21 KB, patch)
2009-02-27 05:37 UTC, rememberthedot

Description Kalan 2008-02-12 17:58:10 UTC
At the moment, we have a {{DISPLAYTITLE}} magic word that lets us modify the displayed page title, but only when the DISPLAYTITLE'd title normalizes to the actual page title. Sometimes this is not enough: for example, it cannot be used for titles that should contain disallowed characters such as | or #. There may also be a need [1] to display some formatting. Not to mention all the user pages that use ugly hacks to change their title to something other than the dull User:XXX.

The present solution on some wikis is various kinds of JavaScript that, besides being weird and tricky to implement, have compatibility problems with skins. My proposal is to:

1. Leave everything as it is if DISPLAYTITLE argument normalizes to the current page title.
2. Make the behavior the same for DISPLAYTITLE arguments that normalize to the current page title after formatting has been stripped (maybe with the same algorithm that is used for TOCs?).
3. Display a small irremovable subtitle (something like "The internal title of this page is {{PAGENAME}}") if neither of the two conditions above is met.
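
The three steps above can be sketched as follows. This is a minimal Python illustration, not MediaWiki code: `normalize_title` is a hypothetical, heavily simplified stand-in for real title normalization (it only treats underscores as spaces and capitalizes the first letter), and `strip_formatting` and `classify_display_title` are likewise names invented for this example.

```python
import re


def normalize_title(text: str) -> str:
    """Hypothetical, simplified normalization: underscores and runs of
    whitespace become single spaces, and the first letter is capitalized."""
    text = re.sub(r"[ _]+", " ", text).strip()
    return text[:1].upper() + text[1:]


def strip_formatting(text: str) -> str:
    """Remove HTML-like tags and wiki emphasis quotes ('' and ''')."""
    text = re.sub(r"<[^>]*>", "", text)
    return re.sub(r"'{2,}", "", text)


def classify_display_title(requested: str, page_title: str) -> str:
    """Return which of the three proposed cases a DISPLAYTITLE falls into."""
    actual = normalize_title(page_title)
    if normalize_title(requested) == actual:
        return "as-is"                # step 1: behave as today
    if normalize_title(strip_formatting(requested)) == actual:
        return "formatted"            # step 2: same title, formatting only
    return "show-internal-title"      # step 3: add the irremovable subtitle
```

For instance, `classify_display_title("''Venator''-class Star Destroyer", "Venator-class Star Destroyer")` falls into the second case under these assumptions.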

Comment 1 Daniel Friesen 2008-02-14 04:43:53 UTC
I'd agree... There is also another case to consider.

I've seen a few wikis that use the title hack because they need part of the title italicized, for example one in which "Venator" is italicized in the title.

Of course, if someone goes and uses {{DISPLAYTITLE:''Venator''-class Star Destroyer}} inside of the page, it is just ignored.

However, after minimal parsing (only lightweight things like emphasis, perhaps with CSS classes and style tags as a later addition) the resulting title is actually still valid: while in markup you see [[''Venator''-class Star Destroyer]], after parsing the text is actually [[Venator-class Star Destroyer]], which is still a valid link to the same article.
(Oh wait, that's your point 2.)

Though I don't know about irremovable. Stuck there firmly by default, yes... However, there are some private wikis that don't feel a need for that and can deal with it themselves.

There's actually another case to think about: redirects from other text forms that would actually be more correct, but have URL issues.

This is not the case on the English Wikipedia (though it might be for other things; I couldn't find info on it), where titles like "MÄR" are used as the actual title. However, the URL rendering of this is "M%C3%84R". Other wikis may well enforce the title "MAR" instead to avoid that issue. While such a wiki prefers "MAR" in the URL, it still wants "MÄR" in the title. That is not something it can do, but it can at least redirect the "MÄR" article to "MAR".
One addition might be to allow the titles of pages that redirect to the page to be used as display titles. So one could place a redirect to "MAR" at "MÄR", and then use {{DISPLAYTITLE:MÄR}} on the "MAR" article. As a result, the URL will show "MAR" and be readable by the user, while the article's title displays as "MÄR".
Comment 2 Brion Vibber 2008-03-28 20:02:04 UTC
*** Bug 13545 has been marked as a duplicate of this bug. ***
Comment 3 Daniel Friesen 2008-03-29 21:30:57 UTC
I thought I had already noted this, but in the title rewrite I am working on (the one using the real title field to store the non-normalized title alongside the normalized one, so we can save [[_main_Page]] and have it display that way even though the title is the same as [[Main Page]]) I also intend to do some work on DISPLAYTITLE.

One of the two primary focuses of the title rewrite is improving the extensibility of the title system. The extent to which titles themselves are going to be extended will actually remove the need to use DISPLAYTITLE for the reasons it currently exists. However, because it's currently in use, I didn't want to break compatibility by removing it. So I intend to change it.

I am going to change DISPLAYTITLE from a big parser hack into a separate system for manipulating the title displayed in the page. (Do note, though, that changing that title will no longer change the title in the address bar; the title in the address bar will be the real, non-normalized title.)

This will have a few effects.
Extensions will now be able to modify the display title. So you could create an extension that changes Food/Fruit/Apple/Spartan into Food > Fruit > Apple > Spartan, where everything except Spartan is a link to the corresponding page, which could turn a subpage structure into a directory-like structure. Or an extension could make Template:Foo show as {{Foo}} in the title bar instead.
Additionally, while the DISPLAYTITLE magic word will by default work almost exactly as it does now (except that it will no longer affect the browser title), what it does will become extensible. So if you decide you want some markup to become valid in the title, an extension to the display title system can modify DISPLAYTITLE's behavior to make that markup valid and display it inside the title header.
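
As a rough illustration of the subpage-to-breadcrumb idea, here is a small Python sketch. It is not an actual MediaWiki extension; the function name `breadcrumb_title` and the `/wiki/` URL prefix are assumptions made for the example.

```python
from html import escape
from urllib.parse import quote


def breadcrumb_title(page_name: str, base_url: str = "/wiki/") -> str:
    """Turn 'A/B/C' into 'A > B > C' HTML, linking every part but the last.

    base_url is a hypothetical article path prefix.
    """
    parts = page_name.split("/")
    crumbs = []
    for i, part in enumerate(parts):
        label = escape(part)
        if i < len(parts) - 1:
            # Each ancestor links to the subpage path up to and including it.
            target = "/".join(parts[: i + 1]).replace(" ", "_")
            crumbs.append(f'<a href="{base_url}{quote(target)}">{label}</a>')
        else:
            crumbs.append(label)  # the current page is shown as plain text
    return " &gt; ".join(crumbs)
```

With these assumptions, `breadcrumb_title("Food/Fruit/Apple/Spartan")` links Food, Fruit and Apple to `/wiki/Food`, `/wiki/Food/Fruit` and `/wiki/Food/Fruit/Apple`, and leaves Spartan unlinked.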
Comment 4 Daniel Kinzler 2008-08-17 21:15:26 UTC
In r39552 I have added the option $wgRestrictDisplayTitle, which lets you disable all restrictions on the displaytitle. This is not primarily intended for use on mediawiki sites, but caters to popular demand on small private wikis. It just seemed silly to insist on the hard-coded restrictions.

So, let's see if it gets reverted or stays :)
Comment 5 rememberthedot 2008-10-13 23:36:34 UTC
Created attachment 5431
Proposed patch v1

Here is a preliminary patch that should help resolve the problem. It uses
Sanitizer::removeHTMLtags, so it allows tags allowed in wikitext (like <sup>
and <sub>) but not tags not allowed in wikitext (like <script>). This is very
similar to what the English Wikipedia's JavaScript implementation already does
(see [[MediaWiki:Common.js]]). I tested this patch on all skins and it appears
to work OK.

Unlike the previous patch, this patch differentiates between the HTML title
(what will go into <h1>) and the plain text title (what will go into <title>).
This avoids problems with tags finding their way into <title> when <title> is
not supposed to have any tags inside of it.

One of the limitations of this patch is that it doesn't process templates. It'd
be nice if we could say {{DISPLAYTITLE:{{Unicode|unusual characters}}}},
including a template designed to improve browser compatibility with unusual
characters. But this is a minor concern since I believe all the compatibility
templates like this can be expressed as <span class="Unicode"> instead.

And of course, if nobody finds any major bugs with this patch, we could just
implement it for now and worry about tweaking the code to be more permissive
Comment 6 rememberthedot 2008-10-13 23:36:56 UTC
*** Bug 14226 has been marked as a duplicate of this bug. ***
Comment 7 Brion Vibber 2008-10-27 18:33:41 UTC
I'm a little worried about this:

+		$titleText = trim(DOMDocument::loadXML('<title>' . $titleHTML . '</title>')->textContent);

a) how does it perform and

b) will it trigger a fatal error if there's an imbalanced tag that the sanitizer misses?

Further, is it really necessary? How does it compare to Sanitizer::stripAllTags() ?

I would also recommend adding some kind of test suite containing examples of titles that should and shouldn't make it through the checks, and what the result is.
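
To make concern (b) concrete, here is a Python analogue comparing a strict XML parse with plain tag stripping. It does not reproduce how PHP's DOMDocument or Sanitizer::stripAllTags() actually behave; `text_via_xml` and `text_via_strip` are hypothetical helpers, and the point is only that a strict parser rejects imbalanced markup while a stripping approach tolerates it.

```python
import re
import xml.etree.ElementTree as ET


def text_via_xml(title_html: str) -> str:
    """Extract text with a strict XML parser.

    Raises ET.ParseError if the markup is not well-formed.
    """
    root = ET.fromstring("<title>" + title_html + "</title>")
    return "".join(root.itertext()).strip()


def text_via_strip(title_html: str) -> str:
    """Extract text by deleting anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", title_html).strip()


# A tiny table of inputs of the kind a test suite could start from.
CASES = ["<i>Main Page</i>", "<i>Main Page", "Main <sup>Page</sup>"]

for case in CASES:
    try:
        via_xml = text_via_xml(case)
    except ET.ParseError:
        via_xml = None  # strict parsing failed on imbalanced markup
    print(repr(case), "->", repr(via_xml), repr(text_via_strip(case)))
```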
Comment 8 Daniel Friesen 2008-10-27 19:48:50 UTC
Don't we already have a sanitizer function for removing dom tags?
Comment 9 rememberthedot 2008-12-05 07:22:31 UTC
Created attachment 5562
Proposed patch v2

Revised patch to use Sanitizer::stripAllTags, which works great! Now, my goal in coding this patch was to completely eliminate the need for JavaScript hacks to set the values of the <title> and <h1> elements. The <h1> element _must_ be copy-pasteable (in other words, when you copy the <h1> text you should be able to make a link to the article just by pasting what you have). However, we permit non-normalizing titles in the <title> element because the user can't easily select the contents of <title> to copy it.

We have some pretty weird titles on the English Wikipedia. It wouldn't be unreasonable to have an article about something that had both a special character and a superscript, say "Abc#d<sup>e</sup>f". In order to be as accurate as possible, the DISPLAYTITLE magic word needs to be able to put "Abc#def" in <title> and "Abcd<sup>e</sup>f" in <h1>. So, I've updated the DISPLAYTITLE word to take two parameters. The syntax is now {{DISPLAYTITLE:requestedDisplayTitleH1|requestedDisplayTitleTitle}}. If requestedDisplayTitleTitle is not specified then a stripped version of requestedDisplayTitleH1 is used in <title> instead.
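
The fallback rule for the second parameter can be sketched like this. It is a Python illustration of the behavior described in this comment only, not the patch itself; `split_display_title` is a hypothetical name, and the regular expression is a crude stand-in for tag stripping.

```python
import re


def split_display_title(h1_html, title_text=None):
    """Return (h1, title) for a two-parameter DISPLAYTITLE.

    If no explicit <title> text is given, fall back to the <h1> value
    with its tags removed.
    """
    if title_text is None:
        title_text = re.sub(r"<[^>]*>", "", h1_html)
    return h1_html, title_text
```

So `split_display_title("Abcd<sup>e</sup>f", "Abc#def")` keeps both values, while `split_display_title("Abcd<sup>e</sup>f")` derives "Abcdef" for the `<title>`.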

I ran into problems trying to make an automated test suite for this; however, I did test it. Here are the major tests that I did manually on Main Page:

{{DISPLAYTITLE:<span style="text-decoration:underline">Main Page</span>}}
<title>Main Page - {{SITENAME}}</title>
<h1><span style="text-decoration:underline">Main Page</span> - {{SITENAME}}</h1>

{{DISPLAYTITLE:<i>Main Page}}
<title>Main Page - {{SITENAME}}</title>
<h1><i>Main Page</i> - {{SITENAME}}</h1>

<title>Main#Page - {{SITENAME}}</title>
<h1>Main Page - {{SITENAME}}</h1>

{{DISPLAYTITLE:<script>Main Page</script>}}
<title>&lt;script&gt;Main Page&lt;/script&gt; - {{SITENAME}}</title>
<h1>Main Page - {{SITENAME}}</h1>

<title>#Main_Page - {{SITENAME}}</title>

<title>#&lt;script&gt;Main_Page&lt;/script&gt; - {{SITENAME}}</title>

So, what do you think? What else needs to be done before this can go into MediaWiki?
Comment 10 Aaron Schulz 2008-12-06 01:23:53 UTC
OK, some quick notes:

Renaming 'setDisplayTitle' breaks any extensions that may use it. I'd keep it as it was.

Not sure I like the naming of '$requestedDisplayTitleH1'. Why does it have 'request'?

'MediaWiki escapes this automatically before it is seved out' should be *served
Comment 11 rememberthedot 2008-12-06 05:44:54 UTC
Created attachment 5564
Proposed patch v3

I've made the requested changes, what do you think?
Comment 12 Aaron Schulz 2008-12-06 18:00:29 UTC
Done in r44271
Comment 13 rememberthedot 2008-12-06 20:06:50 UTC
Wow, that was fast, thank you!
Comment 14 Brion Vibber 2008-12-10 23:21:41 UTC
Behavior seems a bit hard to predict, as far as what's going to go in the header and what in the browser window etc. Pulling it back for further testing and discussion.

Reverted in r44432
Comment 15 rememberthedot 2008-12-11 01:17:01 UTC
Could you be any more specific about what's wrong? I can't fix it if I don't know what the problem is.
Comment 16 Brion Vibber 2008-12-11 19:22:05 UTC
Basic problem on two minutes of testing was that it seemed difficult to tell what was going to happen. Sometimes I'd see some pretty formatting in the <h1>, but I'd see a bunch of ugly markup in the <title>, or vice-versa.

This seems consistent with the test cases listed in comment #9, but I'd have to say those are pretty undesirable... If I see "Main Page" in the <h1> I should see "Main Page" in the <title> too.
Comment 17 rememberthedot 2008-12-11 19:53:34 UTC
I think I see what you're saying. Here are some actual article titles that we have to accommodate and what the <title> and <h1> ought to read for each:



Dweeb (band)
<title>[dweeb] (band)</title>
<h1>dweeb (band)</h1>

E=MC2 (song)
<title>E=MC² (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>

Signaling System 7
<title>Signaling System #7</title>
<h1>Signaling System 7</h1>

So you see, we cannot guarantee a 1:1 relationship between the <h1> and the <title>. Nevertheless, I still don't know exactly what problems you ran into - could you please post the specific calls to DISPLAYTITLE that produced undesirable behavior for you?
Comment 18 Brion Vibber 2008-12-11 21:03:38 UTC
All of the above examples are examples of problems. There's no clear reason for them to be different in the described ways. Certainly it makes no sense for the <title> portion, which must be plaintext, to have additional markup or characters that are not in the <h1>, which is much less restrictive by being HTML.
Comment 19 rememberthedot 2008-12-11 22:25:55 UTC
Thanks for your reply. <h1> is copy-pasteable, <title> is not. The user expects to be able to copy the contents of <h1> and paste them to make a link to the article, whereas the user cannot select the contents of the title bar to copy it. This is why the <h1> is more restrictive than the <title>. Does that at least make sense?
Comment 20 Brion Vibber 2008-12-11 22:30:26 UTC
IMHO if it's not in the <h1> it shouldn't be in the <title>.

The mixed approach has some problems; since the <title> contents are less obviously visible, most of the time nobody will notice the "fancier" characters in the title bar. At a minimum, I think people will be much more interested in making the <h1> look nice, and don't have as much interest in the <title>.

At worst it means that a bogus attempt to use DISPLAYTITLE will not have any effect on the very visible <h1> while dumping broken ugly markup into the <title>, which may be forgotten and left around.

I'd prefer to avoid "multiplying elements" per Occam's razor... :)
Comment 21 rememberthedot 2008-12-11 23:10:59 UTC
Created attachment 5573
Proposed patch v4

Dropped support for freeform <title>s. I see your point, the inconsistency could very well confuse the end user. A few test cases for the new patch:

Dweeb (band)
{{DISPLAYTITLE:[dweeb] (band)}}
<title>Dweeb (band)</title>
<h1>Dweeb (band)</h1>

Dweeb (band)
{{DISPLAYTITLE:dweeb (band)}}
<title>dweeb (band)</title>
<h1>dweeb (band)</h1>

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)|E=MC² (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC2 (song)</h1>

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>

Signaling System 7
{{DISPLAYTITLE:Signaling System #7}}
<title>Signaling System 7</title>
<h1>Signaling System 7</h1>
Comment 22 rememberthedot 2008-12-17 19:51:25 UTC
Ah sorry, the third test case should have been:

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)|E=MC² (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>
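
The behavior shown in these v4 test cases can be approximated by the Python sketch below. It is an assumption-laden illustration rather than the patch: `normalize_title` is a hypothetical simplification (underscores as spaces, first letter capitalized), tags are stripped with a crude regular expression, and the second DISPLAYTITLE parameter is simply not modeled because v4 ignores it.

```python
import re


def normalize_title(text):
    """Hypothetical, simplified title normalization."""
    text = re.sub(r"[ _]+", " ", text).strip()
    return text[:1].upper() + text[1:]


def resolve_display_title(requested_h1, page_title):
    """Return (h1, title) for a requested display title.

    The request is honored only if its tag-stripped text normalizes to
    the real page title; otherwise both fall back to the page title.
    """
    plain = re.sub(r"<[^>]*>", "", requested_h1)
    if normalize_title(plain) == normalize_title(page_title):
        return requested_h1, plain
    return page_title, page_title
```

Under these assumptions, "[dweeb] (band)" and "Signaling System #7" are rejected, while "dweeb (band)" and "E=MC<sup>2</sup> (song)" are accepted, with the stripped text going into `<title>`.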
Comment 23 Brion Vibber 2008-12-30 02:27:06 UTC
*** Bug 496 has been marked as a duplicate of this bug. ***
Comment 24 Aaron Schulz 2008-12-30 12:23:49 UTC
Done in r45181, assuming no other conceptual issues. Looks fine.
Comment 25 rememberthedot 2008-12-31 05:11:44 UTC
Thank you! Just reply back here if there are any more problems.
Comment 26 Aaron Schulz 2008-12-31 08:37:34 UTC
(In reply to comment #25)
> Thank you! Just reply back here if there are any more problems.

Comment 27 rememberthedot 2009-02-27 05:37:49 UTC
Created attachment 5867
Proposed patch v5

Much cleaner patch that makes OutputPage::setPageTitle escape bad tags like <script> but leave good ones like <i> in <h1>. setPageTitle then strips out all remaining tags and places the result into <title>. All CoreParserFunctions::displaytitle has to do is make sure that what will wind up in <title> is consistent with the actual page title.

I removed the highly unsafe $wgRestrictDisplayTitle "feature", since with loosened displaytitle restrictions this is unnecessary. As an added bonus, I also did some stylistic cleanup in GlobalFunctions.php.
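
The division of labor described for setPageTitle can be sketched in Python as follows. This is not the patch's code: the allowlist in `ALLOWED_TAGS` is illustrative only, the regular expressions are a crude substitute for a real sanitizer, and `h1_html` and `title_text` are hypothetical names.

```python
import re
from html import escape

# Illustrative allowlist; the real set of permitted tags is not given here.
ALLOWED_TAGS = {"i", "b", "sup", "sub", "span"}

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")


def h1_html(display_title):
    """Keep allowed tags as markup and escape every other tag."""
    def repl(match):
        if match.group(2).lower() in ALLOWED_TAGS:
            return match.group(0)
        return escape(match.group(0))
    return TAG_RE.sub(repl, display_title)


def title_text(display_title):
    """Strip the tags that survived escaping to get the <title> text."""
    return re.sub(r"<[^>]*>", "", h1_html(display_title))
```

A caller in the role of the displaytitle parser function would then only need to compare `title_text(...)` against the actual page title.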
Comment 28 Brion Vibber 2009-04-08 19:07:12 UTC
Assigning to myself for review.
Comment 29 rememberthedot 2009-04-09 05:17:35 UTC
Patch committed in r49330.
Comment 30 Melancholie 2009-06-01 11:50:32 UTC
Just a minor note:
It is possible to vandalize a title, using <span style="display: none;">...</span>
Test case (if page is "Shipment"):
{{DISPLAYTITLE:shi<span style="display:none">pmen</span>t}} makes *shit* out of "shipment" e.g. ;-)

Manipulation of the title's main text characters should either be forbidden entirely or allowed completely.
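
The gap between the tag-stripped text and what a reader actually sees can be demonstrated with a short Python sketch. It uses the standard library's `html.parser`; the class `VisibleText` and the function `all_and_visible_text` are hypothetical, only inline `display:none` styles are detected, and balanced tags are assumed.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect all text, and separately the text outside display:none."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one bool per open tag: does it hide content?
        self.hidden_depth = 0
        self.all_text = []
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        hidden = re.search(r"display\s*:\s*none", style, re.I) is not None
        self.stack.append(hidden)
        if hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        # Assumes balanced tags; a real check would need to be stricter.
        if self.stack and self.stack.pop():
            self.hidden_depth -= 1

    def handle_data(self, data):
        self.all_text.append(data)
        if self.hidden_depth == 0:
            self.visible_text.append(data)


def all_and_visible_text(display_title):
    """Return (tag-stripped text, text a reader would actually see)."""
    parser = VisibleText()
    parser.feed(display_title)
    parser.close()
    return "".join(parser.all_text), "".join(parser.visible_text)
```

For the test case above, the first value is "shipment", which passes a check against the page title, while the second value is the shortened visible text; a stricter validator could reject any display title where the two differ.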
