Last modified: 2014-10-17 21:53:55 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T18474, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 16474 - {{FILEPATH:{{PAGENAME}} }} doesn't work for filenames containing characters that get escaped to HTML entities
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Templates
Version: unspecified
Hardware: All All
Importance: Low major with 7 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://commons.wikimedia.org/wiki/Ima...
Duplicates: 23253 24938
Depends on: 22880 14779 35746
Blocks:
Reported: 2008-11-27 19:27 UTC by Skyluke
Modified: 2014-10-17 21:53 UTC
CC List: 12 users

Description Skyluke 2008-11-27 19:27:39 UTC
The command {{FILEPATH:{{PAGENAME}} }} doesn't work on pages with non-ASCII characters in their names.
Comment 1 Splarka 2008-11-28 05:48:42 UTC
It isn't Unicode, which works fine. It happens with the following characters (which are allowed in page titles): & " ' which, in {{PAGENAME}} and related magic words, get escaped to their HTML entities before parser function expansion: &amp;amp; &amp;quot; &amp;#39; (that is, the literal strings "&amp;", "&quot;" and "&#39;") ... renaming bug to reflect this.

A workaround is to use [[Special:Filepath/{{PAGENAME}}]].
 http://en.wikipedia.org/wiki/Special:ExpandTemplates?contexttitle=Image%3AAci_Sant%27Antonio.svg&input=%23-+%7B%7BPAGENAME%7D%7D%0D%0A%23-+%7B%7Bfilepath%3A%7B%7BPAGENAMEE%7D%7D%7D%7D%0D%0A%23-+%5B%5BSpecial%3AFilepath%2F%7B%7BPAGENAME%7D%7D%5D%5D%0D%0A&removecomments=1&generate_xml=1

This might be a dupe of an existing magic word/entity bug?
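
A toy model of the failure described in this comment may help. This is a Python sketch, not MediaWiki code: `escape_like_pagename`, `FILES`, and `filepath` are hypothetical names, and the stored value is a placeholder rather than a real upload path. It only assumes what the comment states, namely that & " ' are turned into entities before the parser function sees the name.

```python
# Toy model only; none of these names exist in MediaWiki.
ENTITY_FOR = {"&": "&amp;", '"': "&quot;", "'": "&#39;"}

def escape_like_pagename(title: str) -> str:
    """Escape & " ' to entities, as the comment says {{PAGENAME}} does."""
    return "".join(ENTITY_FOR.get(ch, ch) for ch in title)

# Hypothetical file store keyed by the raw (unescaped) title.
FILES = {"Aci Sant'Antonio.svg": "placeholder-path-for-the-file"}

def filepath(name: str) -> str:
    """Stand-in for {{FILEPATH:...}}: exact-match lookup, empty on a miss."""
    return FILES.get(name, "")

raw = "Aci Sant'Antonio.svg"
escaped = escape_like_pagename(raw)
print(escaped)                 # the name as the parser function receives it
print(repr(filepath(escaped))) # lookup with the escaped name misses
print(repr(filepath(raw)))     # lookup with the raw name succeeds
```

The lookup misses only because the key contains the entity text instead of the apostrophe, which matches the symptom reported here.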
Comment 2 Skyluke 2008-11-28 09:41:04 UTC
The workaround can't be used in a template like http://commons.wikimedia.org/wiki/Template:ValidSVG, because it doesn't work well inside a "[" "]" URL link.
Comment 3 Splarka 2008-11-28 18:41:52 UTC
That workaround can still be used, you just have to use urlencode and fullurl (admittedly uglier URLs though).
 http://commons.wikimedia.org/w/index.php?title=Template:ValidSVG&diff=16376628&oldid=15911588

That seems to work, at least temporarily until {{PAGENAME}} gets fixed?
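
As a rough illustration of why percent-encoding helps inside a bracketed external link, here is a Python sketch. It does not reproduce the wikitext in the linked diff; `filepath_url` and `WIKI_BASE` are hypothetical, and the exact encoding MediaWiki's urlencode and fullurl produce may differ from `urllib.parse.quote`.

```python
from urllib.parse import quote

# Assumed base URL, used here only for illustration.
WIKI_BASE = "https://commons.wikimedia.org/wiki/"

def filepath_url(title: str) -> str:
    """Build a Special:Filepath URL with the title percent-encoded."""
    db_key = title.replace(" ", "_")  # spaces become underscores in titles
    return WIKI_BASE + "Special:Filepath/" + quote(db_key)

print(filepath_url("Aci Sant'Antonio.svg"))
```

The encoded URL contains no spaces or quote characters, so nothing in it can end the link target early.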
Comment 4 Brion Vibber 2008-12-18 19:51:54 UTC
Looks like the same base problem as bug 14779...
Comment 5 Rich Farmbrough 2009-10-02 18:28:07 UTC
Also seems to apply to the output of other magic words, even simple ones like LC. We risk getting into the situation where people are relying on the broken behaviour, and fixing it will then break stuff.
Comment 6 User:Docu 2010-03-11 07:19:13 UTC
Related issue:

{{PAGESINCATEGORY:{{PAGENAME}}}} doesn't work on categories with a '


Sample:

http://commons.wikimedia.org/w/index.php?title=Category:Kao_Ch%27i-p%27ei&oldid=36348147


Discussion on Commons Village Pump:

http://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&oldid=36347737#Unexplainable_behavior_of_.7B.7B.23ifeq:_.7D.7D
Comment 7 Umherirrender 2010-04-21 18:23:10 UTC
*** Bug 23253 has been marked as a duplicate of this bug. ***
Comment 8 db [inactive,noenotif] 2010-08-30 19:46:49 UTC
*** Bug 24938 has been marked as a duplicate of this bug. ***
Comment 9 Bergi 2010-10-15 16:23:48 UTC
I'm not sure whether it was already posted, but 
{{NAMESPACE:xyz}} and
{{PAGENAME:xyz}}
are also not working with "<", ">" or their html-escaped equivalents &lt; and &gt;. The quote characters (",') are OK in here.
I think it belongs to this bug, please fix me.
Comment 10 Umherirrender 2010-10-26 19:21:15 UTC
(In reply to comment #9)
> I'm not sure whether it was already posted, but 
> {{NAMESPACE:xyz}} and
> {{PAGENAME:xyz}}
> are also not working with "<", ">" or their html-escaped equivalents &lt;
> and &gt;. The quote characters (",') are OK in here.
> I think it belongs to this bug, please fix me.

"<" and ">" are not allowed inside titles, see [[mw:Manual:$wgLegalTitleChars]], so they cannot work with NAMESPACE or PAGENAME.
Comment 11 Umherirrender 2010-10-26 19:35:05 UTC
(In reply to comment #4)
> Looks like the same base problem as bug 14779...

No, using "$title = Title::newFromText( $name );" and passing the $title to wfFindFile works (with or without the "File:" prefix). But that does not work with a URL-encoded title; that is bug 14779.

wfFindFile uses Title::makeTitleSafe, which uses Title::makeTitle for the mDbkeyform. Title::newFromText uses Sanitizer::decodeCharReferencesAndNormalize, which normalizes the &quot; &amp; &#39;.

It looks like newFromText should be used for all titles coming from wikitext, and makeTitleSafe should be used for form-given titles. So this parser function uses the wrong method.
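
The difference between the two code paths can be modelled in a few lines. This is a Python sketch under the assumptions stated in this comment only: `new_from_text` stands in for Title::newFromText (decodes character references first) and `make_title_safe` stands in for Title::makeTitleSafe (no decoding). Both functions are hypothetical simplifications and ignore everything else the real methods do.

```python
import html

def new_from_text(text: str) -> str:
    """Model of the newFromText path: decode entities, then form the key."""
    return html.unescape(text).replace(" ", "_")

def make_title_safe(text: str) -> str:
    """Model of the makeTitleSafe path as described: no entity decoding."""
    return text.replace(" ", "_")

name = "Kao Ch&#39;i-p&#39;ei"   # what the parser function receives
print(new_from_text(name))       # key with real apostrophes
print(make_title_safe(name))     # key still containing the entity text
```

Only the first key can match a stored title, which is why the choice of constructor matters for input that comes from wikitext.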
Comment 12 Umherirrender 2010-10-26 19:37:44 UTC
(In reply to comment #6)
> Related issue:
> {{PAGESINCATEGORY:{{PAGENAME}}}} doesn't work on categories with a '

PAGESINCATEGORY uses Category::newFromName, which uses Title::makeTitleSafe -> see comment 11
Comment 13 Umherirrender 2010-10-27 13:20:08 UTC
(In reply to comment #11)
> ... ([wfFindFile works] with or without "File:"-prefix). ...

reported as bug 25670
Comment 14 Mark A. Hershberger 2012-04-04 15:09:27 UTC
See also bug 35628
Comment 15 Philippe Verdy 2014-02-15 08:35:43 UTC
Workaround:

* filter the given file name / page name / category name, containing HTML
  entities as returned by various parser functions
      (like lc:, uc:, #if:, #switch:...),
  through #titleparts to convert these HTML entities back to plain characters

* this returned value can be passed to parser functions that do not like
  these HTML entities:
      PAGESINCATEGORY, FILEPATH, #ifexist...


The HTML entities we need to handle are notably those characters:

  ' " &

which are valid in page names (the < and > characters are not valid in
pagenames, they will remain encoded after calling #titleparts).
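
The effect described above can be sketched as a selective decode. This Python snippet is only a model of the outcome this comment attributes to #titleparts, not of how #titleparts is implemented; `decode_title_entities` is a hypothetical helper.

```python
import re

# Only the three entities for characters that are valid in page names.
DECODE = {"&#39;": "'", "&quot;": '"', "&amp;": "&"}
_PATTERN = re.compile("|".join(re.escape(entity) for entity in DECODE))

def decode_title_entities(text: str) -> str:
    """Decode &#39; &quot; &amp; in a single pass; leave &lt; and &gt; alone."""
    return _PATTERN.sub(lambda match: DECODE[match.group(0)], text)

print(decode_title_entities("Kao Ch&#39;i-p&#39;ei"))
print(decode_title_entities("Rock &amp; Roll &lt;b&gt;"))
```

A single pass is used so that text produced by decoding is never decoded a second time.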


For details about the various encodings used in page names, see

  [[mw:Manual:PAGENAMEE encoding]]

which details how characters may get encoded.
This covers the full ASCII set, and the first printable non-ASCII characters (tested with UTF-8 assumed for their plain-text encoding).
This also covers some other contextual changes that may occur for some characters which are not encoded except in leading positions, where they may be changed or dropped, as well as those few characters that get transformed within specific subsequences anywhere in the string (such as the slash and periods).

But I agree that functions like PAGESINCATEGORY, FILEPATH... should properly decode these HTML entities (and notably the 3 characters above; the most frequent one encountered being the ASCII apostrophe-quote).
Comment 16 Gerrit Notification Bot 2014-07-11 23:48:21 UTC
Change 145724 had a related patch set uploaded by Brian Wolff:
Have Title::makeTitleSafe decode html entities.

https://gerrit.wikimedia.org/r/145724
