Last modified: 2014-05-13 09:31:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T64909, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 62909 - Loosen GWToolset's file name restrictions (parentheses, apostrophes, ampersands, etc.)
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: GWToolset
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: easy
Duplicates: 64843
Depends on:
Blocks: hackathon2014
Reported: 2014-03-21 02:17 UTC by Bawolff (Brian Wolff)
Modified: 2014-05-13 09:31 UTC (History)


Description Bawolff (Brian Wolff) 2014-03-21 02:17:08 UTC
GWToolset is uploading a lot of files with names like https://commons.wikimedia.org/w/index.php?title=File:The_King_of_Hungary_holding_council_in_his_tent_on_the_battlefield_-_Froissart--39-s_Chronicles_-Volume_IV-_part_2-_-1470-1475--_f.84_-_BL_Harley_MS_4380.jpg&redirect=no . The proper name is [[commons:File:The King of Hungary holding council in his tent on the battlefield - Froissart's Chronicles (Volume IV, part 2) (1470-1475), f.84 - BL Harley MS 4380.jpg]] (and the file has since been renamed to that).

Notice how characters like ), (, ' and & are being stripped and replaced with '-'. This is wrong; those characters are perfectly valid in a title.

Even worse, characters like the apostrophe (') are being converted to their HTML entity ("&#39;"), after which &, # and ; are replaced with dashes, resulting in "--39-". This is wrong: HTML entities in titles should be converted to the characters they represent, and those characters should then be handled as appropriate (as is done for normal titles).
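
For illustration only, here is a rough Python sketch of how the mangled name could arise. It is not GWToolset's actual code (which is PHP); the function name and the order of operations are assumptions, and the character list is the one quoted from the glamtools email in comment 2.

```python
# Characters reported in the glamtools email (quoted in comment 2) as being replaced.
BLACKLIST = set("#<>[]|{}:¬`!\"£$^&*()+=~?,;'@")


def buggy_strip(title: str) -> str:
    """Hypothetical reconstruction of the faulty behaviour, not the real implementation."""
    # Assumed step 1: the apostrophe is turned into its numeric HTML entity.
    encoded = title.replace("'", "&#39;")
    # Assumed step 2: every blacklisted character, including the &, # and ;
    # of the entity itself, is replaced with a dash.
    return "".join("-" if ch in BLACKLIST else ch for ch in encoded)


original = "Froissart's Chronicles (Volume IV, part 2) (1470-1475), f.84"
# Spaces become underscores in the URL form of a title.
print(buggy_strip(original).replace(" ", "_"))
```

Under these assumptions the output matches the corresponding fragment of the uploaded file name above, including the "--39-" sequence.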


See: https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2014/03#Renaming_multiple_files.3F
Comment 2 Bawolff (Brian Wolff) 2014-03-21 05:40:00 UTC
(In reply to MZMcBride from comment #1)
> Related:
> http://lists.wikimedia.org/pipermail/glamtools/2014-March/000035.html

And from the email:
>'#','<','>','[',']','|','{','}',':','¬','`','!','"','£','$','^','&','*','(',')','+','=','~','?',',',';',"'",'@'


Many of these characters are very common in file names (apostrophes, parentheses) and are absolutely allowed, both socially and technically.

I think that GWToolset should simply follow $wgIllegalFileChars and whatever Title::secureAndSplit blocks (to be specific, blacklist only '#','<','>','[',']','|','{','}', and ':'). If there really is a need to blacklist additional characters for social reasons (I'm not convinced there is), then the blacklist should be configurable on-wiki as a MediaWiki: namespace message, since social conventions change over time.
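
To make the difference concrete, a small Python sketch comparing the two lists follows. Both sets are copied from this comment; the variable names are made up for the example, and this is not how MediaWiki itself stores these rules.

```python
# The list quoted from the email above.
EMAIL_BLACKLIST = set("#<>[]|{}:¬`!\"£$^&*()+=~?,;'@")
# The narrower list suggested in this comment.
PROPOSED_BLACKLIST = set("#<>[]|{}:")

# Characters that would no longer be replaced under the narrower list.
newly_allowed = EMAIL_BLACKLIST - PROPOSED_BLACKLIST
print("".join(sorted(newly_allowed)))
```

The characters printed include the apostrophe, parentheses, ampersand and comma, which are the ones causing most of the unwanted renames.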
Comment 3 Bawolff (Brian Wolff) 2014-03-26 14:01:37 UTC
Sorry, to be more specific (because I got questions): GWToolset should use the built-in function wfStripIllegalFilenameChars instead of trying to re-implement title validation rules in Utils::stripIllegalTitleChars.

This bug is also about HTML entities, so the full process for normalizing the title should be:

1) Run through Sanitizer::decodeCharReferences()
2) Run through wfStripIllegalFilenameChars()
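
The two steps can be sketched in Python as below. This is only a stand-in: html.unescape plays the role of Sanitizer::decodeCharReferences(), and a simple replacement over the nine characters listed in comment 2 approximates wfStripIllegalFilenameChars(). The real MediaWiki functions are PHP and may differ in detail; normalize_title and RESTRICTED are names invented for the example.

```python
import html

# Assumed restricted set, taken from the list in comment 2.
RESTRICTED = set("#<>[]|{}:")


def normalize_title(raw: str) -> str:
    """Decode character references first, then replace restricted characters."""
    # Step 1: turn entities such as &#39; back into the characters they represent.
    decoded = html.unescape(raw)
    # Step 2: replace only the restricted characters with a dash.
    return "".join("-" if ch in RESTRICTED else ch for ch in decoded)


print(normalize_title("Froissart&#39;s Chronicles (Volume IV, part 2)"))
```

The order matters: decoding first means the & , # and ; of an entity never reach the replacement step as separate characters.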
Comment 4 dan 2014-03-26 14:47:22 UTC
* working on a patch
Comment 5 Gerrit Notification Bot 2014-03-26 15:54:10 UTC
Change 121094 had a related patch set uploaded by Dan-nl:
relax wiki title restrictions

https://gerrit.wikimedia.org/r/121094
Comment 6 Gerrit Notification Bot 2014-04-01 23:35:42 UTC
Change 121094 merged by jenkins-bot:
relax wiki title restrictions

https://gerrit.wikimedia.org/r/121094
Comment 7 Bawolff (Brian Wolff) 2014-04-01 23:40:24 UTC
The fix for this issue is scheduled to be deployed on Commons on Tuesday, 8 April 2014.
Comment 8 Gerrit Notification Bot 2014-04-11 14:18:58 UTC
Change 125401 had a related patch set uploaded by Dan-nl:
wfStripIllegalFilenameChars truncates title

https://gerrit.wikimedia.org/r/125401
Comment 9 Gerrit Notification Bot 2014-04-14 19:57:33 UTC
Change 125401 merged by jenkins-bot:
wfStripIllegalFilenameChars truncates title

https://gerrit.wikimedia.org/r/125401
Comment 10 Kelson [Emmanuel Engelhart] 2014-04-17 14:59:06 UTC
Things seem to me to work better now, but transforming auto. titles with illegal characters is IMO not a good approach. The reason is that there is no way to track these files (after transformation). Why not simply checking this just after the XML upload and telling that something is wrong with the titles (and listing them)?
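
As a rough sketch of that alternative, the Python below reports titles containing restricted characters without changing them. The function name, the sample titles and the restricted set (taken from comment 2) are assumptions for illustration; GWToolset's real XML handling is not shown.

```python
# Assumed restricted set, taken from the list in comment 2.
RESTRICTED = set("#<>[]|{}:")


def find_problem_titles(titles):
    """Return (title, offending characters) pairs; no title is modified."""
    report = []
    for title in titles:
        bad = sorted(set(title) & RESTRICTED)
        if bad:
            report.append((title, bad))
    return report


# Hypothetical titles, as they might be read from an uploaded metadata file.
titles = [
    "Froissart's Chronicles (Volume IV, part 2).jpg",
    "Map: Hungary [1470].jpg",
]
for title, bad in find_problem_titles(titles):
    print(f"{title!r} contains restricted characters: {' '.join(bad)}")
```

Listing the offending titles up front would let the uploader correct the source metadata, so the names on the wiki stay traceable to the original records.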
Comment 11 Jean-Fred 2014-05-09 14:01:37 UTC
(In reply to Kelson [Emmanuel Engelhart] from comment #10)
> Things seem to me to work better now, but transforming auto. titles with
> illegal characters is IMO not a good approach. The reason is that there is
> no way to track these files (after transformation). Why not simply checking
> this just after the XML upload and telling that something is wrong with the
> titles (and listing them)?

This is moved to bug 65070. Marking as closed.
Comment 12 dan 2014-05-13 09:31:50 UTC
*** Bug 64843 has been marked as a duplicate of this bug. ***
