Last modified: 2014-09-23 23:33:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T9389, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 7389 - Colon function for encoding url's in iso-8859-1
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch, patch-need-review
Reported: 2006-09-23 08:37 UTC by Bård Dahlmo
Modified: 2014-09-23 23:33 UTC (History)
Attachments
Patch to enable urlencodeiso (2.22 KB, patch)
2006-09-24 22:06 UTC, Bård Dahlmo

Description Bård Dahlmo 2006-09-23 08:37:35 UTC
Some sites require that URLs be encoded in ISO-8859-1. To support generating
valid URLs from templates, a new function is required.
The code for it would look like:

return urlencode( utf8_decode ( $s ));
Comment 1 Bård Dahlmo 2006-09-24 17:33:30 UTC
URLs to the external website ''finn.no'' and others require that parameters be
encoded in ISO-8859-1, not UTF-8.
A link to a map showing the city ''Ålesund'' might be
http://www.finn.no/finn/map?mapTitle=%C5lesund,+%C5LESUND,+By.
Note that the Å is encoded as %C5, not as the %C3%85 that urlencode returns.

A new colon function, for example urlencode_iso8859-1, that encodes the
parameters as ISO-8859-1 would solve this problem.

See also http://no.wikipedia.org/wiki/Wikipedia:Tegnsett
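
The difference between the two encodings can be reproduced outside MediaWiki. The following is only a minimal sketch in Python, not the PHP from the attached patch; the helper names `urlencode_iso` and `urlencode_utf8` are hypothetical and chosen for illustration.

```python
from urllib.parse import quote_plus


def urlencode_iso(s: str) -> str:
    """Percent-encode s from its ISO-8859-1 bytes (hypothetical helper)."""
    # errors="strict" raises UnicodeEncodeError for characters that have no
    # ISO-8859-1 representation instead of silently substituting them.
    return quote_plus(s, encoding="iso-8859-1", errors="strict")


def urlencode_utf8(s: str) -> str:
    """Percent-encode s from its UTF-8 bytes, for comparison."""
    return quote_plus(s, encoding="utf-8")


if __name__ == "__main__":
    print(urlencode_iso("Ålesund"))   # %C5lesund
    print(urlencode_utf8("Ålesund"))  # %C3%85lesund
```

Note that this sketch rejects input that does not fit in ISO-8859-1, whereas PHP's `utf8_decode` replaces such characters with `?`; which behaviour a colon function should have is left open here.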
Comment 2 Bård Dahlmo 2006-09-24 22:06:18 UTC
Created attachment 2408
Patch to enable urlencodeiso
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-09-25 00:51:08 UTC
There's no way MediaWiki can ever account for every oddball URL encoding method
without implementing full-fledged string functions: just the other day I
encountered a site that somehow double-urlencoded its URLs, so there was a "25"
just before every byte's hex representation.  Heck, MediaWiki itself uses an
unusual encoding, converting spaces to _ instead of +.  What about ISO 8859-2
through -15, for instance?
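
To make the point about the number of possible encodings concrete, here is a small Python sketch (not MediaWiki code; `encode_variants` is a hypothetical helper) that percent-encodes the same string under several character sets and also produces the double-encoded form described above.

```python
from urllib.parse import quote


def encode_variants(s, encodings=("iso-8859-2", "iso-8859-15")):
    """Return s percent-encoded under UTF-8 and each listed encoding."""
    utf8 = quote(s, safe="", encoding="utf-8")
    variants = {"utf-8": utf8}
    for enc in encodings:
        # Raises UnicodeEncodeError if s cannot be represented in enc.
        variants[enc] = quote(s, safe="", encoding=enc)
    # Encoding the already-encoded string escapes every "%" as "%25",
    # which puts "25" in front of each byte's hex representation.
    variants["double utf-8"] = quote(utf8, safe="")
    return variants


if __name__ == "__main__":
    for name, value in encode_variants("Š").items():
        print(f"{name}: {value}")
```

Each additional target encoding yields a different result for the same input, which is the scaling concern raised in this comment.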
Comment 4 nsaa 2006-09-25 12:40:04 UTC
ISO 8859-1 "is the basis of two widely-used character maps known as
ISO-8859-1 (note the extra hyphen) and Windows-1252.

In June 2004, the ISO/IEC working group responsible for maintaining eight-bit
coded character sets disbanded and ceased all maintenance of ISO 8859, including
ISO 8859-1, in order to concentrate on the Universal Character Set and Unicode.
In computing applications, encodings that provide full UCS support (such as
UTF-8 and UTF-16) are finding increasing favor over encodings based on ISO
8859-1." (from [[:en:/ISO/IEC_8859-1]]). So isn't ISO-8859-1 much more relevant
than other kinds of encodings? ~~~~
Comment 5 Bård Dahlmo 2006-09-25 19:02:09 UTC
So a patch is rejected if it only solves one problem and leaves other problems
unsolved?
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-09-26 01:59:46 UTC
Nobody has rejected your patch; otherwise this would be marked WONTFIX.  I'm not
even a developer.  I'm just giving my opinion: we probably don't want 352 colon
functions.  That sounds suspiciously like feature creep.  If any developer
disagrees, which some may well, they'll commit the patch.  There's no point
discussing it here further either way.
Comment 7 p858snake 2011-04-30 00:09:43 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 8 Sumana Harihareswara 2011-11-09 03:06:02 UTC
+need-review to signal to developers that this patch needs reviewing
Comment 9 au 2012-06-17 17:44:51 UTC
Hi Bård, thank you for the patch!

As you may already know, MediaWiki is currently revamping its PHP-based parser
into a "Parsoid" prototype component, to support the rich-text Visual Editor
project:

   https://www.mediawiki.org/wiki/Parsoid
   https://www.mediawiki.org/wiki/Visual_editor

Folks interested in enhancing the parser's capabilities are very much welcome
to join the Parsoid project, and contribute patches as Git branches:

   https://www.mediawiki.org/wiki/Git/Tutorial#How_to_submit_a_patch

Compared to .diff attachments in Bugzilla tickets, Git branches are much easier
for us to review, refine and merge features together.

Each change set has a distinct URL generated by the "git review" tool, which
can be referenced in Bugzilla by pasting its gerrit.wikimedia.org URL as a
comment.

If you run into any issues with the patch process, please feel free to ask on
irc.freenode.net #wikimedia-dev and the wikitext-l mailing list. Thank you!
