Last modified: 2011-01-09 13:01:04 UTC

Bug 17961 - add comment to generateSitemap.php saying what is encoded
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: 1.15.x
Hardware: All All
Importance: Normal trivial
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: documentation
Reported: 2009-03-13 02:56 UTC by Dan Jacobson
Modified: 2011-01-09 13:01 UTC (History)

Description Dan Jacobson 2009-03-13 02:56:29 UTC
In generateSitemap.php kindly add a comment explaining what

  $title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" );

is all about.
Say what is encoded in the string. Or what encoding it is.
Comment 1 Ilmari Karonen 2011-01-06 22:45:10 UTC
It seems to be constructing a title consisting of 255 bytes, each of which needs to be URL-encoded: that should be about as long as a URL for a page in the given namespace can get.
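The byte count is easy to check outside MediaWiki. The following is only an illustrative Python sketch, not code from generateSitemap.php; the variable names are made up here, and `urllib.parse.quote` stands in for whatever URL-encoding MediaWiki actually applies to the title.

```python
from urllib.parse import quote

# Same byte sequence as the PHP string literal quoted in the description.
raw = b"\xf0\xa8\xae\x81" * 63 + b"\xe5\x96\x83"

# 63 repeats of a 4-byte sequence plus one 3-byte sequence.
byte_length = len(raw)

# Every byte here is >= 0x80, so each one is percent-encoded as "%XX".
encoded = quote(raw, safe="")
encoded_length = len(encoded)

print(byte_length, encoded_length)
```

Under these assumptions the title is 255 bytes long, and since each byte expands to three characters when percent-encoded, the encoded form is three times that length.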

The title is valid UTF-8, and consists of 63 repeats of the 4-byte character 
Comment 2 Ilmari Karonen 2011-01-06 22:51:18 UTC
WTF... Bugzilla truncated my comment when I tried to include the 4-byte Unicode character in it.  Trying again with the actual characters removed:

The title is valid UTF-8, and consists of 63 repeats of the 4-byte character U+28B81 followed by the 3-byte character U+5583.  My browser can't display the first character, but Googling for it led me to [[Prince of Tang (Shaowu)]] where it is used in the subject's personal name and discussed in a footnote.  The second character apparently means "keep talking, chattering; mumble" according to http://en.wiktionary.org/wiki/%E5%96%83
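For anyone who wants to verify the code points, here is a small Python sketch (again purely illustrative, with hypothetical variable names, and not taken from MediaWiki) that decodes the same bytes as UTF-8 and lists the characters it finds.

```python
# Same byte sequence as the PHP string literal quoted in the description.
raw = b"\xf0\xa8\xae\x81" * 63 + b"\xe5\x96\x83"

# decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")

# Format each character as a U+XXXX code point.
code_points = [f"U+{ord(ch):04X}" for ch in text]

first_char_bytes = len(text[0].encode("utf-8"))
last_char_bytes = len(text[-1].encode("utf-8"))

print(len(text), code_points[0], code_points[-1])
print(first_char_bytes, last_char_bytes)
```

The decoded string has 64 characters: 63 copies of U+28B81 (4 bytes each) followed by a single U+5583 (3 bytes).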

PS: An explanatory comment was added in r79769.
Comment 3 Dan Jacobson 2011-01-09 13:01:04 UTC
OK, so they are just using some fun arbitrary characters maybe.
Perhaps add a note somewhere saying how such strings will be used. It's not too clear from the code.
