Last modified: 2011-01-09 13:01:04 UTC
In generateSitemap.php, kindly add a comment explaining what $title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" ); is all about. Say which characters are encoded in the string, and in what encoding.
It seems to be constructing a title consisting of 255 bytes, each of which needs to be URL-encoded: that should be about as long as a URL for a page in the given namespace can get. The title is valid UTF-8, and consists of 63 repeats of the 4-byte character
WTF... Bugzilla truncated my comment when I tried to include the 4-byte Unicode character in it. Trying again with the actual characters removed: The title is valid UTF-8, and consists of 63 repeats of the 4-byte character U+28B81 followed by the 3-byte character U+5583 (63 × 4 + 3 = 255 bytes). My browser can't display the first character, but Googling for it led me to [[Prince of Tang (Shaowu)]], where it is used in the subject's personal name and discussed in a footnote. The second character apparently means "keep talking, chattering; mumble" according to http://en.wiktionary.org/wiki/%E5%96%83 P.S. Explanatory comment added in r79769.
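For anyone who wants to check the byte counts and code points above without a MediaWiki install, here is a minimal standalone sketch. It rebuilds only the string, not the Title object, and utf8CodePoint() is a hypothetical helper written for this illustration (it is not part of generateSitemap.php and assumes a single well-formed UTF-8 character as input).

```php
<?php
// Rebuild the byte string passed to Title::makeTitle() in generateSitemap.php.
$text = str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83";

// Hypothetical helper: decode ONE well-formed UTF-8 character to its code point.
function utf8CodePoint( string $char ): int {
	$bytes = array_values( unpack( 'C*', $char ) );
	$count = count( $bytes );
	// Payload bits carried by the lead byte of a 1-, 2-, 3- or 4-byte sequence.
	$leadMasks = [ 1 => 0x7F, 2 => 0x1F, 3 => 0x0F, 4 => 0x07 ];
	$codePoint = $bytes[0] & $leadMasks[$count];
	// Each continuation byte contributes its low 6 bits.
	for ( $i = 1; $i < $count; $i++ ) {
		$codePoint = ( $codePoint << 6 ) | ( $bytes[$i] & 0x3F );
	}
	return $codePoint;
}

// Length in bytes: 63 * 4 + 3.
$byteLength = strlen( $text );

// Every byte here is non-ASCII, so rawurlencode() turns each one into "%XX".
$encodedLength = strlen( rawurlencode( $text ) );

// preg_match() with the /u modifier fails on malformed UTF-8.
$isValidUtf8 = preg_match( '//u', $text ) === 1;

$first = sprintf( 'U+%04X', utf8CodePoint( "\xf0\xa8\xae\x81" ) );
$last = sprintf( 'U+%04X', utf8CodePoint( "\xe5\x96\x83" ) );

printf(
	"bytes=%d url-encoded=%d valid=%s first=%s last=%s\n",
	$byteLength, $encodedLength, $isValidUtf8 ? 'yes' : 'no', $first, $last
);
```

The URL-encoded length is three times the byte length because every byte in the string has to be percent-encoded, which is what makes this title about the longest a page URL in the namespace can get.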
OK, so maybe they are just using some fun, arbitrary characters. Perhaps add a note somewhere saying how such strings will be used. It's not too clear from the code.