Last modified: 2013-12-03 20:50:56 UTC
I just found an old redirect at https://zh.wikipedia.org/w/index.php?title=2007%E5%8E%A6%E9%97%A8PX%E9%A1%B9%E7%9B%AE%E7%BC%93%E5%BB%BA&action=history which worked on the day it was created but no longer works. The redirect parsing code at that time (in commit 5cdc003c00c5b7dbc8395bc10ea067b1bbc19a44, Thu Jun 14 17:36:12 2007 +0000):

```php
/**
 * Create a new Title for a redirect
 * @param string $text the redirect title text
 * @return Title the new object, or NULL if the text is not a
 * valid redirect
 */
public static function newFromRedirect( $text ) {
	$mwRedir = MagicWord::get( 'redirect' );
	$rt = NULL;
	if ( $mwRedir->matchStart( $text ) ) {
		$m = array();
		if ( preg_match( '/\[{2}(.*?)(?:\||\]{2})/', $text, $m ) ) {
			# categories are escaped using : for example one can enter:
			# #REDIRECT [[:Category:Music]]. Need to remove it.
			if ( substr($m[1],0,1) == ':') {
				# We don't want to keep the ':'
				$m[1] = substr( $m[1], 1 );
			}
			$rt = Title::newFromText( $m[1] );
			# Disallow redirects to Special:Userlogout
			if ( !is_null($rt) && $rt->isSpecial( 'Userlogout' ) ) {
				$rt = NULL;
			}
		}
	}
	return $rt;
}
```

I'm not sure when the parsing became stricter, but it evidently broke some old content, and nobody cleaned up the affected pages.
The current code for this is in WikitextContent, and its regex does look stricter. The regex itself was changed back in 2008, though, in r38737 by Roan (with a reference to bug 15053), and was relaxed slightly (to accept the colon) later in 2008, in r38974 by Brion.

----

The current code, for reference:

```php
public function getRedirectTarget() {
	global $wgMaxRedirects;
	if ( $wgMaxRedirects < 1 ) {
		// redirects are disabled, so quit early
		return null;
	}
	$redir = MagicWord::get( 'redirect' );
	$text = trim( $this->getNativeData() );
	if ( $redir->matchStartAndRemove( $text ) ) {
		// Extract the first link and see if it's usable
		// Ensure that it really does come directly after #REDIRECT
		// Some older redirects included a colon, so don't freak about that!
		$m = array();
		if ( preg_match( '!^\s*:?\s*\[{2}(.*?)(?:\|.*?)?\]{2}!', $text, $m ) ) {
			// Strip preceding colon used to "escape" categories, etc.
			// and URL-decode links
			if ( strpos( $m[1], '%' ) !== false ) {
				// Match behavior of inline link parsing here;
				$m[1] = rawurldecode( ltrim( $m[1], ':' ) );
			}
			$title = Title::newFromText( $m[1] );
			// If the title is a redirect to bad special pages or is invalid, return null
			if ( !$title instanceof Title || !$title->isValidRedirectTarget() ) {
				return null;
			}
			return $title;
		}
	}
	return null;
}
```
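To see why an old redirect can stop resolving, the two regexes can be compared side by side. This is a simplified sketch under several assumptions: the hypothetical `REDIRECT_WORD` pattern stands in for `MagicWord::get( 'redirect' )` and only recognizes the English `#REDIRECT` alias (not localized aliases), title validation (`Title::newFromText()`, `isValidRedirectTarget()`) and URL decoding are skipped, and the sample texts are invented, since the exact wikitext of the zh.wikipedia page is not quoted in this thread. `oldTarget()` and `newTarget()` are hypothetical helper names.

```php
<?php
// Hypothetical stand-in for MagicWord::get( 'redirect' ): only the English
// "#REDIRECT" alias, matched case-insensitively at the start of the text.
const REDIRECT_WORD = '/^#REDIRECT/i';

// 2007 behaviour: once the text starts with the magic word, take the
// first [[...]] link found anywhere in the text.
function oldTarget( string $text ): ?string {
	if ( !preg_match( REDIRECT_WORD, $text ) ) {
		return null;
	}
	if ( !preg_match( '/\[{2}(.*?)(?:\||\]{2})/', $text, $m ) ) {
		return null;
	}
	// Drop a single leading ':' used to escape categories.
	if ( substr( $m[1], 0, 1 ) === ':' ) {
		$m[1] = substr( $m[1], 1 );
	}
	return $m[1];
}

// Current behaviour: strip the magic word, then require the link to follow
// it directly (only whitespace and an optional ':' may come in between).
function newTarget( string $text ): ?string {
	$text = trim( $text );
	// Rough approximation of matchStartAndRemove().
	$rest = preg_replace( REDIRECT_WORD, '', $text, 1, $count );
	if ( $count === 0 ) {
		return null;
	}
	if ( !preg_match( '!^\s*:?\s*\[{2}(.*?)(?:\|.*?)?\]{2}!', $rest, $m ) ) {
		return null;
	}
	// Title validation and URL decoding are deliberately omitted here.
	return $m[1];
}

$samples = [
	'#REDIRECT [[Target]]',
	'#REDIRECT: [[Target]]',
	'#REDIRECT see [[Target]]', // invented "misspelled" redirect
];
foreach ( $samples as $sample ) {
	printf( "%s => old: %s, new: %s\n",
		$sample,
		var_export( oldTarget( $sample ), true ),
		var_export( newTarget( $sample ), true ) );
}
```

With these samples, both versions resolve `#REDIRECT [[Target]]` and `#REDIRECT: [[Target]]`, but only the 2007 logic resolves `#REDIRECT see [[Target]]`: the current regex is anchored with `^` and tolerates only whitespace and an optional colon between the magic word and the link.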
(In reply to comment #1)
> The regex itself was changed back in 2008, though, in r38737 by Roan (with a
> reference to bug 15053), and made a bit more relaxed (accepting the colon) in
> 2008 too in r38974 by Brion.

I'm also pretty sure this makes the bug a WONTFIX, unless you can come up with a better solution :) (CC-ing Roan and Brion)
Per the note above, this behavior has been consistent for 5 years, so there's no great need to support that misspelled case for backward compatibility. Resolving as WONTFIX; please feel free to fix up any similarly affected pages.