Last modified: 2011-03-13 18:04:44 UTC
Email addresses are currently stored unchecked for mailing passwords or EmailUser. Any *potential* security hole in php mail() or PEAR:Mail function as called in UserMailer.php could be exploited by entering pseudo addresses which do something unwanted. Remark: I do not mean the email authentication as described in http://bugzilla.wikipedia.org/show_bug.cgi?id=866 , which only authenticates addresses for the "higher" functions such as EmailUser and Enotif. But please be reminded, that the "I forgot my password" mails are sent to un-authenticated addresses anyway. Proposed is (a) converting to all lower case (b) a check for valid mail address strings The Enotif and Email Authentification patches 454 and 866 are coming with solutions to this bug, as they only allow users to store one email address which matches the regular expression ^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$ I gave provisionally highest priority to this bug. A fix is easy be checking the user entered (dirty=unauthenticated) email address simply before storing in SpecialUserlogin.php against the above regular expression. In case of not matching address, an empty (blank) string address can be provisonally stored in table user. I repeat: my Enotif and my Email authentication procedure will be published with a solution to this problem.
To fix this, I introduced a new function in User.php /** * does the string match roughly an email address ? * @param string $addr email address * @static */ function isValidEmailAddr ( $addr ) { return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr)); } It is currently used in 1) SpecialUserlogin.php (when a user enters a new address into the login form), 2) SpecialPreferences.php (when a user enters a new address into the preferences form), "new address" is stored, if lowercase(new address string) != lowercase(old stored address string) && (new address string != '') && IsValidEmailAddr Remark: UserMailer.php in Enotif patch performs its own checks against a simplified RFC-822 format. Fixed within Enotif 2.00
Can you please reference the appropriate RFC to ensure that this validation regexp is correct and will not reject valid addresses?
I am using the regular expression as used in AutoMailto() fix of UseMod Wiki, see http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/AutoMailto . ^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$ return "true" for a single valid email address in the formats I know. It checking against a simplified RFC-822 format, but will need some further fine tuning, which I cannot effort at the moment; perhaps, someone of the community can assist here, please: EVERYONE is invited to check this regular expression. Its purpose within MediaWiki CVS 1.4 is: The RE shall validate one single valid email address string (i.e. without a name string) and return "true" in such a case. My regular expression is used often and can be regarded as starting point for optimisation. If anyone has doubts: please let me know and inidcated a reference, where you base you objection upon.
A number of things seem missing straight away. Most people do not need these, but please refer to RFC2822 section 3, starting with the definition of "address" or "addr-spec" (?) and working backwards. Hopefully you can find some compliant regex elsewhere.
reminder for Tom (myself): don't forget to check wfStrencode() respectively strict use of the new database-wrapper functions, when storing e-Mail address. reminder for Brion and other developers having code access: current storing of email addresses in SpecialUserlogin.php and SpecialPreferences.php in MediaWiki (pre-CVS release) needs urgently (as far as I see) escaping against potential database injections, as any user can store any string at the moment. Version lowered to 1.3.7 to ring the bells.
Tom's claim about SQL injection is entirely unfounded, as a cursory look at the code or simple experimentation will show.
The regexp rejects several valid email addresses, e.g. plus signs and exclamation marks are valid characters for the mailbox name. Latest Punicode extensions to DNS allow the use of a limited set of diacritic characters for the hostname.
-- introducing a new function isValidEmailAddr() in User.php, which performs a check of the entered email address
(In reply to comment #7) > The regexp rejects several valid email addresses, e.g. plus signs and exclamation > marks are valid characters for the mailbox name. > > Latest Punicode extensions to DNS allow the use of a limited set of diacritic > characters for the hostname. I reopened the bug, because according to JeLuf some email addresses might result in false-rejects. But the new function is a starting point.
Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html The local-part of an e-mail address may contain any ASCII character. (It might be reasonable for you to only allow the local-part to be in dot-atom format, which rules out special characters like "@<>[]:;,\ but allows letters, numbers, and !#$%&'*+-/=?^_`{|}~. See section 3.2.4 of RFC2822.
(In reply to comment #10) > Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html The local-part of an e-mail address may contain any > ASCII character. (It might be reasonable for you to only allow the local-part to be in dot-atom format, which rules out > special characters like "@<>[]:;,\ but allows letters, numbers, and !#$%&'*+-/=?^_`{|}~. See section 3.2.4 of > RFC2822. Coming soon. thanks for letting me know.
The proposed check seems to reject making an email address empty. In UK and much European law it's required that there be a way to remove personal information on request and making admins or developers do that instead of letting the software do it is undesirable. Email addresses are considered personal information. Please do not convert the saved email addresses to lower case. Changing cases introduces the possibility of bugs when extensions to the use of ASCII characters are involved - the case conversion would need to know the local character set encoding method used by the receiving system, information which is not available. Note also that a local address is allowed to have many mailboxes which differ only in case (perhaps because of character set translation), so case conversion could result in the wrong local mailbox on the receiving system being used. Conversion also makes life needlessly annoying for humans, who may be using mixed case deliberately to split words.
(In reply to comment #12) > The proposed check seems to reject making an email address empty. In UK and much > European law it's required that there be a way to remove personal information on > request and making admins or developers do that instead of letting the software > do it is undesirable. Email addresses are considered personal information. JamesDay, perhaps you misunderstood my check routine. The logic in SpecialPreferences and UserlOgin.php *does* allow the user to empty his/her address: of course, it is and will be possible to blank out (to empty) the own email address > > Please do not convert the saved email addresses to lower case. Changing cases > introduces the possibility of bugs when extensions to the use of ASCII > characters are involved - the case conversion would need to know the local > character set encoding method used by the receiving system, information which is > not available. Note also that a local address is allowed to have many mailboxes > which differ only in case (perhaps because of character set translation), so > case conversion could result in the wrong local mailbox on the receiving system > being used. Conversion also makes life needlessly annoying for humans, who may > be using mixed case deliberately to split words. I agree. I introduced and proposed the conversion to lowercase to facilitate the "unique" (same) userid AND (same) emailaddress checking with Kate's Namechecher tool targetting to a single-user-login. As soon as I am allowed to commit to the CVS, I will commit such changes as proposed by all of you to the isValidEmailAddress code (etc.). Tom
Paul Warren made what he claims to be a fullly RFC 822 compatable PCRE that matches valid email addresses, see: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html The regular expression follows (not that I haven't gotten it to work.) (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?: \r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:( ?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0 31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\ ](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?: (?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n) ?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\ r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n) ?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t] )*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])* )(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t] )+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*) *:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+ |\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r \n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?: \r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t ]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031 ]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\]( ?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(? :(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(? :\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(? :(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]| \\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<> @,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|" (?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(? :[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[ \]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|( ?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,; :\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([ ^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\" .\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\ ]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\ r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\] |\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0 00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\ .|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@, ;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(? :[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])* (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[ ^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\] ]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*( ?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:( ?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[ \["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t ])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t ])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(? :\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+| \Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\ ]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n) ?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[" ()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n) ?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<> @,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@, ;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t] )*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\ ".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)? (?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?: \r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[ "()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t]) *))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]) +|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\ .(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z |(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:( ?:\r\n)?[ \t])*))*)?;\s*)
The following Perl code may or may not be the same as the regexp in the previous comment and may or may not be more readable: $newline = '(?:(?:\r\n)?[ \t])'; $newlines = "$newline+"; $opt_newlines = "$newline*"; $normal_chars = '[^()<>@,;:\\".\[\] \000-\031]+'; $dot = "\.$opt_newlines"; $backslashdot = '\\.'; $domainliteral = '([^\[\]\r\\]|'.$backslashdot.')*'; $endofstring = '\Z'; $lookahead_special_char = '(?=[\["()<>@,;:\\".\[\]])'; $stop_normal_chars = "(?:$newlines|$endofstring|$lookahead_special_char)"; $quoted_string_char = '[^\"\r\\]'; $quoted_string = '"(?:'."$quoted_string_char|$backslashdot|$newline".')*"'; $username_component = "(?:$normal_chars$stop_normal_chars|$quoted_string$opt_newlines)"; $domain_component = "(?:$normal_chars$stop_normal_chars|\[$domainliteral\]$opt_newlines)"; $at = "@$opt_newlines"; $at_sameline = '@'; $username = "$username_component(?:$dot$username_component)*"; $domainname = "$domain_component(?:$dot$domain_component)*"; $l_ang = "\<$opt_newlines"; $r_ang = "\>$opt_newlines"; $colon = ":$opt_newlines"; $comma = ',\s*'; $comma_no_whitespace = ','; $route = "(?:$at_sameline$domainname(?:$comma_no_whitespace$at$domainname)*$colon)"; $opt_route = "$route?"; $plain_email_addr = "$username$at$domainname"; $route_addr = "$l_ang$opt_route$plain_email_addr$r_ang"; $email_addr_opt_with_name = "(?:$plain_email_addr|$username_component$route_addr)"; $email_addr_list = "(?:$email_addr_opt_with_name(?:$comma$email_addr_opt_with_name)*)"; $opt_email_addr_list = "$email_addr_list?"; $semicolon = ';\s*'; $above_regexp = "(?:$email_addr_opt_with_name|$username_component$colon$opt_email_addr_list$semicolon)";
(In reply to comment #15) > The following Perl code may or may not be the same as the regexp in the previous > comment and may or may not be more readable: > The requirements of $at_sameline and $comma_no_whitespace and the inconsistent usage of $opt_newlines and \s* (which allows \r without a \n or vice versa) might be a bug in the regexp or in the RFC or in my brain or in my editor.
Lowering the severity of this, as it is a theoretical bug, there's no evidence that this poses a risk at present time.
small title amendment only
I was under the impression that we didn't block any email addresses currently, but the regex we use blocks valid email addresses, upping the priority to BLOCKER.
(In reply to comment #19) > I was under the impression that we didn't block any email addresses currently, > but the regex we use blocks valid email addresses, upping the priority to BLOCKER. Avar: please can you supply us with more data: (a) Which MediaWiki version do you refer to ? (b) What user e-mail address exactly is false-rejected ? Thanks in advance
In reply to (comment #20) (1) HEAD (this bug is marked as 1.5-cvs so all comments should be considered to apply to that branch) (2) An email address with the plus sign and countless others, your regexp only matches a very small subset of valid email addresses according to RFC 2822, it will even reject an IPv4 address after the @. Unless we can come up with some code that doesn't reject valid email addresses I think we should remove this check alltogather, there's no proof whatsoever that invalid email addresses pose any kind of threat and thiscan only serve to induce annoyance by rejecting valid addresses.
(In reply to comment #21) > In reply to (comment #20) > > (2) An email address with the plus sign and countless others, your regexp only > matches a very small subset of valid email addresses according to RFC 2822, it > will even reject an IPv4 address after the @. > > Unless we can come up with some code that doesn't reject valid email addresses I > think we should remove this check alltogather, there's no proof whatsoever that > invalid email addresses pose any kind of threat and thiscan only serve to induce > annoyance by rejecting valid addresses. In User.php, please modify the function /** * does the string match roughly an email address ? * @param string $addr email address * @static */ function isValidEmailAddr ( $addr ) { return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr)); } ===> so that it returns ***true***. Later, someone can improve the preg_match
Created attachment 453 [details] A perl program that returns 0 if it's fed a valid address, and 1 if it's invalid This is a perl program to test if an email address is valid, I modified it from the Mail::RFC822::Address code, use it like: $ perl checkaddress.pl "email.address@to.check.com" && echo valid || echo invalid
(In reply to comment #23) > This is a perl program to test if an email address is valid, I modified it from > the Mail::RFC822::Address code, use it like: Good morning Avar, can you recode this PERL to PHP please - this would be of big help (as I can concentrate on other issues). Mediawiki is currently coded in PHP and if you want to propose the improved check routine, you can better post the PHP version here. It should be compatible to and can than replace the current shorter function: /** * does the string match roughly an email address ? * @param string $addr email address * @static */ function isValidEmailAddr ( $addr ) { return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr)); } I guess, that Brion or JeLuf will commit any improved convincing version, which helps to improve MediaWiki REL1_5. Thanks your contribution. Looking forward to your replacement function in PHP.
/** * does the string match roughly an email address ? * @param string $addr email address * @static */ function isValidEmailAddr ( $addr ) { if (preg_match('/(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)*:(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)(?:,\\s*(?:(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*))*)?;\\s*)/', $text)) { return true; } else { return false; } } however! if ONLY AN EMAIL ADRESS IS SENT IN, YOU WANT TO APPEND A \\A RIGHT AFTER THE '/ THAT STARTS THE REGEX
Created attachment 458 [details] Regex checks a string containing ONLY!! an email adress This will check if the string contains only an email adress, no more, no less.
Created attachment 459 [details] The check if there is a vaild email somewhere inside of the test string This will return true if there is one valid email in the whole string.
These patches were commited to bugzilla because my total match rant was incorrect. The patches, however, are defined exactly by their description. One vaildates true if there is a single valid email in a string. The other if the string contains exactly one email adress.
(In reply to comment #27) > Created an attachment (id=459) [edit] > The check if there is a vaild email somewhere inside of the test string This doesn't pre-strip the email address of comments (see sub strip_comments in attachment 453 [details]) or check if the address consists only of valid characters (see the return line in sub valid in attachment 453 [details])
Moving the priority down again
Created attachment 3254 [details] Implementation of the e-mail address verification without regexps I see that, in MW 1.9.2, the User::isValidEmailAddr function still just checks whether the string is not empty and there is an "@". I read a bit in certain resources like Wikipedia and the RFCs and composed the function you can find in the fourth attachment. It makes (nearly) no use of regexps, instead it reads and validates the string char by char. I made a speed test on my local Apache server with PHP (measurement with microtime()). It took 7.228 seconds to execute my function 100,000 times while the regular expression in the second attachment needed 14.435 seconds under the same conditions.
Sorry, I made an error. What I expected to be the execution time of the regexp implementation, was the execution time of both implementations. Some test runs with corrected time measurement: (my vs. the regexp implementation) 5.758 vs. 6.592 5.763 vs. 6.604
Validating e-mail addresses is incredibly error-prone and I don't see any reason to do it. It will probably introduce more issues than it fixes. I suppose there's a theoretical chance that some vulnerability will be found that relies on entering an invalid address, but is that chance really significant enough to bother with? We should check if there's an @ somewhere to stave off people entering totally the wrong thing by mistake, and maybe check length if there's a length limit and check for the presence of invalid characters if there's a simple list of valid characters, but trying (and probably failing) to implement the entire RFC doesn't seem useful.
Checking the lengths and the characters is what my function does. Not more and not less.
Mediawiki functionality being tested without a solution for 5 years now after the (very) potential security thread was raised, fully agreeing with comment #33, I propose to close the bug: works for me.
The current code does not satisfy the original bug request, hence WONTFIX, not WORKSFORME.
I'm going to refrain from reopening this for now, but I am surprised that the 'solution' to this really complex problem that no one wants to fix because of: a) its complexity, b) the inability of suggested fixes to accommodate rfc 2822 (there are actually a number of other RFCs that govern the validity of email addresses as well) is to simply look for an '@' in the email string. Particularly when an email address technically does not require an '@' in the string... Some folks out there have spent a lot of time in working this problem over the past years. After some digging, I unearthed what actually looks like a fabulous email validator. It validates against the various RFCs out there governing email addresses AND it will provide warnings when an email address is technically considered valid but would more likely be a typo by a user. The license is open, the source has been community developed and maintained (though principally by Dominic Sayers) and they provide a myriad of email addresses to test against the validator that have been carefully constructed using the RFCs. Please take a look at: http://www.dominicsayers.com/isemail/
Note that this has been superseded by bug 22449: for MW 1.17/1.18 the email validation rules specified for HTML 5 client-side input validation are now being used. This uses a subset of RFC 5322 rules for the mailbox part, and a subset of RFC 1034 for the host part. Per comments on bug 22449, this appears to be liberal enough to not be a problem for the folks using slightly funky characters, while still being useful for catching obviously broken cases. It also means that we keep consistency between client-side validation and server-side validation, which is always a plus. (Additional validation rules can be enforced by extensions hooking into the server-side validation function.)