Last modified: 2011-03-13 18:04:44 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 959 - Check email addresses before storing in database for compliance with RFC 2822
Check email addresses before storing in database for compliance with RFC 2822
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
User login and signup (Other open bugs)
unspecified
All All
: Lowest enhancement with 3 votes (vote)
: ---
Assigned To: T. Gries
http://www.faqs.org/rfcs/rfc2822.html
: patch
Depends on:
Blocks: 1932 454 866 2439
  Show dependency treegraph
 
Reported: 2004-11-29 23:12 UTC by T. Gries
Modified: 2011-03-13 18:04 UTC (History)
6 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
A perl program that returns 0 if it's fed a valid address, and 1 if it's invalid (2.48 KB, text/plain)
2005-04-23 21:14 UTC, Ævar Arnfjörð Bjarmason
Details
Regex checks a string containing ONLY!! an email adress (7.62 KB, patch)
2005-04-26 14:03 UTC, David Majnemer
Details
The check if there is a vaild email somewhere inside of the test string (7.62 KB, patch)
2005-04-26 14:04 UTC, David Majnemer
Details
Implementation of the e-mail address verification without regexps (5.38 KB, patch)
2007-02-21 21:03 UTC, Stefan Majewsky
Details

Description T. Gries 2004-11-29 23:12:06 UTC
Email addresses are currently stored unchecked for mailing passwords or
EmailUser. Any *potential* security hole in php mail() or PEAR:Mail function as
called in UserMailer.php could be exploited by entering pseudo addresses which
do something unwanted.

Remark: I do not mean the email authentication as described in
http://bugzilla.wikipedia.org/show_bug.cgi?id=866 , which only authenticates
addresses for the "higher" functions such as EmailUser and Enotif. But please be
reminded, that the "I forgot my password" mails are sent to un-authenticated
addresses anyway.

Proposed is
(a) converting to all lower case
(b) a check for valid mail address strings


The Enotif and Email Authentification patches 454 and 866 are coming with
solutions to this bug, as they only allow users to store one email address which
matches the regular expression
^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$

I gave provisionally highest priority to this bug.

A fix is easy be checking the user entered (dirty=unauthenticated) email address
simply before storing in SpecialUserlogin.php against the above regular
expression. In case of not matching address, an empty (blank) string address can
be provisonally stored in table user.

I repeat: my Enotif and my Email authentication procedure will be published with
a solution to this problem.
Comment 1 T. Gries 2004-11-30 02:08:05 UTC
To fix this, I introduced a new function in User.php 
 
	/** 
	 * does the string match roughly an email address ? 
	 * @param string $addr email address 
	 * @static 
	 */ 
	function isValidEmailAddr ( $addr ) { 
		return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr)); 
	} 
 
It is currently used in  
 
1) SpecialUserlogin.php (when a user enters a new address into the login form),  
2) SpecialPreferences.php (when a user enters a new address into the preferences form), 
 
"new address" is stored, if 
lowercase(new address string) != lowercase(old stored address string)  
 && (new address string != '') 
 && IsValidEmailAddr 
 
 
Remark: UserMailer.php in Enotif patch performs its own checks against a simplified RFC-822 format. 
Fixed within Enotif 2.00 
 
Comment 2 Brion Vibber 2004-11-30 08:15:47 UTC
Can you please reference the appropriate RFC to ensure that this 
validation regexp is correct and will not reject valid addresses?
Comment 3 T. Gries 2004-11-30 08:32:11 UTC
I am using the regular expression as used in AutoMailto() fix of UseMod Wiki,
see http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/AutoMailto .


     ^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$

return "true" for a single valid email address in the formats I know. It
checking against a simplified RFC-822 format, but will need some further fine
tuning, which I cannot effort at the moment; perhaps, someone of the community
can assist here, please:

EVERYONE is invited to check this regular expression. 

Its purpose within MediaWiki CVS 1.4 is:
The RE shall validate one single valid email address string (i.e. without a name
string) and return "true" in such a case. 

My regular expression is used often and can be regarded as starting point for
optimisation. If anyone has doubts: please let me know and inidcated a
reference, where you base you objection upon.
Comment 4 Zigger 2004-11-30 11:02:53 UTC
A number of things seem missing straight away.  Most people do not need these,
but please refer to RFC2822 section 3, starting with the definition of "address"
or "addr-spec" (?) and working backwards.  Hopefully you can find some compliant
regex elsewhere.
Comment 5 T. Gries 2004-11-30 13:14:43 UTC
reminder for Tom (myself): don't forget to check wfStrencode() respectively strict use of the new database-wrapper functions, when 
storing e-Mail address.

reminder for Brion and other developers having code access:
current storing of email addresses in SpecialUserlogin.php and SpecialPreferences.php in MediaWiki (pre-CVS release) needs urgently 
(as far as I see) escaping against potential database injections, as any user can store any string at the moment.

Version lowered to 1.3.7 to ring the bells.
Comment 6 Brion Vibber 2004-12-01 00:53:58 UTC
Tom's claim about SQL injection is entirely unfounded, as a cursory look at the 
code or simple experimentation will show.
Comment 7 JeLuF 2004-12-01 05:47:48 UTC
The regexp rejects several valid email addresses, e.g. plus signs and exclamation
marks are valid characters for the mailbox name. 

Latest Punicode extensions to DNS allow the use of a limited set of diacritic
characters for the hostname.
Comment 8 T. Gries 2004-12-19 08:33:58 UTC
-- introducing a new function isValidEmailAddr() in User.php, which performs a
check of the entered email address
Comment 9 T. Gries 2004-12-19 08:37:16 UTC
(In reply to comment #7)
> The regexp rejects several valid email addresses, e.g. plus signs and exclamation
> marks are valid characters for the mailbox name. 
> 
> Latest Punicode extensions to DNS allow the use of a limited set of diacritic
> characters for the hostname.

I reopened the bug, because according to JeLuf some email addresses might result
in false-rejects. But the new function is a starting point.
Comment 10 Gdr 2004-12-29 09:42:05 UTC
Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html  The local-part of an e-mail address may contain any 
ASCII character. (It might be reasonable for you to only allow the local-part to be in dot-atom format, which rules out 
special characters like "@<>[]:;,\ but allows letters, numbers, and !#$%&'*+-/=?^_`{|}~.  See section 3.2.4 of 
RFC2822.
Comment 11 T. Gries 2004-12-29 23:24:50 UTC
(In reply to comment #10)
> Please read RFC2822 http://www.faqs.org/rfcs/rfc2822.html  The local-part of
an e-mail address may contain any 
> ASCII character. (It might be reasonable for you to only allow the local-part
to be in dot-atom format, which rules out 
> special characters like "@<>[]:;,\ but allows letters, numbers, and
!#$%&'*+-/=?^_`{|}~.  See section 3.2.4 of 
> RFC2822.

Coming soon. thanks for letting me know.
Comment 12 Jamesday 2004-12-30 00:15:49 UTC
The proposed check seems to reject making an email address empty. In UK and much
European law it's required that there be a way to remove personal information on
request and making admins or developers do that instead of letting the software
do it is undesirable. Email addresses are considered personal information.

Please do not convert the saved email addresses to lower case. Changing cases
introduces the possibility of bugs when extensions to the use of ASCII
characters are involved - the case conversion would need to know the local
character set encoding method used by the receiving system, information which is
not available. Note also that a local address is allowed to have many mailboxes
which differ only in case (perhaps because of character set translation), so
case conversion could result in the wrong local mailbox on the receiving system
being used. Conversion also makes life needlessly annoying for humans, who may
be using mixed case deliberately to split words.
Comment 13 T. Gries 2004-12-30 00:31:44 UTC
(In reply to comment #12)
> The proposed check seems to reject making an email address empty. In UK and much
> European law it's required that there be a way to remove personal information on
> request and making admins or developers do that instead of letting the software
> do it is undesirable. Email addresses are considered personal information.

JamesDay,

perhaps you misunderstood my check routine.

The logic in SpecialPreferences and UserlOgin.php *does* allow the user to empty
his/her address: of course, it is and will be possible to blank out (to empty)
the own email address

> 
> Please do not convert the saved email addresses to lower case. Changing cases
> introduces the possibility of bugs when extensions to the use of ASCII
> characters are involved - the case conversion would need to know the local
> character set encoding method used by the receiving system, information which is
> not available. Note also that a local address is allowed to have many mailboxes
> which differ only in case (perhaps because of character set translation), so
> case conversion could result in the wrong local mailbox on the receiving system
> being used. Conversion also makes life needlessly annoying for humans, who may
> be using mixed case deliberately to split words.

I agree. I introduced and proposed the conversion to lowercase to facilitate the
"unique" (same) userid AND (same) emailaddress checking with Kate's Namechecher
tool targetting to a single-user-login.

As soon as I am allowed to commit to the CVS, I will commit such changes as
proposed by all of you to the isValidEmailAddress code (etc.).

Tom

Comment 14 Ævar Arnfjörð Bjarmason 2005-04-01 17:50:42 UTC
Paul Warren made what he claims to be a fullly RFC 822 compatable PCRE that
matches valid email addresses, see:
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

The regular expression follows (not that I haven't gotten it to work.)

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)
Comment 15 Wikipedia:en:User:Paddu 2005-04-02 19:19:04 UTC
The following Perl code may or may not be the same as the regexp in the previous
comment and may or may not be more readable:

$newline = '(?:(?:\r\n)?[ \t])';
$newlines = "$newline+";
$opt_newlines = "$newline*";
$normal_chars = '[^()<>@,;:\\".\[\] \000-\031]+';
$dot = "\.$opt_newlines";
$backslashdot = '\\.';
$domainliteral = '([^\[\]\r\\]|'.$backslashdot.')*';
$endofstring = '\Z';
$lookahead_special_char = '(?=[\["()<>@,;:\\".\[\]])';
$stop_normal_chars = "(?:$newlines|$endofstring|$lookahead_special_char)";
$quoted_string_char = '[^\"\r\\]';
$quoted_string = '"(?:'."$quoted_string_char|$backslashdot|$newline".')*"';
$username_component =
"(?:$normal_chars$stop_normal_chars|$quoted_string$opt_newlines)";
$domain_component =
"(?:$normal_chars$stop_normal_chars|\[$domainliteral\]$opt_newlines)";
$at = "@$opt_newlines";
$at_sameline = '@';
$username = "$username_component(?:$dot$username_component)*";
$domainname = "$domain_component(?:$dot$domain_component)*";
$l_ang = "\<$opt_newlines";
$r_ang = "\>$opt_newlines";
$colon = ":$opt_newlines";
$comma = ',\s*';
$comma_no_whitespace = ',';
$route = "(?:$at_sameline$domainname(?:$comma_no_whitespace$at$domainname)*$colon)";
$opt_route = "$route?";
$plain_email_addr = "$username$at$domainname";
$route_addr = "$l_ang$opt_route$plain_email_addr$r_ang";
$email_addr_opt_with_name = "(?:$plain_email_addr|$username_component$route_addr)";
$email_addr_list =
"(?:$email_addr_opt_with_name(?:$comma$email_addr_opt_with_name)*)";
$opt_email_addr_list = "$email_addr_list?";
$semicolon = ';\s*';
$above_regexp =
"(?:$email_addr_opt_with_name|$username_component$colon$opt_email_addr_list$semicolon)";
Comment 16 Wikipedia:en:User:Paddu 2005-04-02 19:23:54 UTC
(In reply to comment #15)
> The following Perl code may or may not be the same as the regexp in the previous
> comment and may or may not be more readable:
> 

The requirements of $at_sameline and $comma_no_whitespace and the inconsistent
usage of $opt_newlines and \s* (which allows \r without a \n or vice versa)
might be a bug in the regexp or in the RFC or in my brain or in my editor.

Comment 17 Ævar Arnfjörð Bjarmason 2005-04-09 04:39:37 UTC
Lowering the severity of this, as it is a theoretical bug, there's no evidence
that this poses a risk at present time.
Comment 18 T. Gries 2005-04-22 06:30:31 UTC
small title amendment only
Comment 19 Ævar Arnfjörð Bjarmason 2005-04-23 20:07:19 UTC
I was under the impression that we didn't block any email addresses currently,
but the regex we use blocks valid email addresses, upping the priority to BLOCKER.
Comment 20 T. Gries 2005-04-23 20:16:28 UTC
(In reply to comment #19)
> I was under the impression that we didn't block any email addresses currently,
> but the regex we use blocks valid email addresses, upping the priority to BLOCKER.

Avar:
please can you supply us with more data:

(a) Which MediaWiki version do you refer to ?
(b) What user e-mail address exactly is false-rejected ?

Thanks in advance
Comment 21 Ævar Arnfjörð Bjarmason 2005-04-23 20:39:07 UTC
In reply to (comment #20)

(1) HEAD (this bug is marked as 1.5-cvs so all comments should be considered to
apply to that branch)
(2) An email address with the plus sign and countless others, your regexp only
matches a very small subset of valid email addresses according to RFC 2822, it
will even reject an IPv4 address after the @.

Unless we can come up with some code that doesn't reject valid email addresses I
think we should remove this check alltogather, there's no proof whatsoever that
invalid email addresses pose any kind of threat and thiscan only serve to induce
annoyance by rejecting valid addresses.
Comment 22 T. Gries 2005-04-23 20:54:12 UTC
(In reply to comment #21) 
> In reply to (comment #20) 
>  
> (2) An email address with the plus sign and countless others, your regexp only 
> matches a very small subset of valid email addresses according to RFC 2822, it 
> will even reject an IPv4 address after the @. 
>  
> Unless we can come up with some code that doesn't reject valid email addresses I 
> think we should remove this check alltogather, there's no proof whatsoever that 
> invalid email addresses pose any kind of threat and thiscan only serve to induce 
> annoyance by rejecting valid addresses. 
 
In User.php, please modify the function 
 
	/** 
	 * does the string match roughly an email address ? 
	 * @param string $addr email address 
	 * @static 
	 */ 
	function isValidEmailAddr ( $addr ) { 
		return preg_match( '/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/', strtolower($addr)); 
	} 
 
 
===> so that it returns ***true***. 
Later, someone can improve the preg_match 
 
 
 
Comment 23 Ævar Arnfjörð Bjarmason 2005-04-23 21:14:12 UTC
Created attachment 453 [details]
A perl program that returns 0 if it's fed a valid address, and 1 if it's invalid

This is a perl program to test if an email address is valid, I modified it from
the Mail::RFC822::Address code, use it like:

$ perl checkaddress.pl "email.address@to.check.com" && echo valid || echo
invalid
Comment 24 T. Gries 2005-04-24 08:06:52 UTC
(In reply to comment #23)
> This is a perl program to test if an email address is valid, I modified it from
> the Mail::RFC822::Address code, use it like:
Good morning Avar,

can you recode this PERL to PHP please - this would be of big help (as I can
concentrate on other issues).
Mediawiki is currently coded in PHP and if you want to propose the improved
check routine, you can better post the PHP version here.

It should be compatible to and can than replace the current shorter function:

	/** 
	 * does the string match roughly an email address ? 
	 * @param string $addr email address 
	 * @static 
	 */ 
	function isValidEmailAddr ( $addr ) { 
		return preg_match(
'/^([a-z0-9_.-]+([a-z0-9_.-]+)*\@[a-z0-9_-]+([a-z0-9_.-]+)*([a-z.]{2,})+)$/',
strtolower($addr)); 
	} 
 
I guess, that Brion or JeLuf will commit any improved convincing version, which
helps to improve MediaWiki REL1_5.
Thanks your contribution.

Looking forward to your replacement function in PHP.
Comment 25 David Majnemer 2005-04-26 13:52:27 UTC
	/** 
	 * does the string match roughly an email address ? 
	 * @param string $addr email address 
	 * @static 
	 */ 
	function isValidEmailAddr ( $addr ) { 
if (preg_match('/(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)|(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*:(?:(?:\\r\\n)?[
\\t])*(?:(?:(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)(?:,\\s*(?:(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*|(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[
\\t])*(?:@(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|"(?:[^\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[
\\t]))*"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[
\\t])*(?:[^()<>@,;:\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\".\\[\\]
\\000-\\031]+(?:(?:(?:\\r\\n)?[
\\t])+|\\Z|(?=[\\["()<>@,;:\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[
\\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*))*)?;\\s*)/', $text)) {
	return true;
} else {
	return false;
}
	}
however! if ONLY AN EMAIL ADRESS IS SENT IN, YOU WANT TO APPEND A \\A RIGHT
AFTER THE '/ THAT STARTS THE REGEX
Comment 26 David Majnemer 2005-04-26 14:03:22 UTC
Created attachment 458 [details]
Regex checks a string containing ONLY!! an email adress

This will check if the string contains only an email adress, no more, no less.
Comment 27 David Majnemer 2005-04-26 14:04:44 UTC
Created attachment 459 [details]
The check if there is a vaild email somewhere inside of the test string

This will return true if there is one valid email in the whole string.
Comment 28 David Majnemer 2005-04-26 14:06:10 UTC
These patches were commited to bugzilla because my total match rant was
incorrect. The patches, however, are defined exactly by their description. One
vaildates true if there is a single valid email in a string. The other if the
string contains exactly one email adress.
Comment 29 Ævar Arnfjörð Bjarmason 2005-05-14 15:11:20 UTC
(In reply to comment #27)
> Created an attachment (id=459) [edit]
> The check if there is a vaild email somewhere inside of the test string

This doesn't pre-strip the email address of comments (see sub strip_comments in
attachment 453 [details]) or check if the address consists only of valid characters (see
the return line in sub valid in attachment 453 [details])
Comment 30 Ævar Arnfjörð Bjarmason 2005-05-21 19:57:08 UTC
Moving the priority down again
Comment 31 Stefan Majewsky 2007-02-21 21:03:37 UTC
Created attachment 3254 [details]
Implementation of the e-mail address verification without regexps

I see that, in MW 1.9.2, the User::isValidEmailAddr function still just checks
whether the string is not empty and there is an "@". I read a bit in certain
resources like Wikipedia and the RFCs and composed the function you can find in
the fourth attachment. It makes (nearly) no use of regexps, instead it reads
and validates the string char by char. I made a speed test on my local Apache
server with PHP (measurement with microtime()). It took 7.228 seconds to
execute my function 100,000 times while the regular expression in the second
attachment needed 14.435 seconds under the same conditions.
Comment 32 Stefan Majewsky 2007-02-21 21:09:24 UTC
Sorry, I made an error. What I expected to be the execution time of the regexp
implementation, was the execution time of both implementations. Some test runs
with corrected time measurement: (my vs. the regexp implementation)

5.758 vs. 6.592
5.763 vs. 6.604
Comment 33 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-22 02:29:33 UTC
Validating e-mail addresses is incredibly error-prone and I don't see any reason
to do it.  It will probably introduce more issues than it fixes.  I suppose
there's a theoretical chance that some vulnerability will be found that relies
on entering an invalid address, but is that chance really significant enough to
bother with?  We should check if there's an @ somewhere to stave off people
entering totally the wrong thing by mistake, and maybe check length if there's a
length limit and check for the presence of invalid characters if there's a
simple list of valid characters, but trying (and probably failing) to implement
the entire RFC doesn't seem useful.
Comment 34 Stefan Majewsky 2007-02-22 08:28:40 UTC
Checking the lengths and the characters is what my function does. Not more and
not less.
Comment 35 Gregor Hagedorn 2009-05-30 13:14:18 UTC
Mediawiki functionality being tested without a solution for 5 years now after the (very) potential security thread was raised, fully agreeing with comment #33, I propose to close the bug: works for me.
Comment 36 Happy-melon 2009-05-30 14:12:08 UTC
The current code does not satisfy the original bug request, hence WONTFIX, not WORKSFORME.
Comment 37 Arthur Richards 2010-12-21 16:33:23 UTC
I'm going to refrain from reopening this for now, but I am surprised that the 'solution' to this really complex problem that no one wants to fix because of:

a) its complexity, 
b) the inability of suggested fixes to accommodate rfc 2822 (there are actually a number of other RFCs that govern the validity of email addresses as well)

is to simply look for an '@' in the email string.  Particularly when an email address technically does not require an '@' in the string...

Some folks out there have spent a lot of time in working this problem over the past years.  After some digging, I unearthed what actually looks like a fabulous email validator.  It validates against the various RFCs out there governing email addresses AND it will provide warnings when an email address is technically considered valid but would more likely be a typo by a user.

The license is open, the source has been community developed and maintained (though principally by Dominic Sayers) and they provide a myriad of email addresses to test against the validator that have been carefully constructed using the RFCs.

Please take a look at: http://www.dominicsayers.com/isemail/
Comment 38 Brion Vibber 2011-01-24 21:13:13 UTC
Note that this has been superseded by bug 22449: for MW 1.17/1.18 the email validation rules specified for HTML 5 client-side input validation are now being used.

This uses a subset of RFC 5322 rules for the mailbox part, and a subset of RFC 1034 for the host part.

Per comments on bug 22449, this appears to be liberal enough to not be a problem for the folks using slightly funky characters, while still being useful for catching obviously broken cases. It also means that we keep consistency between client-side validation and server-side validation, which is always a plus.

(Additional validation rules can be enforced by extensions hooking into the server-side validation function.)

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links