Last modified: 2014-11-18 18:07:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T13547, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 11547 - Use only ASCII characters in email confirmation links


Summary:	Use only ASCII characters in email confirmation links

Status:	RESOLVED DUPLICATE of bug 6957

Product:	MediaWiki
Classification:	Unclassified
Component:	Email
Version:	unspecified
Hardware:	All All

Importance:	Normal trivial with 9 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n, utf8

Depends on:
Blocks:

Reported:	2007-10-02 23:24 UTC by Tisza Gergő
Modified:	2014-11-18 18:07 UTC
CC List:	5 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Tisza Gergő 2007-10-02 23:24:04 UTC

Localized versions of Special:Confirmemail might contain non-ASCII characters, and not all email clients and/or browsers handle such characters reliably. When this happens, the link opens a new article in the main namespace, and the user is unable to register (and might even end up writing junk articles while trying to do so). Thus either use the unlocalized name of Special:Confirmemail in the link in the email, or use proper urlencoding. (The latter seems less reliable, because a misconfigured client may still decode it, and the browser may interpret it as something other than UTF-8.)
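
For illustration only, here is a minimal Python sketch of the second option (percent-encoding the UTF-8 bytes of the localized title so that the link contains only ASCII). This is not MediaWiki's code: the base URL, the localized page name and the token are hypothetical values, and `ascii_confirmation_link` is a made-up helper name.

```python
from urllib.parse import quote, unquote


def ascii_confirmation_link(base_url, page_title):
    """Return base_url + page_title with the title percent-encoded as UTF-8.

    ":" and "/" are left as-is so the namespace separator and the
    token separator stay readable; every non-ASCII character is encoded.
    """
    return base_url + quote(page_title, safe=":/", encoding="utf-8")


# Hypothetical values, assumed for this example only.
BASE_URL = "https://wiki.example.org/wiki/"
LOCALIZED_TITLE = "Speciális:Megerősítés/0123abcd"

link = ascii_confirmation_link(BASE_URL, LOCALIZED_TITLE)
print(link)            # the link as it would appear in the email
print(link.isascii())  # whether the link is pure ASCII

# Decoding the path as UTF-8 gives back the original title.
print(unquote(link[len(BASE_URL):], encoding="utf-8") == LOCALIZED_TITLE)
```

The first option (always using the unlocalized name Special:Confirmemail in the mail) avoids the problem without any encoding step, which is why the report prefers it.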

Comment 1 Tisza Gergő 2007-11-23 16:51:09 UTC

Got another mail from a user unable to register his email just now. Please fix this; it should be trivial.

Comment 2 Henry Edward Hardy 2008-04-18 19:05:44 UTC

Repost from OLPC rt bug 1632:

I tried webmaster, but that didn't work.  The confirmation message came from 
you so I'll try that.

I gave the wiki signup screen an email address of:
  hgm+olpc@ip-64-139-1-69.sjc.megapath.net
It sent the confirmation to:
  olpc@ip-64-139-1-69.sjc.megapath.net

That name might have been too long, but I expect some parser chopped things 
off after the "+" rather than understanding that it's a valid character in 
email names.

I also tried "-" rather than "+".  Same results.

A warning message might avoid some confusion.  Most people probably won't 
know enough (or have access to) their mail server's log files.

I assume you are familiar with using "+" to make tagged addresses.  If not, 
I'll say more.
 


-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Comment 3 Philippe Verdy 2009-11-14 17:43:32 UTC

URL encoding is definitely NOT the correct way to make the "user@" part of email addresses valid. Read the RFCs:
URL encoding just applies to the hierarchical page name within a domain space (and under a hierarchical protocol like "http(s):" and "ftp(s):"),
as well as in query parameters (when they are supported in those protocols).
 
Valid user names in email addresses also use a "safe" alphabet different from that for domain names (which also DO NOT use URL encoding but the encodings supported in IDNA, if they are internationalized, and DNS specifications otherwise).

For example, the underscore character "_" (which is part of my own email address and cannot be substituted into a "+" or "-" and not even into "%7E") or the exclamation punctuation mark "!" is perfectly safe (and standard) in the "user@" part (which in fact is not really described as a user name, but as an identity specifier whose internal syntax may contain a user name and some other authorization data that cannot be safely stripped out or separated; some sites will use the colon ":" instead of the exclamation mark).

Mapping any Unicode characters with UTF-8 or other representations into a valid "user@" part of an email address is completely unspecified (there's absolutely no reliable algorithm to do this, as the mapping is completely domain-dependent and may even be different from the mapping used for encoding usernames in URI schemes other than "mailto:"). All that can be done is to check that the "user@" part provided uses the valid ASCII subset which is specific to the "mailto:" URI scheme (and distinct from the ASCII subsets used either in the DNS protocol for domain names or in the server-local address part of HTTP/FTP URLs).

Note also that "user@" parts in email addresses are normally CASE-SIGNIFICANT (even if most target SMTP servers will accept emails using any case, and even if some RFCs require that users provide an email address containing a user name that can be used as a valid label in a DNS subdomain, in order to activate some functionality); SMTP relay agents (as well as senders) MUST NOT change the letter case in a pseudo-canonicalization (because they can't reliably know if the recipient server makes the case distinction): this could simply break the authorization data which is part of the "user@" part (for example it could contain Base64-encoded binary data, in addition to representing the user identity on the target server where it will be delivered to the target POP3/IMAP/WebMail user's mailbox).
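
As a rough illustration of the point about case and tagged addresses, the Python sketch below lowercases only the domain and leaves the "user@" part untouched, including any "+" tag. It is a sketch under stated assumptions, not MediaWiki's behaviour: `normalize_address` is a hypothetical helper, and it performs no full syntax validation.

```python
def normalize_address(address):
    """Lowercase the domain of an email address; keep the local part verbatim.

    The local part may be case-significant and may carry a "+" tag,
    so it is passed through unchanged. Splitting on the LAST "@" keeps
    any "@" inside a quoted local part on the local side.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an address of the form local@domain: %r" % address)
    return local + "@" + domain.lower()


# The tagged address from comment 2, with the domain case altered for the demo.
print(normalize_address("hgm+olpc@IP-64-139-1-69.SJC.megapath.net"))
# A hypothetical mixed-case local part: only the domain changes.
print(normalize_address("Some_User@Example.ORG"))
```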

Comment 4 Tisza Gergő 2009-11-14 17:54:55 UTC

(In reply to comment #2)
(In reply to comment #3)

You should probably open a new bug to discuss that; this one is about the lack of urlencoding in the confirmation link, which is a wholly unrelated issue.

Comment 5 Karun 2009-12-20 22:36:27 UTC

I do not think we should be using ASCII only. Rather, we should use UTF-8, because MediaWiki needs to support more than just English.

Comment 6 Brion Vibber 2009-12-20 22:44:23 UTC

Please do not abuse the bug tracking system by changing the summary to subvert the entire point of a bug.

Comment 7 Karun 2009-12-20 22:53:50 UTC

This looks like an upstream problem, if browsers and email clients cannot support these characters.
What browsers and email clients does this occur with?

Comment 8 Alex Z. 2009-12-21 05:32:21 UTC

AFAICT, the actual issue behind this bug was fixed way back in r35505, and this is actually a dupe of bug 6957

*** This bug has been marked as a duplicate of bug 6957 ***
