Last modified: 2014-10-24 17:23:13 UTC
Sub part of Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019 Quoting: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019#c0 >> The BounceHandler extension generates a unique VERP address on every sent email, and has the bouncehandler API that can handle incoming email bounces once the bounce is HTTP POSTed to it via curl from exim. The extension was built as part of the VERP project to properly handle email bounces. Pipe incoming bounces to the bouncehandler API:- You need to add the following to your incoming mails to *@wikimedia.com receving PIPE transport: command = /usr/bin/curl "action=bouncehandler" --data-urlencode "email@-" http://$IP/api.php The genrated VERP address is of the form $wikiId-base36( $UserID )-base36( $Timestamp )-hash( $algorithm,$key, $prefix )@$email_domain where $wikiId - The wiki database name, to support multiple wikis $userID - The user_id from table 'user' - to uniquely identify a recipient. $Timestamp - The unix timestamp. $prefix = $wikiId. '-'. base_convert( $uid, 10, 36). '-'. base_convert( $timeNow, 10, 36); It use the Plancake mail parser external library to extract the headers as an addition to the optional inbuilt regex functions.
Git Repo : https://github.com/wikimedia/mediawiki-extensions-BounceHandler Unit Tests : https://github.com/wikimedia/mediawiki-extensions-BounceHandler/tree/master/tests
We use the plancake mail parser library originally hosted at https://packagist.org/packages/floriansemm/official-library-php-email-parser and in github - https://github.com/plancake/official-library-php-email-parser/blob/master/PlancakeEmailParser.php We use the master branch, since it seems to be the most active one, rather than the v1.0 We have made it in a way, that our local regex functions class will be run when the plancake class is not existing.
Hi Tony, my comments about your php code bellow. Mostly should be simple adjustments. Let me know if you need clarification on anything. For the rest of it: * I looked at PlancakeEmailParser.php in master of that repo, and it looks sane, but it would be best to have composer pull in a specific version, in case someone manages to get something crazy in. * I'm sure someone in Ops will check the exim rule that calls the api, but do make sure they're correctly escaping the emails address and validating certificates when using curl. >>> BounceHandler.php * novalidate-cert - can your example here validate the cert? Add a comment with an insecure example if you want to, but we should encourage safe defaults. processIMAPEmails.php * DRY - a lot of this code looks like duplicates from other classes. You can a fake request to call the api code directly. But when you have two version of security critical code, updates tend to only make it to one. For example: ** A row is still inserted if hash doesn't match up since empty array() returned from getUserDetails includes/ProcessBounceWithRegex.php * Steal the regex from plancake to split lines - (\r?\n|\r) ** that will also let you avoid the trim( $emailLine ) when looking for the end of the header, which seems like it could cause problems. includes/ProcessBounceEmails.php * Relying on no return value from getUserDetails to prevent extra rows being inserted seems fragile (e.g., you return an empty array in processIMAPEmails) * Taking the Date from the email header means it's attacker controlled-- see my comments on BounceHandlerActions::handleFailingRecipient() includes/ProcessBounceWithPlancake.php * DRY: put processEmail in base class * It would be good to gracefully handle exceptions from PlancakeEmailParser * getTo() returns an array, which isn't handled well in the rest of the code includes/BounceHandlerActions.php * unSubscribeUser ** Updating user table based on email is a bad idea. User the user_id which was covered by the signature. (i.e., attacker sets a legit email address, initiates an email, get's the verp address, then changes their email in MediaWiki to match a victim's, replies to the (valid) verp and victim is unconfirmed. ** You should really use user object to handle these instead of direct DB access - The cache isn't cleared properly. * handleFailingRecipient ** The current code doesn't use $bounceTimestamp or $bounceValidPeriod, but $bounceTimestamp isn't authenticated, so can't be trusted if you're planning to query the database for hits in a range. BounceHandlerHooks.php * Only handle $recip[0]? That seems like it will leak a verp hash to non-recipients, so anyone on the list can unconfirm the unlucky first user in the array.
(In reply to Chris Steipp from comment #3) > Hi Tony, my comments about your php code bellow. Mostly should be simple > adjustments. Let me know if you need clarification on anything. > > For the rest of it: > * I looked at PlancakeEmailParser.php in master of that repo, and it looks > sane, but it would be best to have composer pull in a specific version, in > case someone manages to get something crazy in. > * I'm sure someone in Ops will check the exim rule that calls the api, but > do make sure they're correctly escaping the emails address and validating > certificates when using curl. > > >>> > > BounceHandler.php > * novalidate-cert - can your example here validate the cert? Add a comment > with an insecure example if you want to, but we should encourage safe > defaults. Removed IMAP support as it is not going to be used. The feature was put in to the code before we came up with the more systematic method of using the API POST. > > > processIMAPEmails.php > * DRY - a lot of this code looks like duplicates from other classes. You can > a fake request to call the api code directly. But when you have two version > of security critical code, updates tend to only make it to one. For example: > ** A row is still inserted if hash doesn't match up since empty array() > returned from getUserDetails > * file removed ** > > includes/ProcessBounceWithRegex.php > * Steal the regex from plancake to split lines - (\r?\n|\r) > ** that will also let you avoid the trim( $emailLine ) when looking for the > end of the header, which seems like it could cause problems. > **done** > > includes/ProcessBounceEmails.php > * Relying on no return value from getUserDetails to prevent extra rows being > inserted seems fragile (e.g., you return an empty array in processIMAPEmails) > * Taking the Date from the email header means it's attacker controlled-- see > my comments on BounceHandlerActions::handleFailingRecipient() > > ** > includes/ProcessBounceWithPlancake.php > * DRY: put processEmail in base class > * It would be good to gracefully handle exceptions from PlancakeEmailParser > * getTo() returns an array, which isn't handled well in the rest of the code > > > includes/BounceHandlerActions.php > * unSubscribeUser > ** Updating user table based on email is a bad idea. User the user_id which > was covered by the signature. (i.e., attacker sets a legit email address, > initiates an email, get's the verp address, then changes their email in > MediaWiki to match a victim's, replies to the (valid) verp and victim is > unconfirmed. > ** You should really use user object to handle these instead of direct DB > access - The cache isn't cleared properly. > > * handleFailingRecipient > ** The current code doesn't use $bounceTimestamp or $bounceValidPeriod, but > $bounceTimestamp isn't authenticated, so can't be trusted if you're planning > to query the database for hits in a range. > > > BounceHandlerHooks.php > * Only handle $recip[0]? That seems like it will leak a verp hash to > non-recipients, so anyone on the list can unconfirm the unlucky first user > in the array.
Change 152934 had a related patch set uploaded by 01tonythomas: An additional base64_encode( hash_hmac( prefix, true ) ) was added https://gerrit.wikimedia.org/r/152934
Change 153016 had a related patch set uploaded by 01tonythomas: Fixed various issues with the BounceHandler extension https://gerrit.wikimedia.org/r/153016
Change 152934 merged by jenkins-bot: An additional base64_encode( hash_hmac( prefix, true ) ) was added https://gerrit.wikimedia.org/r/152934
Change 153403 had a related patch set uploaded by 01tonythomas: Generate VERP address for a single/array of recipient addresses https://gerrit.wikimedia.org/r/153403
Change 153016 merged by jenkins-bot: Fixed various issues with the BounceHandler extension https://gerrit.wikimedia.org/r/153016
Change 153403 merged by jenkins-bot: Generate VERP address for a single/array of recipient addresses https://gerrit.wikimedia.org/r/153403
Almost every issue pointed out here is fixed in the above patch sets.
Change 153871 had a related patch set uploaded by 01tonythomas: Improved the unsubscribe user method used by BounceHandler extension https://gerrit.wikimedia.org/r/153871
Once https://gerrit.wikimedia.org/r/#/c/153871 is merged, we can close this
Change 153871 merged by 01tonythomas: Improved the unsubscribe user method used by BounceHandler extension https://gerrit.wikimedia.org/r/153871
It looks like this extension will need to be deployed on loginwiki (what Tony just told me): Does that change your perspective, Chris? Just asking because, well, loginwiki is a really really special wiki :)
(In reply to Greg Grossmeier from comment #15) > It looks like this extension will need to be deployed on loginwiki (what > Tony just told me): Does that change your perspective, Chris? > > Just asking because, well, loginwiki is a really really special wiki :) Tony, can you explain why loginwiki? That seems like the least useful wiki to install it on, and we typically only install extensions there that are necessary for it's sso functionality.
(In reply to Chris Steipp from comment #16) > Tony, can you explain why loginwiki? That seems like the least useful wiki > to install it on, and we typically only install extensions there that are > necessary for it's sso functionality. Hey, actually we where discussing about a wiki in prod that would have maximum user accounts, and I think hoo or lego came up with the name loginwiki. The extension greps in the CentralAuth table to identify the failing user. We are comfortable with installing it anywhere in prod, which have CentralAuth access and can interact with the english-wiki ( which would create most bounces ).
If it only has to run on one wiki, then I'd be more comfortable with meta or enwiki. I don't want a larger attack surface on loginwiki than we absolutely need.
(In reply to Chris Steipp from comment #18) > If it only has to run on one wiki, then I'd be more comfortable with meta or > enwiki. I don't want a larger attack surface on loginwiki than we absolutely > need. Talked with Hoo and Jeff on that, and they are positive to get it installed on en-wiki :) Will prepare the patch soon.