Last modified: 2014-11-05 18:49:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T71100, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 69100 - Performance review BounceHandler extension for Deployment
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: BounceHandler
Version: master
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Aaron Schulz
Keywords: performance
Depends on:
Blocks: 69019
Reported: 2014-08-04 12:31 UTC by Tony Thomas
Modified: 2014-11-05 18:49 UTC (History)

Description Tony Thomas 2014-08-04 12:31:16 UTC
Sub part of Bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019


Quoting: https://bugzilla.wikimedia.org/show_bug.cgi?id=69019#c0
>>
The BounceHandler extension generates a unique VERP address on every sent email, and has the bouncehandler API that can handle incoming email bounces once the bounce is HTTP POSTed to it via curl from exim. The extension was built as part of the VERP project to properly handle email bounces. 

Pipe incoming bounces to the bouncehandler API:
Add the following command to the PIPE transport that receives incoming mail for *@wikimedia.com:
command = /usr/bin/curl -d "action=bouncehandler" --data-urlencode "email@-" http://$IP/api.php
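
For readers who want to see what that POST carries, here is a minimal Python sketch of the same request. It assumes the API accepts a form-encoded body with an `action` field and an `email` field holding the raw bounce message, as the curl arguments suggest. The function name `build_bounce_request` and the example URL are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_bounce_request(raw_email: str, api_url: str) -> Request:
    """Build (but do not send) a POST resembling the one the curl command issues.

    Assumption: the API reads two form fields, 'action' and 'email'.
    """
    # urlencode() percent-encodes the whole message, newlines included,
    # which is what curl's --data-urlencode does for the email field.
    body = urlencode({"action": "bouncehandler", "email": raw_email}).encode("ascii")
    return Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical bounce message and API location, for illustration only.
sample_bounce = "To: wiki-abc-xyz-hash@example.org\nSubject: Delivery failed\n\nbody"
request = build_bounce_request(sample_bounce, "http://localhost/api.php")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual delivery; the sketch stops short of that so it can be inspected offline.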

The generated VERP address has the form
$wikiId-base36( $UserID )-base36( $Timestamp )-hash( $algorithm,$key, $prefix )@$email_domain 

where 
$wikiId - The wiki database name, to support multiple wikis
$UserID - The user_id from the 'user' table, which uniquely identifies the recipient.
$Timestamp - The unix timestamp.
$prefix = $wikiId. '-'. base_convert( $uid, 10, 36). '-'. base_convert( $timeNow, 10, 36);
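
The address layout above can be sketched in Python as follows. This is an illustration under stated assumptions, not the extension's code: the description only says `hash( $algorithm, $key, $prefix )`, so the keyed HMAC construction, the `md5` default, and the helper names `to_base36`, `make_verp_address` and `verify_verp_address` are all hypothetical.

```python
import hashlib
import hmac

_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number: int) -> str:
    """Convert a non-negative integer to lowercase base 36."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(_DIGITS[remainder])
    return "".join(reversed(digits))


def make_verp_address(wiki_id: str, user_id: int, timestamp: int,
                      key: str, email_domain: str, algorithm: str = "md5") -> str:
    """Build wikiId-base36(userId)-base36(timestamp)-hash@domain."""
    prefix = f"{wiki_id}-{to_base36(user_id)}-{to_base36(timestamp)}"
    # Assumption: the hash is an HMAC of the prefix under a secret key.
    digest = hmac.new(key.encode(), prefix.encode(), algorithm).hexdigest()
    return f"{prefix}-{digest}@{email_domain}"


def verify_verp_address(address: str, key: str, algorithm: str = "md5") -> bool:
    """Recompute the hash from the prefix and compare it in constant time."""
    local_part = address.split("@", 1)[0]
    prefix, _, digest = local_part.rpartition("-")
    expected = hmac.new(key.encode(), prefix.encode(), algorithm).hexdigest()
    return hmac.compare_digest(digest, expected)
```

Because the user ID and timestamp sit in the local part, a bounce handler can recover them with `int(value, 36)` after checking the hash, without any lookup keyed on the message itself.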

It uses the Plancake mail parser external library to extract the headers, in addition to the optional built-in regex functions.

Gerrit: https://github.com/wikimedia/mediawiki-extensions-BounceHandler
Comment 1 Tony Thomas 2014-10-05 20:06:52 UTC
The extension is currently in beta, in the testing phase. Can we speed this review up a bit? You can test a sample mail at deployment.wikimedia.beta.wmflabs.org/wiki/Special:EmailUser.
Comment 2 Greg Grossmeier 2014-10-28 17:28:14 UTC
Ori/Aaron have decided to stop doing pre-deployment performance reviews.

I've unblocked this from the deployment bug and moved it to the BounceHandler component (feel free to close or leave open if you ever do want a perf review).
Comment 3 Aaron Schulz 2014-10-28 22:07:07 UTC
Some bits:

* getOriginalEmail() doesn't call reuseConnection() and the return value is not well defined
* processBounceHeaders()/handleFailingRecipient() doesn't call reuseConnection() 
* The query in handleFailingRecipient() is unindexed, how large is the table expected to get?
* Also, it would be nice if the table could go in extension-1 (e.g. a custom cluster). Echo and Flow do this.

Where is the code for the broker that takes in the bounces and does a POST to the API?
Comment 4 Tony Thomas 2014-10-29 05:39:01 UTC
(In reply to Aaron Schulz from comment #3)

> Where is the code for the broker that takes in the bounces and does a POST
> to the API?

Here is the code that would go on polonium and HTTP POST the bounce to the API: https://github.com/wikimedia/operations-puppet/blob/production/templates/exim/exim4.conf.SMTP_IMAP_MM.erb#L670

We will have it point to en-wiki after this change - https://gerrit.wikimedia.org/r/#/c/168622/
Comment 5 Gerrit Notification Bot 2014-10-29 06:17:47 UTC
Change 169654 had a related patch set uploaded by 01tonythomas:
Handle the return value of getOriginalEmail efficiently

https://gerrit.wikimedia.org/r/169654
Comment 6 Gerrit Notification Bot 2014-10-29 17:40:59 UTC
Change 169654 merged by jenkins-bot:
Various performance fixed for the BounceHandler extension

https://gerrit.wikimedia.org/r/169654
Comment 7 Sam Reed (reedy) 2014-10-30 19:19:24 UTC
(In reply to Aaron Schulz from comment #3)
> * The query in handleFailingRecipient() is unindexed, how large is the table
> expected to get?

I was about to file a bug about that

I guess it should probably be a partial index on the email field, and then the timestamp.

Something like:

CREATE INDEX /*i*/br_mail_timestamp ON /*_*/user (br_user_email(50), br_timestamp);


In core we only index the first 50 chars of the email on the user table
Comment 8 Aaron Schulz 2014-11-03 18:12:07 UTC
I'd second having an index as above.
Comment 9 Sam Reed (reedy) 2014-11-03 18:21:25 UTC
(In reply to Sam Reed (reedy) from comment #7)
> CREATE INDEX /*i*/br_mail_timestamp ON /*_*/user (br_user_email(50),
> br_timestamp);

Using the right table would help :)

CREATE INDEX /*i*/br_mail_timestamp ON /*_*/bounce_records(br_user_email(50), br_timestamp);
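
To illustrate why this index matters, here is a small Python sketch using an in-memory SQLite database. It makes several assumptions: the query shape (count bounces for one address since a given time) is a guess at what handleFailingRecipient() does, the `br_id` column is hypothetical, and SQLite has no prefix-length syntax, so the index covers the full email column rather than the first 50 characters.

```python
import sqlite3


def make_demo_db(with_index: bool) -> sqlite3.Connection:
    """Create a toy bounce_records table, optionally with the proposed index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bounce_records ("
        " br_id INTEGER PRIMARY KEY,"      # hypothetical column
        " br_user_email TEXT NOT NULL,"
        " br_timestamp TEXT NOT NULL)"
    )
    if with_index:
        conn.execute(
            "CREATE INDEX br_mail_timestamp"
            " ON bounce_records (br_user_email, br_timestamp)"
        )
    conn.executemany(
        "INSERT INTO bounce_records (br_user_email, br_timestamp) VALUES (?, ?)",
        [(f"user{i % 10}@example.org", f"201411{i % 28 + 1:02d}000000") for i in range(200)],
    )
    return conn


def explain_bounce_lookup(conn: sqlite3.Connection, email: str, since: str) -> str:
    """Return SQLite's query plan for an assumed per-recipient bounce count."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT COUNT(*) FROM bounce_records "
        "WHERE br_user_email = ? AND br_timestamp >= ?",
        (email, since),
    ).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


indexed_plan = explain_bounce_lookup(make_demo_db(True), "user3@example.org", "20141101000000")
unindexed_plan = explain_bounce_lookup(make_demo_db(False), "user3@example.org", "20141101000000")
```

With the index in place the plan names `br_mail_timestamp`; without it the planner falls back to scanning the whole table, which is the concern raised in comment 3.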
Comment 10 Gerrit Notification Bot 2014-11-03 18:55:31 UTC
Change 170759 had a related patch set uploaded by 01tonythomas:
Create table index for 'bounce_records' table

https://gerrit.wikimedia.org/r/170759
Comment 11 Gerrit Notification Bot 2014-11-03 19:19:02 UTC
Change 170759 merged by jenkins-bot:
Create table index for 'bounce_records' table

https://gerrit.wikimedia.org/r/170759
Comment 12 Gerrit Notification Bot 2014-11-03 21:55:25 UTC
Change 170835 had a related patch set uploaded by Aaron Schulz:
Added wgBounceHandlerCluster setting

https://gerrit.wikimedia.org/r/170835
Comment 13 Gerrit Notification Bot 2014-11-05 18:06:57 UTC
Change 170835 merged by jenkins-bot:
Added wgBounceHandlerCluster setting

https://gerrit.wikimedia.org/r/170835
Comment 14 Aaron Schulz 2014-11-05 18:49:24 UTC
LGTM
