Last modified: 2013-10-15 16:47:29 UTC
Mark has pointed out that the new maintenance/purgeChangedPages.php script could saturate packet buffers in routers and switches if it sends a very large number of UDP packets in a short period of time. The consequence would be that an unknown number of packets are silently discarded due to buffer overflow. A suggested solution is to insert a small artificial delay before sending each packet. Brandon suggested that a 10ms delay should be more than enough to prevent buffer overflows. With that rate limit in place we would effectively throttle the HTCP packet output to 100/s (360,000/hr). It's estimated that a 10K-item purge would take ~1.7 minutes of wall clock time to complete and a 150K list would take 25 minutes.
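The arithmetic behind these estimates can be checked with a short sketch. It assumes the per-packet delay dominates and ignores the time spent actually sending; the function names are made up for illustration and are not part of MediaWiki.

```python
def packets_per_second(delay_ms: float) -> float:
    """Maximum send rate when sleeping delay_ms before every packet."""
    return 1000.0 / delay_ms


def purge_duration_seconds(item_count: int, delay_ms: float) -> float:
    """Estimated wall-clock seconds to send item_count packets,
    counting only the artificial delay (send time itself is ignored)."""
    return item_count * delay_ms / 1000.0


if __name__ == "__main__":
    for items in (10_000, 150_000):
        seconds = purge_duration_seconds(items, delay_ms=10)
        print(f"{items} items at 10ms: {seconds:.0f}s ({seconds / 60:.1f} min)")
```

At a 10ms delay this gives 100 seconds for 10,000 items and 1,500 seconds (25 minutes) for 150,000 items.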
Further discussion with Brandon and Mark established that a rate of 200/s (5ms delay) is also acceptable.
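As a rough illustration of the idea (not the actual SquidUpdate patch, which is PHP), a fixed delay before each UDP send could look like the Python sketch below. The names send_throttled and make_udp_sender, and the way the delay is passed in, are assumptions made for this example only.

```python
import socket
import time
from typing import Callable, Iterable


def make_udp_sender(host: str, port: int) -> Callable[[bytes], None]:
    """Return a function that sends one UDP datagram to (host, port).
    The destination is supplied by the caller; nothing here is specific to HTCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(packet: bytes) -> None:
        sock.sendto(packet, (host, port))

    return send


def send_throttled(
    packets: Iterable[bytes],
    send: Callable[[bytes], None],
    delay_s: float = 0.005,  # 5ms, i.e. at most 200 packets/s
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each packet after sleeping delay_s, and return the number sent.
    The sleep function is injectable so the pacing can be tested without waiting."""
    sent = 0
    for packet in packets:
        sleep(delay_s)  # artificial delay before every packet
        send(packet)
        sent += 1
    return sent
```

A caller would build a sender with make_udp_sender and pass it to send_throttled along with the list of packets. A fixed sleep caps the rate at 1/delay_s but does not account for the time spent in the send call, so the real rate is slightly lower.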
Change 89325 had a related patch set uploaded by BryanDavis: Add HTCP rate limiting to SquidUpdate https://gerrit.wikimedia.org/r/89325
Change 89842 had a related patch set uploaded by BryanDavis: Add configurable delay between purgeChangedPages batches https://gerrit.wikimedia.org/r/89842
Change 89844 had a related patch set uploaded by BryanDavis: Add configurable delay between purgeChangedPages batches https://gerrit.wikimedia.org/r/89844
Change 89325 merged by jenkins-bot: Add configurable delay between purgeChangedPages batches https://gerrit.wikimedia.org/r/89325
Change 89842 merged by jenkins-bot: Add configurable delay between purgeChangedPages batches https://gerrit.wikimedia.org/r/89842
Change 89844 merged by jenkins-bot: Add configurable delay between purgeChangedPages batches https://gerrit.wikimedia.org/r/89844
Patch is merged, backported and pushed to cluster.