Last modified: 2014-09-02 04:42:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T53494, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51494 - Use Beta cluster as a true canary for code deployments (tracking)
Status: NEW
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
rmqa-2013
Keywords: tracking
Depends on: 48501 51497 52357 57583 62835 50622 50623 60058 62836 63538 63746
Blocks: tracking
Reported: 2013-07-16 23:48 UTC by Greg Grossmeier
Modified: 2014-09-02 04:42 UTC
CC List: 14 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Greg Grossmeier 2013-07-16 23:48:40 UTC
BetaLabs is awesome. It is catching a lot of breakages that would otherwise hit users. We're grateful for it.

What are the specific limitations of BetaLabs that are preventing us from whole-heartedly trusting a breakage on BetaLabs as a blocker for wider deployment? Either mark those as blockers of this bug, or report them and mark them as blockers :-)
Comment 1 Tim Starling 2013-10-15 03:05:15 UTC
I understand it generally runs master rather than the current deployment branch, so it's not really useful for testing changes which happen outside of the normal CI cycle.

It uses Varnish for text instead of Squid, exposing known bugs that do not occur in production.

It apparently uses a different set of extensions to production.

It uses a different deployment system to production, which makes it difficult to reproduce bugs related to non-atomic code tree update.
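
For readers unfamiliar with the term, a "non-atomic code tree update" can be illustrated with a minimal sketch. It assumes nothing about how production or Beta actually deploys: the file names and the helper functions (write_tree, update_in_place, versions_seen) are hypothetical and exist only for this illustration. The point is that while files are overwritten one at a time, a reader can observe a tree that mixes old and new code.

```python
import tempfile
from pathlib import Path


def write_tree(root: Path, version: str, names: list[str]) -> None:
    """Create every file in the tree, stamped with the given version."""
    for name in names:
        (root / name).write_text(version)


def update_in_place(root: Path, version: str, names: list[str]):
    """Overwrite files one at a time, yielding after each write.

    Each yield marks a moment at which a concurrent reader (for example a
    web server handling a request) could load the tree.
    """
    for name in names:
        (root / name).write_text(version)
        yield name


def versions_seen(root: Path, names: list[str]) -> set[str]:
    """Return the set of versions a reader sees across the tree right now."""
    return {(root / name).read_text() for name in names}


def demo() -> list[set[str]]:
    """Record what a reader would see after each step of an in-place update."""
    names = ["Setup.php", "Parser.php", "Skin.php"]  # hypothetical file names
    observations = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_tree(root, "v1", names)
        for _ in update_in_place(root, "v2", names):
            observations.append(versions_seen(root, names))
    return observations


if __name__ == "__main__":
    for step, seen in enumerate(demo(), start=1):
        print(step, sorted(seen))
```

Any observation that contains more than one version is a window in which a request could load mismatched files. Bugs of that kind depend on how the deployment tool writes the tree, which is why they are hard to reproduce on a cluster that deploys differently.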
Comment 2 Greg Grossmeier 2013-10-15 16:13:47 UTC
(In reply to comment #1)
> I understand it generally runs master rather than the current deployment
> branch, so it's not really useful for testing changes which happen outside of
> the normal CI cycle.

Right (if I'm understanding you correctly): the Beta Cluster won't catch things that aren't first merged to master for some amount of time before landing on the production cluster.

> It uses Varnish for text instead of Squid, exposing known bugs that do not
> occur in production.

Unfortunate, but hopefully the switch of production text to Varnish will happen soon enough. Do you think it is worth switching Beta Cluster (back?) to Squid for the time being? I guess that depends on how long the Varnish text transition ends up taking...

> It apparently uses a different set of extensions to production.

Some of this is by design (e.g. Flow), but I'm curious now which extensions differ and why...

> It uses a different deployment system to production, which makes it difficult
> to reproduce bugs related to non-atomic code tree update.

Right, maybe the wording of "true canary for code deployments" wasn't the best. Maybe "true canary for production"? Deploying will be different on Beta Cluster until/when/if production moves to a Continuous Deployment system; there is no way to get around that. Luckily, experience with Beta Cluster should help inform that transition.
Comment 3 Antoine "hashar" Musso (WMF) 2013-10-15 17:52:20 UTC
(In reply to comment #1)
> I understand it generally runs master rather than the current deployment
> branch, so it's not really useful for testing changes which happen outside of
> the normal CI cycle.

When we created beta, the aim was to catch bugs before they land in wmf branches. We used test.wikipedia.org to test out the wmf branch before syncing. Maybe we could set up some more wikis that would use the wmf branches as well.

> It uses Varnish for text instead of Squid, exposing known bugs that do not
> occur in production.

That followed a discussion I had with Mark over IRC. Since text Varnish was (and is) going to land in production, it seemed like a good idea to play test it on beta. We did discover a few bugs and I think it helped move Varnish text forward.

I would prefer we do not revert to Squid. Its configuration is not handled via Puppet and I don't think it is worth the effort.


> It apparently uses a different set of extensions to production.

There might be some differences. IIRC CheckUser has been explicitly disabled. But if an extension is missing we should add it in and configure it for beta.

> It uses a different deployment system to production, which makes it difficult
> to reproduce bugs related to non-atomic code tree update.

We use a shared NFS export (/data/project), which is where deployment-bastion (aka tin) and the apaches/jobrunner read files from. So we just git pull and have an instant deploy, just like we used to do a while ago with Zwinger.

Back in January 2013, we had git-deploy on beta to stage it before deploying in production. The project is apparently stalled and had some issues with Labs, so we reverted to the NFS share. With Sartoris apparently getting some attention, the people working on it could well migrate beta to Sartoris.

Additionally, the reason we are not using scap is that it depends on a Debian package and a myriad of Puppet changes. I don't have merge rights on operations/puppet.git and eventually got fed up trying to get changes merged, so I just abandoned the idea of using scap.
