Service Deployments Should be Ramps, not Cliffs
[Going to try a mini-post…keeping it short and focused]
In my previous post I introduced you to Online Experimentation. The general use case for experimentation is to take two or more websites or services, each with different features or designs, and test them against each other using real users. It is assumed that all the variants (candidates) under test are production ready (i.e., they have been through your quality process, including pre-production testing). But suppose your control variant is your existing service as it runs today, and the treatment variant is the new version you want to deploy. You are not testing some new feature or design, just the new and potentially dangerous code of your next deployment. What you have then, instead of an A/B experiment, is a deployment strategy!
Exposure Control
The idea of limiting the exposure of your new and potentially dangerous code to a small number of users is something Ken Johnston and I have been calling Exposure Control. With an advanced experimentation system like Microsoft’s Experimentation Platform (ExP), you launch your new version with only a few users exposed to the new code (as little as a fraction of a percent of your users), and as you gain confidence you increase this amount. Lather, rinse, repeat, and you have a ramp-up strategy for launching the next version of your service. What constitutes confidence? You should be watching your performance monitors (CPU, memory, etc.) and the service's own metrics (does it work?), and with a system like ExP you can even see whether usage differs between the old and new systems. If the new system gets fewer click-throughs in a certain area, and no user-exposed features have changed (which makes this effectively an A/A test), that can indicate a problem with the new system. For example, the new system may be dropping page requests or may simply be slow (slow page loads decrease user engagement).
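One common way to implement this kind of percentage-based exposure is to hash each user ID into a stable bucket and compare the bucket against the current exposure level. The sketch below is only an illustration of that idea, not how ExP actually works; the names `bucket`, `in_treatment`, and the `salt` value are hypothetical.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01 percent


def bucket(user_id: str, salt: str = "deploy-v2") -> int:
    """Map a user ID to a stable integer bucket in [0, BUCKETS).

    The salt is a hypothetical per-deployment label, so that a different
    deployment picks a different set of early users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_treatment(user_id: str, exposure_percent: float,
                 salt: str = "deploy-v2") -> bool:
    """Return True if this user should see the new version."""
    # exposure_percent is in [0, 100]; convert it to a bucket threshold.
    return bucket(user_id, salt) < exposure_percent * (BUCKETS / 100)


if __name__ == "__main__":
    for percent in (0.5, 5.0, 50.0):
        exposed = sum(in_treatment(f"user-{i}", percent) for i in range(10_000))
        print(f"{percent}% exposure -> {exposed} of 10000 users in treatment")
```

Because each user's bucket never changes, anyone exposed at a low percentage stays exposed as the percentage rises, so ramping up only adds users to the treatment and never swaps them back and forth.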
Testing in Production (TiP)
We won’t necessarily abandon all pre-production testing, but as long as you have exposure control to mitigate the negative consequences of any problems you encounter, why not do most of your Testing in Production (TiP)? The ability to start with a small exposed user base is crucial. Even with a small user base, any big problems will be obvious. Then, as you ramp up the user base, subtler problems will reveal themselves. Of course, the easy ability to ramp up should come with an equally easy ability to ramp back down (roll back), so that if you are adversely affecting customers you can remove the offending service from production.
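The ramp-up and roll-back loop can be sketched in a few lines. This is a simplified illustration under stated assumptions: `run_ramp` is a hypothetical helper, and `is_healthy` stands in for whatever monitoring you actually use (performance counters, service metrics, usage comparisons). A real system would also wait at each step long enough to collect meaningful data.

```python
from typing import Callable, List, Sequence


def run_ramp(steps: Sequence[float],
             is_healthy: Callable[[float], bool]) -> List[float]:
    """Walk through the exposure steps and return the levels applied.

    After each step the health check is consulted. If it fails, exposure
    drops to 0.0 (roll back) and the ramp stops.
    """
    history: List[float] = []
    for percent in steps:
        history.append(percent)       # expose this share of users
        if not is_healthy(percent):
            history.append(0.0)       # roll back to the old version
            break
    return history


if __name__ == "__main__":
    steps = [0.5, 5.0, 25.0, 100.0]

    # A deployment that stays healthy ramps all the way up.
    print(run_ramp(steps, lambda percent: True))

    # A subtler problem that only shows up at 25% exposure triggers roll-back.
    print(run_ramp(steps, lambda percent: percent < 25.0))
```

The second call shows the point of the ramp: the problem surfaces while only a quarter of users are exposed, and the response is a single step back to zero.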
Is this just for Websites?
No, this is for any service deployed to the internet that is ultimately exposed through a website. Go ahead and deploy that backend database change: the control website will hit the stack that uses the existing database, while the treatment website hits the stack that uses the new database system (but otherwise is exactly the same). Then it’s just a matter of ramping up the number of users who see the treatment website.
But wait, there’s more
It’s not just the number of users who can see the new service that can be controlled. Exposure control can be conditional on other user parameters such as what browser the user has, or what region they live in. More on this in a future post.
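As a rough illustration of conditional exposure, the sketch below picks an exposure percentage from an ordered list of rules keyed on browser and region. The `ExposureRule` class, the `exposure_for` function, and the example values are all hypothetical; they are not taken from ExP.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExposureRule:
    percent: float                  # share of matching users to expose
    browser: Optional[str] = None   # None means "any browser"
    region: Optional[str] = None    # None means "any region"

    def matches(self, browser: str, region: str) -> bool:
        return ((self.browser is None or self.browser == browser)
                and (self.region is None or self.region == region))


def exposure_for(rules: Sequence[ExposureRule], browser: str, region: str,
                 default: float = 0.0) -> float:
    """Return the exposure percentage of the first rule that matches."""
    for rule in rules:
        if rule.matches(browser, region):
            return rule.percent
    return default


if __name__ == "__main__":
    rules = [
        ExposureRule(percent=10.0, browser="Firefox", region="US"),
        ExposureRule(percent=1.0, region="US"),
    ]
    print(exposure_for(rules, "Firefox", "US"))  # 10.0
    print(exposure_for(rules, "Chrome", "US"))   # 1.0
    print(exposure_for(rules, "Chrome", "DE"))   # 0.0
```

Rules are checked in order, so more specific rules go first and anything unmatched falls back to the default of no exposure.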