From the course: Site Reliability Engineering Essential Training
Minimizing user impact during releases
Minimizing user impact during releases. In this section, we'll talk about three techniques that you can use as an SRE to minimize user impact during releases.

First, canary release. We already touched on this. It is a method in which the new version of an application is released to a small subset of users before it is released to all users. It's an extremely powerful way of releasing. You pick a canary, release your software to it, make sure the canary checks out, and then proceed with your other releases. A progressive rollout can be an option once you finish your canary release.

So what are the benefits of canary release? First, it helps minimize risk, and that's obvious because you are only affecting a very small subset of users. Second, it enables a faster feedback loop. You will know right away if there is a problem, because only a small subset of users is exposed to the new version. Third, it makes rollback easy. Just one server, a couple of servers, or a few users are involved, so rolling back at that point is relatively easy.

So the benefits of canary release are really high, and yet I still find teams that don't do canary release well. They simply test in their lower environment, assuming that it is equivalent to a canary release, and then release to production. Lower environment releases are not equal to production releases. Your production environment is different from your non-production environments. So in production, always make sure that you deploy to a canary first before releasing to everyone.

So let's look at a pictorial representation of the canary release flow. Here, development happens on the left side, the code gets automatically tested, and then a release is made. That release is the canary release. Canary users use the canary release and check things out. If issues are found, you go back to development. This is the fast feedback loop I talked about earlier. You immediately know what's wrong before affecting your full user base.
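To make the canary flow a little more concrete, here is a minimal Python sketch, not a definitive implementation. It assumes users are assigned to the canary by hashing their user ID, and that the go or no-go decision compares the canary's error rate against the current release's error rate. The names `CANARY_PERCENT`, `is_canary_user`, and `canary_decision`, as well as the 5 percent and 0.01 tolerance values, are hypothetical and chosen only for illustration.

```python
import hashlib

CANARY_PERCENT = 5  # hypothetical: share of users served by the canary release


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_canary_user(user_id: str, percent: int = CANARY_PERCENT) -> bool:
    """Return True if this user should be served by the canary release."""
    return bucket(user_id) < percent


def canary_decision(canary_error_rate: float,
                    baseline_error_rate: float,
                    tolerance: float = 0.01) -> str:
    """Pick the next step from observed error rates (fractions from 0.0 to 1.0)."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return "rollback"  # issues found: go back to development
    return "promote"       # no issues: release to all users
```

Because the bucket comes from a hash of the user ID, the same user lands on the same side every time, which keeps the canary group stable while you watch it.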
If there are no issues, go ahead and release it to all users. And when you do the production release, you could follow the progressive rollout method I described in the previous section.

Another powerful way of releasing code is to use blue/green deployments. In blue/green deployments, you maintain two identical environments. For example, green could be the current production environment. And then blue is an environment that looks like production but is not production yet. It is ready to serve production. When you are ready to do a release, release it to the blue environment and simply redirect traffic to the blue environment. So the blue environment becomes production after the release.

Now what is the advantage? Think about rollbacks. You can leave the blue environment as it is and simply point your production traffic back to the green environment, and you should be back in business very quickly.

The only problem with blue/green deployments is that they are very, very expensive. And that is obvious, because you need to maintain two identical environments, including servers and databases. All the infrastructure associated with them needs to be identical. In some cases it could be cost-prohibitive, so you may not even be able to have a blue/green deployment. That said, if your organization is able to afford it, blue/green deployments are a great way to ensure reliability.

Here is a pictorial representation of blue/green deployments. Very simple. The load balancer in the middle connects to either the green or the blue environment, depending on where you send the traffic. Note that the user does not know where the traffic is actually going. Whether it's going to blue or green, they do not know, and they don't have to. This is how blue/green deployments work.

Another powerful technique in release management with SRE is feature flags. So how does it work? It is a technique to enable or disable specific functionality.
And here's the key phrase: without deploying new code. Feature flags let you do that. You don't have to touch the code. You simply change the configuration, thereby changing the behavior of your application. The application uses conditional logic driven by external settings or a configuration file, so the code itself stays untouched.

Feature flags help canary and progressive rollouts by adding even more power to rollbacks. They enable fast, efficient, and safe rollbacks. Why? You are not redeploying anything. You are simply changing the flag, thereby changing the application functionality. There are many commercial and open-source feature flag products out there that you can readily use to enable flagging in your application. In a later section in this lesson, we will look at some examples of these products.

Now that you have learned techniques for better release management, such as progressive rollouts, canary deployment, blue/green deployment, and feature flags, let's take a look at one of the most overlooked aspects of release engineering. That's monitoring the CI/CD pipeline itself.
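Before moving on, here is a minimal Python sketch of the feature flag idea from this section, under stated assumptions. It assumes the flags live in a JSON configuration file that is re-read on every check, so changing the file changes behavior without a redeploy. The file layout, the `new_checkout` flag name, and the `is_enabled` and `checkout` functions are hypothetical. A real feature flag product would typically add caching, targeting rules, and auditing.

```python
import json
from pathlib import Path


def is_enabled(flag: str, flags_file: Path, default: bool = False) -> bool:
    """Read the flag file on every call so a config change takes effect at once."""
    try:
        flags = json.loads(flags_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return default  # missing or unreadable config: fall back to the default
    if not isinstance(flags, dict):
        return default  # unexpected file shape: fall back to the default
    return bool(flags.get(flag, default))


def checkout(flags_file: Path) -> str:
    """Hypothetical request handler that holds both the old and the new code path."""
    if is_enabled("new_checkout", flags_file):
        return "new checkout flow"
    return "existing checkout flow"
```

Rolling back here means editing the flag file to set `new_checkout` to `false`. Nothing is redeployed, which is why flag-based rollbacks are fast.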