On Deployment

Deployment is the process of putting your work into the hands of customers. It is like serving a fully prepared meal to a restaurant visitor. Deployment is an integral part of the lifecycle of any software. Without deployments, your software does not bring any actual value to the business.

Some deployment examples that you have probably dealt with:

  • Buying a DVD and using it to install software;
  • Downloading an installer and installing software;
  • Updating an operating system (hello, Windows Update);
  • Updating an application to a newer version.

These kinds of deployments happen and they are necessary. However, I will not talk about them. We will focus on the Software-as-a-Service (SaaS) business model in this post. The great thing about SaaS is that software is actually rented to customers. You have the capability to run (almost) everything in the infrastructure that you manage. Therefore, you have the ultimate control.

I want to take you through different models of deployment and build up complexity along the way.

Simplest Deployment Ever

We shall start with the simplest deployment.

Complexity arises when there are lots of components with relationships between them. If you have only one component, then not much complexity can exist. The cool thing is that complex systems are just combinations of such single components.

So mastering the deployment of one component will put you on the path to complex architectures.

Imagine this situation:

Your application has been running for a loong time. However, a serious security issue has been detected and fixed. We do not want to get hacked, right? Therefore, we want to get the fix out as fast as possible to start defending ourselves from attacks.

How do we do that?

Well, we can simply kill the application. 🙂 No application – no security issues.

Then start a new version in its place as fast as possible.

Hooray! We have successfully deployed a new version of your application.
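The kill-then-start procedure can be sketched roughly like this. It is only an illustration: the "application" is a stand-in child process that just sleeps, and the `start` and `redeploy` helpers are hypothetical names I made up for this post. The sketch also measures how long nothing was running between the two steps.

```python
import subprocess
import sys
import time


def start(version):
    """Start a stand-in 'application': a child process that just sleeps."""
    code = "import time; time.sleep(60)"
    # The version is passed only as a label on the command line.
    return subprocess.Popen([sys.executable, "-c", code, version])


def redeploy(old, new_version):
    """Stop the old process, then start the new one; return (new, gap)."""
    stopped_at = time.monotonic()
    old.terminate()            # ask the old version to exit
    old.wait(timeout=10)       # from here on, nothing is running...
    new = start(new_version)   # ...until the new version is launched
    return new, time.monotonic() - stopped_at


v1 = start("v1")
v2, gap = redeploy(v1, "v2")
print(f"nothing was running for roughly {gap:.3f}s")

# Clean up the demo process.
v2.terminate()
v2.wait(timeout=10)
```

A real application also needs time to boot before it can serve anything, so the real gap is usually larger than what this toy measures.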

It looks very simple and easy. And it actually is. However, this simplicity comes with a tradeoff.

After killing V1 and before launching V2, we had no software running at all. It means that we were unreachable. If there is a time window when no one needs your software, then this is great! I highly recommend not overcomplicating things until it is necessary. However, these days customers expect 24/7 uptime, so for most businesses such a gap is not acceptable. To solve this issue, there must never be a moment when the application is not running.

How do we achieve that?

Enter Load Balancing

A load balancer is a component that accepts traffic from users and passes it down to the application. You might think that adding it adds too much overhead. Well, load balancing is definitely not free, but it is a really important component. I am pretty sure that in the last 24 hours your requests went through thousands of load balancers.

Some of the benefits are:

  • Zero-downtime deployments.
  • Traffic control – you can have a 50/50 split between the new version and the old one.
  • Health checks – the load balancer checks the health of your application.
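To make the traffic-control point concrete, here is a minimal sketch of a weighted split. It assumes the load balancer picks a backend at random for every request, proportionally to a configured weight; real load balancers offer several strategies, and the `pick_backend` function and the weights are made up for illustration.

```python
import random
from collections import Counter


def pick_backend(weights, rng=random):
    """Pick a backend name at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical 50/50 split between the old and the new version.
weights = {"v1": 50, "v2": 50}

rng = random.Random(42)  # seeded only to make the demo repeatable
counts = Counter(pick_backend(weights, rng) for _ in range(10_000))
print(counts)  # both versions receive roughly half of the requests
```

Changing the weights to something like `{"v1": 95, "v2": 5}` would send only a small share of traffic to the new version.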

Load balancers are everywhere. Literally.

“But hey, if we remove application V1 and add application V2, isn’t it the same?” you might ask. And you would be absolutely correct! It means we have to change something about the deployment process, not only the structure.

What we could do is deploy a new version (V2) of the application, but keep the traffic on V1. This would result in two application versions running at the same time.

At this point in deployment, it would be awesome to know:

  1. Has application V2 started?
  2. Is it available?
  3. Is it ready to serve traffic?

If the answer to all of them is yes, then we can safely start redirecting traffic. More details on these questions are available further down the post, so keep going.

The load balancer can do just that with some configuration changes.

Hooray, now your customers are using a new application version!

However, we still have two applications running at the same time. We must be sure that the old version has finished all the work it had taken on. Then, and only then, can we remove it.

Congrats! You have deployed your application with 0 downtime.
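The whole procedure can be modelled in a few lines. This is a toy sketch, not a real load balancer: `App`, `LoadBalancer` and `deploy` are hypothetical names, the `ready` flag stands in for a real health check, and `in_flight` stands in for whatever your application uses to track unfinished work.

```python
class App:
    """A stand-in for one running version of the application."""

    def __init__(self, version, ready=True):
        self.version = version
        self.ready = ready      # would come from a health/readiness check
        self.in_flight = 0      # requests accepted but not yet finished
        self.running = True

    def handle(self, request):
        return f"{self.version}: {request}"

    def stop(self):
        self.running = False


class LoadBalancer:
    """Sends every request to whichever app is currently active."""

    def __init__(self, active):
        self.active = active

    def route(self, request):
        return self.active.handle(request)


def deploy(lb, new_app):
    """Switch traffic to new_app without a gap, then retire the old app."""
    if not (new_app.running and new_app.ready):
        raise RuntimeError("new version is not ready; traffic stays on the old one")
    old_app = lb.active
    lb.active = new_app          # traffic now flows to the new version
    if old_app.in_flight == 0:   # remove the old version only once it is idle
        old_app.stop()
    return old_app


lb = LoadBalancer(App("v1"))
print(lb.route("GET /"))         # v1: GET /
old = deploy(lb, App("v2"))
print(lb.route("GET /"))         # v2: GET /
```

Notice that there is never a moment when `lb.active` points at nothing, which is the whole trick.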

When shit hits the fan

Now picture a bad, but common scenario – the new version has a bug that was noticed only in production. This bug does not allow new customers to get into your system. Oh crap! What do we do?

Well, you could repeat the procedure and deploy the old version in the same manner. I have seen this in practice multiple times. However, if it is a critical piece of software that is not working, you want to get it back as soon as possible. Every minute could mean thousands and thousands of dollars lost. Going through the deployment procedure at this point has some disadvantages:

  • It is usually not blazing fast;
  • Engineers under stress can pick an incorrect version.

The way to make it better is to perform a rollback. To make it possible, the deployment process needs some changes. Remember the step when we switched the traffic from V1 to V2?

At that point in time, we could do the opposite – switch traffic back from V2 to V1.

That would almost immediately resolve the issue, since traffic would be served by a good version! While such an approach allows a quick rollback, it has some drawbacks:

  • It means that two versions of the software are running at the same time;
  • It might cost a lot to keep old versions running;
  • It needs love and care in cleaning up old versions.
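Here is a minimal sketch of that idea, assuming the router simply remembers the version it was serving before the last switch. The `Router` class and its methods are hypothetical; the point is that a rollback is just one more traffic switch, and that it only works while the old version is still kept around.

```python
class Router:
    """Tracks which version receives traffic and which one is kept as a fallback."""

    def __init__(self, version):
        self.current = version
        self.previous = None

    def switch_to(self, version):
        # Keep the old version running so that we can go back to it.
        self.previous, self.current = self.current, version

    def rollback(self):
        if self.previous is None:
            raise RuntimeError("no previous version is kept running")
        # Swap: the last good version serves traffic again.
        self.current, self.previous = self.previous, self.current

    def retire_previous(self):
        """Clean up the old version once the new one has proven itself."""
        self.previous = None


router = Router("v1")
router.switch_to("v2")   # the deployment
router.rollback()        # the bug was noticed, go back
print(router.current)    # v1
```

Once `retire_previous` has been called, a quick rollback is no longer possible, which is exactly the cost-versus-safety tradeoff from the list above.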

Application

So far, I have assumed that shutting the application down is not a big deal. I mean, it shuts down and starts up and it’s all rainbows and candies.

However, what if the application just shuts down without completing the work it was tasked with?

What if startup takes 15 seconds and it cannot accept traffic immediately?

What if it has been running but cannot accept traffic anymore?

Graceful Shutdown

Make sure that your application supports graceful shutdown. It has to react to being shut down by not accepting any new work. Everything new should flow into the newer instances that were deployed earlier in the process. At that point in time, no new work is coming in, but there might be something in progress. The application should not be killed until it completes that work. If reliability is your primary concern, then you should be able to kill the application at any moment without losing the work being done. For example, you should survive power outages. However, recovering from a kill means doing some work twice, and if we can allow the work to be completed, then why not? Anyway, once all the work is done, you do not need the old application anymore and it can shut down.

When do you know that you need to stop accepting new requests? Well, pressing CTRL+C in a terminal (on a Mac too) looks like a great opportunity: it sends SIGINT to the application, which allows you to test locally whether your solution works. In the real world, you should investigate what kind of signals your infrastructure sends. Kubernetes sends SIGTERM and waits for a grace period before killing the container. Systemd does the same by default: SIGTERM first, SIGKILL after a timeout.
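A minimal sketch of that behaviour, using only the standard library. The `Worker` class and `install_handlers` are hypothetical names; I assume the application counts its in-progress jobs and that the signal handler only flips a flag, so that the real work of draining happens outside the handler.

```python
import signal
import threading


class Worker:
    """Accepts jobs until shutdown begins, then lets in-progress jobs finish."""

    def __init__(self):
        self.accepting = True
        self.in_flight = 0
        self._lock = threading.Lock()
        self._idle = threading.Event()
        self._idle.set()              # nothing in progress yet

    def try_start(self):
        """Return True if a new job was accepted, False once shutdown has begun."""
        with self._lock:
            if not self.accepting:
                return False
            self.in_flight += 1
            self._idle.clear()
            return True

    def finish(self):
        with self._lock:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.set()

    def begin_shutdown(self, signum=None, frame=None):
        # Only flips a flag (no locks), so it is safe to use as a signal handler.
        self.accepting = False

    def wait_until_drained(self, timeout=None):
        """Block until no job is in progress; returns False on timeout."""
        return self._idle.wait(timeout)


def install_handlers(worker):
    # SIGINT is what CTRL+C sends; SIGTERM is what Kubernetes and systemd send.
    for signum in (signal.SIGINT, signal.SIGTERM):
        signal.signal(signum, worker.begin_shutdown)


worker = Worker()
install_handlers(worker)
worker.try_start()                            # one job is in progress
worker.begin_shutdown()                       # what the signal handler would do
print(worker.try_start())                     # False: no new work is accepted
worker.finish()                               # the in-progress job completes
print(worker.wait_until_drained(timeout=1))   # True: safe to exit now
```

In a real service the main thread would call `wait_until_drained` with a timeout shorter than the grace period your infrastructure gives you, and then exit.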

Ready to perform the work

While graceful shutdown is like retiring an instance, readiness is about onboarding a new member. For most stateless services, this will not be a problem at all. If a service is ready to go from the first moment of being up, then great!

Other kinds of services might need some time to warm up before work:

  • A distributed database might need some time to bootstrap and catch up with peers to serve up-to-date data;
  • A neural network needs time to load billions of parameters;
  • An event-sourced system needs time to process history.

In Kubernetes, this is represented by a readiness probe. Essentially, your application must be able to say “hey, I’m good, give me some work” or “hey, please wait a bit or I am not responsible for the work I do”. And your load balancer must be able to interpret the answer in the correct way.
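Both sides of that conversation can be sketched with the standard library. Everything specific here is an assumption for illustration: the `/ready` path, the choice of 200 for "give me some work" and 503 for "please wait", and the `serve` and `is_ready` helpers. Use whatever path and status codes your probe is configured to expect.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()   # set once warm-up has finished


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":            # hypothetical probe path
            status = 200 if ready.is_set() else 503
        else:
            status = 404
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass                                 # keep the demo quiet


def serve(port=0):
    """Start the probe endpoint on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_ready(host, port):
    """The load balancer's side: only a 200 answer means 'send traffic'."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/ready")
        return conn.getresponse().status == 200
    except OSError:
        return False                         # unreachable counts as not ready
    finally:
        conn.close()


server = serve()
port = server.server_address[1]
before = is_ready("127.0.0.1", port)   # still warming up
ready.set()                            # warm-up finished
after = is_ready("127.0.0.1", port)
server.shutdown()
server.server_close()
print(before, after)                   # False True
```

The application decides when to call `ready.set()`, for example after the model is loaded or the event history is replayed.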

Conclusion

Deployments are a necessary part of the software business and they will never go away.

We have covered only very simple deployments so far. The real world is much more complex, but mastering the basics is always good. Complexity is built out of this basic understanding by adding more components.

I hope it was informative for you!