Zero downtime deployment without any Jazz

Or with some Jazz. It depends on where you stand on Jazz.

If you have ever come across the term zero-downtime deployment, you probably found it surrounded by keywords such as Docker and Kubernetes, which I am sure you have heard of. Most people who talk about it portray zero-downtime deployment as something that cannot be done without all that jazz. In reality, that is not entirely true. Yes, in some cases advanced tools are justified, and I would agree they are much needed, but if you are just starting out you don't need all that jazz and a whole fleet to manage it. Let's figure out how it can be done with just Nginx.

Let's first understand what zero-downtime deployment means. It is a way of saying that we can make changes to a running service (a web service or any other kind of service) that people are currently using, without stopping the service from the client's perspective. I say the client's perspective because even in this kind of deployment we do need to stop the service; we just keep other copies running while we do it.

Before we start, we need to know what our service does. In this example we have a small web service that takes two values, adds them, returns the sum to the user, and also stores the numbers. I know it doesn't make much sense to build such a service, but its simplicity lets us ignore the details of the service itself.

Let's plan a small service with zero downtime deployment. The first step is to start without zero downtime deployment.

architecture_1_bg.png

As you can see in the architecture diagram, we have a few clients asking for the service. All requests from the clients go through Nginx to the web service, and the web service then makes requests to the database to store the data. Honestly, we could remove Nginx and expose the web service directly to the clients; it would not make much difference at this point, but let's keep it because it will help later.

Before we make our service compatible with zero-downtime deployment, let me describe the database table schema, because it will help us understand every important point. I am not good with UML diagrams, so I will describe the table in words. The table is named numbers and it has three non-nullable columns, which store the two numbers and the sum.
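To make the table concrete, here is a minimal sketch of the schema and of what the v1 service does with it, using Python's built-in sqlite3 module. The table name numbers and the sum column come from the description above; the column names number1 and number2, the integer types, the in-memory database and the handle_request function are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn):
    # Three non-nullable columns: the two inputs and their sum.
    conn.execute(
        "CREATE TABLE numbers ("
        " number1 INTEGER NOT NULL,"
        " number2 INTEGER NOT NULL,"
        " sum INTEGER NOT NULL)"
    )


def handle_request(conn, number1, number2):
    # v1 behaviour: add the numbers, store all three values, return a message.
    total = number1 + number2
    conn.execute(
        "INSERT INTO numbers (number1, number2, sum) VALUES (?, ?, ?)",
        (number1, number2, total),
    )
    conn.commit()
    return f"the sum of {number1} and {number2} is {total}"


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(handle_request(conn, 2, 3))
```

A real service would sit behind an HTTP framework and a proper database server, but the shape of the data is the same.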

Let's start with zero-downtime deployment. The first thing we need is more than one instance of the web service running at a time, so that we can stop one instance, deploy the new changes, and start it again. After that we stop the other instance and deploy the changes there.

architecture_lb.png

As you can see, this diagram is almost the same as the previous one. The only thing that has changed is that we are now running two web services, and Nginx is load balancing the client requests between them. Also, if you look closely you will see v1 written below each web service, which shows that both are running the same version of the code.
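Round robin is Nginx's default way of spreading requests over a group of servers. The sketch below imitates that behaviour in Python just to show the idea; it is not Nginx configuration, and the LoadBalancer class and the addresses are made up for the example.

```python
from itertools import cycle


class LoadBalancer:
    """Hands out backends in round-robin order, one per request."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))

    def pick(self):
        # Each call returns the next backend in the rotation.
        return next(self._rotation)


# Hypothetical addresses of the two v1 web services.
balancer = LoadBalancer(["127.0.0.1:8001", "127.0.0.1:8002"])
picks = [balancer.pick() for _ in range(4)]
print(picks)
```

With two backends, consecutive requests simply alternate between them, which is why either one can be taken out for a moment without clients noticing.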

Now let's make a change to the code and try a zero-downtime deployment. Currently, when a client makes a request with two numbers, our service returns output such as: the sum of {number1} and {number2} is {sum}. It turns out our management doesn't like this output. They did some market research and found out that we will do a lot better if we start the message with Hello, and it would be really good if we could do it without stopping the service. The development team changed the message and ran their tests (just kidding).

How will we deploy the change without stopping the service? In this case it is really easy. First, we remove one service from Nginx, stop it, deploy the new code, and reattach it to Nginx. There is a small problem, though: the other server is still running the old code, which means some users will see the old message and some will see the new one. I think that is a small tradeoff we have to accept, and once we replace the code on the second server as well we are all good. As you can see, we deployed the code without stopping the whole service. We did it one server at a time.
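The one-server-at-a-time procedure can be sketched as a small simulation. Everything here is illustrative: the Server class, the server names and the exact wording of the new message ("Hello, " in front of the old one) are assumptions, and the pool list stands in for the set of servers Nginx is currently sending traffic to.

```python
class Server:
    def __init__(self, name, version):
        self.name = name
        self.version = version

    def handle(self, number1, number2):
        message = f"the sum of {number1} and {number2} is {number1 + number2}"
        # v2 only changes the wording of the reply.
        return f"Hello, {message}" if self.version == 2 else message


def rolling_deploy(pool, new_version):
    """Upgrade servers one at a time; yield the versions in rotation after each step."""
    for server in list(pool):
        pool.remove(server)  # detach from the load balancer
        if not pool:
            raise RuntimeError("no server left to take care of clients")
        server.version = new_version  # stop, deploy the new code, start again
        pool.append(server)  # reattach to the load balancer
        yield sorted(s.version for s in pool)


pool = [Server("web1", 1), Server("web2", 1)]
steps = list(rolling_deploy(pool, new_version=2))
print(steps)
```

After the first step the pool holds one v1 and one v2 server, which is the window in which some users see the old message and some see the new one.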

Let's take a look at a slightly harder problem, one that needs a bit of cleverness to overcome. Our R&D team found out that the service can save a lot of storage if we remove the sum column from the database, because it turns out that calculating the sum is a lot cheaper than storing and retrieving it. That sounds really good, so let's change the database table and our code and do a zero-downtime deployment. Again we made the code changes and tested them. We isolated one server, deployed the new code to it, and changed the database table, since we knew the other web server was still taking care of clients. But disaster struck: the moment we changed the database, the server still running the old code started throwing internal server errors, because the database schema it relies on was no longer there.

Now the question is how to do a zero-downtime deployment that includes database changes. The first thing to keep in mind is that we have to support two versions of the code at the same time, for as long as it takes to deploy the new code to all of our servers.
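The failure is easy to reproduce with sqlite3. This is only a sketch: the column names number1 and number2, the integer types and the v1_store function are assumptions, and the table is rebuilt rather than altered so that the example does not depend on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE numbers ("
    " number1 INTEGER NOT NULL,"
    " number2 INTEGER NOT NULL,"
    " sum INTEGER NOT NULL)"
)


def v1_store(conn, number1, number2):
    # Old code: still writes the sum column.
    conn.execute(
        "INSERT INTO numbers (number1, number2, sum) VALUES (?, ?, ?)",
        (number1, number2, number1 + number2),
    )


v1_store(conn, 2, 3)  # works against the original schema

# The schema change ships together with the new code: the sum column goes away.
conn.executescript(
    """
    CREATE TABLE numbers_new (number1 INTEGER NOT NULL, number2 INTEGER NOT NULL);
    INSERT INTO numbers_new SELECT number1, number2 FROM numbers;
    DROP TABLE numbers;
    ALTER TABLE numbers_new RENAME TO numbers;
    """
)

try:
    v1_store(conn, 4, 5)
    old_server_error = None
except sqlite3.OperationalError as exc:
    # This is what reaches clients as an internal server error.
    old_server_error = str(exc)

print(old_server_error)
```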

Here is the clever part. At first we only change the code so that it stops using the sum column, and we leave the database alone. In this case we don't have to worry about the old server, because the database schema is intact. One detail to watch: the sum column is non-nullable, so it needs a default value (or has to be made nullable) before the new code can stop writing to it. After redeploying all servers with the new code, we can remove the column because nobody is using it any more. So we do one more deployment, but this time we deploy changes only to the database, because all we want is to remove the column. That's how we manage to work around the problem and keep doing zero-downtime deployments.
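The code-first, database-second order can be sketched with sqlite3 as well. The sketch assumes the sum column has a default value (DEFAULT 0 here, which the original schema does not state) so that inserts omitting it are accepted; the column names number1 and number2 and the function names are also made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: sum has a default, so inserts that omit it do not violate NOT NULL.
conn.execute(
    "CREATE TABLE numbers ("
    " number1 INTEGER NOT NULL,"
    " number2 INTEGER NOT NULL,"
    " sum INTEGER NOT NULL DEFAULT 0)"
)


def v1_store(conn, number1, number2):
    # Old code: writes the sum column.
    conn.execute(
        "INSERT INTO numbers (number1, number2, sum) VALUES (?, ?, ?)",
        (number1, number2, number1 + number2),
    )


def v2_store(conn, number1, number2):
    # New code: ignores the sum column entirely.
    conn.execute(
        "INSERT INTO numbers (number1, number2) VALUES (?, ?)",
        (number1, number2),
    )


def v2_sums(conn):
    # New code: sums are computed on demand instead of being read back.
    rows = conn.execute("SELECT number1, number2 FROM numbers ORDER BY rowid")
    return [number1 + number2 for number1, number2 in rows]


# Deployment 1: code only. Old and new servers share the unchanged schema.
v1_store(conn, 2, 3)
v2_store(conn, 4, 5)

# Deployment 2: database only, once no server runs v1 any more.
conn.executescript(
    """
    CREATE TABLE numbers_new (number1 INTEGER NOT NULL, number2 INTEGER NOT NULL);
    INSERT INTO numbers_new SELECT number1, number2 FROM numbers;
    DROP TABLE numbers;
    ALTER TABLE numbers_new RENAME TO numbers;
    """
)

v2_store(conn, 6, 7)
sums = v2_sums(conn)
print(sums)
```

At no point does running code depend on a column that is missing: v1 and v2 both work before the column is dropped, and only v2 is left when it goes.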

To sum up this post: if you can afford to go down while you make changes, just do that. Maintaining zero downtime takes energy, because you have to think through problems like the ones we discussed above, and if it is not buying you much, spend that energy on other things.