Sunday, December 18, 2016

Automatic scaling for Marathon

When I talk to developers who run their applications on bare metal or virtual servers, they usually confess that many production alerts (incidents) can be solved by restarting the application. Fortunately, we have been living in the cloud era for a while now, and our team deploys applications almost exclusively to a Mesos/Marathon cloud (but that is a story for another blog post). Thanks to that, when an app becomes unhealthy, Marathon automatically restarts it for us, so no manual intervention is needed. We solved that problem, but another one appeared: over the past months we realized that most of our production alerts can be solved by manually increasing the number of instances, because under varying load the app could not keep up. That is a completely different problem that had no easy, automatic solution for us at the time, and that is why I created autoscaler.
There is something you should know about the apps we run in the cloud these days. We went the microservices way, and our services communicate almost exclusively via RabbitMQ (asynchronously, by sending messages). There are some autoscaling solutions for apps communicating over HTTP, but we did not find any for RabbitMQ.


So what does the autoscaler do? It is a Scala application that naturally runs in a Docker container. You run that container in your Marathon cloud and configure it with your RabbitMQ server address. Then you need to configure the applications that should be scaled automatically, which is most easily done via Marathon labels. For an app to be autoscaled, you need to specify at least the queue name and the maximum number of messages allowed in that queue. When this limit is exceeded, the number of instances is increased.
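For illustration, a Marathon app definition with such labels might look roughly like the fragment below. The label keys (AUTOSCALE_QUEUE, AUTOSCALE_MAX_MESSAGES) are only placeholders I made up here; the actual keys are documented in the project's README.

    {
      "id": "/analysis/file-analyzer",
      "instances": 5,
      "labels": {
        "AUTOSCALE_QUEUE": "files-to-analyze",
        "AUTOSCALE_MAX_MESSAGES": "1000"
      }
    }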


A typical example might look like this: you have an application consuming a queue of files to be analyzed. The app runs in 5 instances and handles the load without any problems. Suddenly the demand for analysis increases because a new system was connected to this queue. Soon, 5 instances would no longer be enough, but with the autoscaler configured, the instance count is adjusted automatically without triggering an alert that someone would have to deal with manually.


There are some features that make the autoscaler smart. One of them is the cooldown period. When the limit is exceeded and the instance count is increased, it takes some time before the new instance starts and begins helping with the load. That is why, after an app is scaled, a configurable cooldown period applies during which no further scaling happens. The autoscaler just waits, and if the number of messages is still above the threshold after the cooldown period, the instance count is increased again. The cooldown thus prevents the instance count from jumping up and down all the time.
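To make the cooldown idea concrete, here is a minimal, self-contained Scala sketch of the scale-up decision. The names (ScaleUpPolicy, ScalingState, decide) are illustrative only and are not taken from the autoscaler's actual code.

    import java.time.{Duration, Instant}

    // State we remember between checks: when the app was last scaled, if ever.
    case class ScalingState(lastScaledAt: Option[Instant])

    class ScaleUpPolicy(maxMessages: Int, cooldown: Duration) {

      // Returns the new instance count, or None when no scaling should happen.
      def decide(queueDepth: Int,
                 currentInstances: Int,
                 state: ScalingState,
                 now: Instant = Instant.now()): Option[Int] = {
        val coolingDown = state.lastScaledAt.exists { scaledAt =>
          Duration.between(scaledAt, now).compareTo(cooldown) < 0
        }
        if (queueDepth > maxMessages && !coolingDown)
          Some(currentInstances + 1)  // over the threshold and not cooling down
        else
          None                        // under the threshold or still cooling down
      }
    }

    object ScaleUpPolicyExample extends App {
      val policy = new ScaleUpPolicy(maxMessages = 1000, cooldown = Duration.ofMinutes(5))

      // Queue is over the limit and there was no recent scaling -> Some(6).
      println(policy.decide(queueDepth = 1500, currentInstances = 5, state = ScalingState(None)))

      // Right after scaling the queue is still over the limit, but the cooldown
      // period is not over yet -> None (the autoscaler just waits).
      println(policy.decide(queueDepth = 1500, currentInstances = 6, state = ScalingState(Some(Instant.now()))))
    }

The second call in the example shows the whole point of the cooldown: even though the queue is still over the limit, no additional instance is requested until the configured period has passed.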

For more information about how to set up the autoscaler, please see the README in the GitHub repository.
