Sunday, December 18, 2016

Automatic scaling for Marathon

When I talk to developers who run their applications on bare metal or virtual servers, they usually confess that many production alerts (incidents) can be solved by restarting the application. Fortunately, we now live in the cloud era, so our team deploys applications almost exclusively to our Mesos/Marathon cloud (but that is a story for another blog post). Thanks to that, when an app becomes unhealthy, Marathon automatically restarts it for us and no manual intervention is needed. That problem was solved, but another one appeared: in the past months we realized that most of our production alerts could be solved by manually increasing the number of instances (because of varying load, the app was not able to keep up). That is a completely different problem, one that had no easy and automatic solution for us at the time, and that is why I created autoscaler.

There is something you have to know about the apps we run in the cloud nowadays. We went the microservices way, and our services almost exclusively communicate via RabbitMQ (asynchronously, by sending messages). There are some autoscaling solutions for apps communicating via HTTP, but we did not find any for RabbitMQ.

So what does autoscaler do? It is a Scala application that naturally runs in a Docker container. You run that container in your Marathon cloud and configure it with your RabbitMQ server address. Then you need to configure the applications that should be scaled automatically, which is most easily done via Marathon labels. For an app to be autoscaled, you need to specify at least the queue name and the maximum number of messages in that queue. When this limit is reached, the number of instances is increased.
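For illustration, such an app definition might carry labels roughly like the following (the label keys and values here are made up for the example; the readme in the repository documents the actual ones):

```json
{
  "id": "/analysis/file-analyzer",
  "instances": 5,
  "labels": {
    "autoscaler.queueName": "files-to-analyze",
    "autoscaler.maxMessages": "1000"
  }
}
```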

A typical example might look like this: you have an application consuming a queue of files to be analyzed. The app runs in 5 instances and handles the load without any problems. Suddenly the demand for analysis increases because a new system was connected to this queue. Soon 5 instances would not be enough, but with autoscaler configured, it will automatically adjust the instance count without triggering an alert that someone would have to deal with manually.

There are some features that make autoscaler smart. One of them is the cooldown period. When the limit is reached and the instance count is increased, it takes some time before the new instance starts and begins helping with the load. That is why, after an app is scaled, there is a configurable cooldown period during which no further scaling happens. Autoscaler just waits, and if the number of messages is still above the threshold after the cooldown period, the instance count is increased again. The cooldown thus prevents the instance count from jumping up and down all the time.
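The scaling decision described above can be sketched as a simple predicate. This is not the actual autoscaler code, just an illustration of the idea; the class and parameter names are made up:

```java
// Sketch of the cooldown rule: scale up only when the queue is over its
// threshold AND the cooldown since the last scale-up has elapsed.
public class CooldownScaler {

    public static boolean shouldScaleUp(long queueLength, long maxMessages,
                                        long lastScaleUpMillis, long cooldownMillis,
                                        long nowMillis) {
        boolean overThreshold = queueLength > maxMessages;
        boolean cooledDown = nowMillis - lastScaleUpMillis >= cooldownMillis;
        return overThreshold && cooledDown;
    }
}
```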

For more information about how to set up autoscaler, please see readme in the GitHub repository.

Sunday, December 27, 2015

Difference between ARG and ENV in Dockerfile

Starting with Docker 1.9 we have a new feature that I find very useful and necessary: build arguments (see the docker build reference). But Docker already has a similar concept, environment variables, so one might be confused (I certainly was) about the relation between the two.

Actually, they have a lot in common. The idea of build arguments seems to have grown out of several use cases that required passing environment variables to Docker at build time. The Docker team did not like that idea and decided to create a new concept instead, and that is how build arguments were born.

For me, the main distinction between the two concepts is that build arguments are suitable for passing parameters to the Dockerfile at build time. Environment variables, on the other hand, are something you will mainly use at runtime (there are some exceptions, but let's not get into that right now). Another valid use case for environment variables is declaring them inside the Dockerfile. Done like that, all images inheriting from yours can access that value, but there is no way to change it at build time.
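A minimal Dockerfile illustrates the difference (the names and values here are arbitrary examples):

```dockerfile
FROM java:8

# Build argument: can be overridden at build time with
#   docker build --build-arg app_version=2.0 .
# and is not visible in the running container.
ARG app_version=1.0

# Environment variable: baked into the image, visible at runtime and to
# all images inheriting from this one.
ENV APP_ENV=production

RUN echo "building version $app_version for $APP_ENV"
```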

Sunday, December 20, 2015

How to build docker image from Gradle

Once you have an application written in Java and built with Gradle, the natural next step these days might be to put it into a Docker container. And it is possible to build Docker images directly from Gradle (which is my preferred way; much better than having an external script with docker commands).

The first possibility is to use a ready-made solution like a Gradle plugin. There are a few of them, but I have experience with gradle-docker. It even creates the Dockerfile for you - you just describe its contents in a DSL provided by the plugin. The result might look like this:

apply plugin: 'com.bmuschko.docker-java-application'

docker {
    url = 'unix:///var/run/docker.sock'
    javaApplication {
        baseImage = 'java:8'
        maintainer = 'Alena Varkočková'
        port = 80
        // project.group stands in for a value elided in the original post
        tag = "${project.group}/$applicationName:$version"
    }
    registryCredentials {
        url = ''   // registry URL left empty in the original
    }
}
In the background these properties are transformed into an actual Dockerfile, which is saved into the build/docker folder. But to be honest, I don't like this approach at all: instead of using the pretty straightforward language of Dockerfiles, which you are probably already familiar with, you have to learn a new DSL and hope that the result will be what you expect. I think that everyone working with Docker should know the Dockerfile format and write it themselves. So my next step was to use the plugin to build the image, but with my own Dockerfile. And you can do that quite easily.
apply plugin: 'com.bmuschko.docker-remote-api'

import com.bmuschko.gradle.docker.tasks.image.DockerBuildImage
import com.bmuschko.gradle.docker.tasks.image.DockerPushImage
import com.bmuschko.gradle.docker.DockerRegistryCredentials

def DOCKER_GROUP = 'docker'
def dockerRegistryCredentials = new DockerRegistryCredentials()
dockerRegistryCredentials.url = ''   // registry URL left empty in the original

task dockerBuildImage(type: DockerBuildImage) {
    dependsOn distTar
    group = DOCKER_GROUP

    url = 'unix:///var/run/docker.sock'
    inputDir = project.projectDir
    // project.group stands in for a value elided in the original post
    tag = "${dockerRegistryCredentials.url}/${project.group}/$applicationName:$version"
    registryCredentials = dockerRegistryCredentials
}

task dockerPushImage(type: DockerPushImage) {
    group = DOCKER_GROUP

    dependsOn dockerBuildImage

    imageName = "${dockerRegistryCredentials.url}/${project.group}/$applicationName"
    tag = project.version
}
This is better. I was kind of excited by this approach for a little while - until I figured out that the plugin supports only a small subset of docker build options. If you need something special, you have to write it yourself. In the background, the plugin uses another abstraction over Docker, the docker-java client, which actually has the same problems. The Docker API changes so rapidly, and new features arrive almost instantly, that these abstractions cannot keep up with the pace and you are constantly dealing with missing features. That's when I realized that I don't need the abstractions at all. My latest version (without the plugin) looks like the next example. It is very simple and straightforward and I like it.

// project.group stands in for a value elided in the original post
def imageName = "${project.group}/$applicationName:$version"

task dockerBuildImage(type: Exec) {
    dependsOn distTar
    group = 'docker'

    commandLine 'docker', 'build', '-t', imageName, '--build-arg', "version=$version", '.'
}

task dockerPushImage(type: Exec) {
    dependsOn dockerBuildImage
    group = 'docker'

    commandLine 'docker', 'push', imageName
}

Sunday, June 7, 2015

Finding the best in-memory LRU cache for Java/Scala

For one of our applications we were facing a decision: for every request we need to make a remote call, but doing so would increase our response time significantly. Fortunately, these calls can be easily cached, and the cache fits into RAM, so we were not forced to use an external cache like Memcached.

So we were looking for a cache implementation on top of the JVM (preferably Java 8 or Scala) and I was surprised how many good-looking possibilities exist.
  • There is a cache as part of Hystrix, but it is only per web request
  • Guava contains a Cache that looks very promising and has most of the features we were looking for
  • After some research, our winner was Caffeine, with very nice documentation and API. By the way, this cache is also used in Spray

And how did we use it? First of all, we wanted an LRU cache: a cache that operates within a limited size (of RAM) and throws away the least recently used item when the size goes beyond a threshold.
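To make the LRU idea concrete (this is just an illustration of the eviction policy, not how Caffeine implements it), the JDK's own LinkedHashMap can act as a tiny LRU cache when constructed in access order:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A fixed-capacity map that evicts the least recently accessed entry
// once the capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxCapacity;

    public LruCache(int maxCapacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.maxCapacity = maxCapacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxCapacity;
    }
}
```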

Strongly inspired by the Spray implementation, this is how our Caffeine integration (in Scala) looks:
package com.avast.someproject.crosscutting.cache

import com.github.benmanes.caffeine.cache.Caffeine

import scala.concurrent.{ExecutionContext, Future, Promise}

class CaffeineCache[V](cacheName: String, val maxCapacity: Int, val initialCapacity: Int) extends Cache[V] {
  private val store: com.github.benmanes.caffeine.cache.Cache[Object, Future[V]] = Caffeine.newBuilder()
    .initialCapacity(initialCapacity)
    .maximumSize(maxCapacity)
    .build[Object, Future[V]]()

  override def get(key: Object): Option[Future[V]] = {
    val valueFromCache = store.getIfPresent(key)
    if (valueFromCache == null) None else Some(valueFromCache)
  }

  override def get(key: Object, genValue: Object => Future[V])(implicit ec: ExecutionContext): Future[V] = {
    val promise = Promise[V]()
    store.get(key, (key: Object) => promise.future) match {
      case null => Future.failed(new RuntimeException("Unable to retrieve value from cache"))
      case futureResult if futureResult == promise.future =>
        // we won the race: compute the value and complete the cached future
        val future = genValue(key)
        future.onComplete { value =>
          promise.complete(value)
          // in case of exceptions we remove the cache entry (i.e. try again later)
          if (value.isFailure) store.invalidate(key)
        }
        future
      case futureResult =>
        // someone else is already computing the value for this key
        futureResult
    }
  }
}

Sunday, May 17, 2015

Pushing Hystrix statistics to Graphite

If you plan to use Hystrix for your remote calls, you'll need great monitoring. Hystrix actually comes with an awesome dashboard that is free and easy to use, but it makes sense to have at least a general overview of all your services in one monitoring app. Using the same tool for everything allows you to easily find correlations between different metrics (e.g. high latency and a low number of sales).

We use Graphite for most of our business and performance metrics, so we wanted to plot Hystrix statistics on our dashboards too. For a newcomer, the path might not be obvious at first.

The good thing is that Netflix engineers thought about users wanting to plug in their own monitoring tools and prepared two plugin endpoints that will help you accomplish this. Detailed descriptions of the plugins can be found in the documentation.

Event notifier
If basic statistics about commands being run and finished are enough for you, the event notifier is a good choice. If you are using StatsD for metrics aggregation, it is also the easier one. There is an important thing to note here: time spent in the event notifier counts as part of the command execution time. So whatever you do there, it should be extremely fast. Luckily, StatsD supports metrics reporting over UDP, and firing off a UDP packet is extremely fast.
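This is what keeps the reporting cheap: a StatsD counter is just a tiny text payload ("name:1|c") in a fire-and-forget UDP datagram, with no reply to wait for. A minimal sketch (the host and port are assumptions; real clients like the one in the example below wrap this up for you):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class UdpCounter {

    /** Format a StatsD counter increment, e.g. "hystrix.MyCommand.success:1|c". */
    public static String counter(String name) {
        return name + ":1|c";
    }

    /** Send the metric as a single UDP datagram without waiting for any reply. */
    public static void send(String metric, String host, int port) throws Exception {
        byte[] payload = metric.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    new InetSocketAddress(host, port)));
        }
    }
}
```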

Event notifier has two endpoints:
  • markCommandExecution - run at the end of command execution, reports all events that occurred during the execution
  • markEvent - executed for every event in every command
So in the end, it is just a matter of how often you need to do the reporting. I think that for StatsD metrics, markCommandExecution is good enough.

In the end, a simple event notifier implementation might look similar to the example below.

public class StatsDEventNotifier extends HystrixEventNotifier {

    private final StatsDClient stats;
    private final String keyPrefix = "hystrix.";

    public StatsDEventNotifier(StatsDClient stats) {
        this.stats = stats;
    }

    @Override
    public void markEvent(HystrixEventType eventType, HystrixCommandKey key) {
        String commandPrefix = getCommandKey(key);
        if (eventType == HystrixEventType.SUCCESS) {
            stats.count(commandPrefix + "success");
        } else if (eventType == HystrixEventType.FAILURE) {
            stats.count(commandPrefix + "failure");
        } else if (eventType == HystrixEventType.FALLBACK_SUCCESS) {
            stats.count(commandPrefix + "fallback_success");
        } else if (eventType == HystrixEventType.FALLBACK_FAILURE) {
            stats.count(commandPrefix + "fallback_failure");
        } else if (eventType == HystrixEventType.EXCEPTION_THROWN) {
            stats.count(commandPrefix + "exception");
        }
    }

    private String getCommandKey(HystrixCommandKey key) {
        return keyPrefix + stats.encode( + ".";
    }
}

Saturday, January 3, 2015

Preventing build failure for muted tests in TeamCity

If you are using Maven to build your JVM project and run its tests, and you are using TeamCity at the same time, you might run into a similar problem as we did. There is a cool feature in TeamCity: the ability to mute tests. You can mark a test as muted and then, if this test fails, the overall build status will still be a success.

But not if you are using the Maven + TeamCity combination. There is a bug in TeamCity reported in 2011, and even though a TeamCity developer marked the issue in that thread as a major problem and promised to fix it ASAP, it still has not happened.

Fortunately, there is a solution hidden in the comments of the bug report. To make this work, you need to take advantage of the Surefire Maven plugin configuration setting testFailureIgnore: with it, the build will not fail when a test fails. But that is not something you want for your regular Maven build, so you definitely should not put this option into your pom.xml - for your non-TeamCity builds you probably still want the build to fail when a test fails. Instead, take advantage of the Additional Maven command line parameters of your Maven build step and run Maven with the additional parameter -Dmaven.test.failure.ignore=true

With this set up, you just need to make sure that in your build configuration's failure conditions you have "at least one test failed" checked. With TeamCity configured like this, Maven will ignore the test failures, but TeamCity will still parse the Surefire output report and fail the build step if a test (that is not muted) fails.

Tuesday, October 28, 2014

Subscribe to new bugs reported to JIRA

If you have a project and your users are used to submitting JIRA tickets with feature requests and bugs, it might be a good idea to monitor incoming bugs and act fast when a critical one occurs. What I was missing for a long time was a feature that would notify me about new bugs without me checking the JIRA web page a few times a day. This is where JIRA search filters come in handy.

You can set up such a filter by going to the advanced search and typing something like "project = your_project_name AND type = bug AND created >= -1h". This finds all bugs created in the project "your_project_name" in the last hour. Now save your search with the Save as button and give it a name - you have just created a filter!

If you go to the main menu Issues -> Manage filters, you should see your filter there. Through Subscriptions you can add a new subscription: select advanced and put "0 0 0/1 * * ? *" into the text field. This tells JIRA to send an email for this filter every hour - so every hour you'll get an email with new bugs from your project. Enjoy!