B. dev blog

Writing Advanced Daemons That Aren't Daemons

This is the second and final article in a set of two. The first covered a pluggable locking toolkit. Here, we'll explore more advanced patterns. At Booking.com, we use Perl and IPC::ConcurrencyLimit[3]. You may use another language and another locking API: the strategies described here should be transferable.

Writing daemons correctly and meeting all of their maintenance and reliability requirements isn't easy. In our primary development language, Perl, you additionally suffer from rather unpredictable handling of signals[1], both relating to interruption of system calls and non-interruption of pathologically slow internal operations (e.g. slow regular expressions). In short, it can be very convenient to avoid having to deploy daemons or system services at all.

At Booking.com, we occasionally resort to a rather curious replacement for daemons in certain classes of services. The basic recipe is to run a cron job frequently and encode a notion of life time into the program so that it does not live forever and gets replaced by subsequent runs. Sounds odd? Consider the following:

- Slow memory leaks no longer kill your service (and machine) dead. You should still fix the leaks, but you can do so on your schedule instead of on the spot.
- Rolling out new code does not require an explicit daemon restart. No need to even have shell access to the machines you roll out to. Nor do you have to encode knowledge of the service into your generic roll-out infrastructure[2].
- Further down the road, we'll show that this strategy also allows for advanced patterns like scaling capacity with demand and distributing across multiple machines.
- The system is also somewhat resilient to crashes, but so is an init-based daemon.

At the heart of the strategy is, once again, locking. The running process obtains a lock which it releases when it exits. Candidates started by cron while that process is still alive attempt to get the lock at start-up, fail, and terminate. Here is the basic recipe in pseudo-code:

- Run once per minute (or once every couple).
  - Attempt to get run lock.
    - If failed to get run lock, then exit.
    - If succeeded,
      then execute main loop until life time reached,
      then exit.

Simple enough! Using the IPC::ConcurrencyLimit Perl module, we can implement this trivially:

use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;

use constant LIFE_TIME => 5*60; # 5 min process life time

my $limit = IPC::ConcurrencyLimit->new(
    type      => 'Flock',
    max_procs => 1,
    path      => '/var/run/myapp',
);

my $lock_id = $limit->get_lock;
if (not $lock_id) {
    # Other process running.
    exit(0);
}
else {
    my $end_time = time() + LIFE_TIME;

    while (1) {
        process_work_unit();
        last if time() >= $end_time;
    }

    exit(0);
}

In real code, you would obviously have logging facilities and pull the life time and lock path out of your configuration system. This simple setup works fine if it is acceptable for your service to be down for up to a minute at a time. This happens when the running process exits right after another minutely cron invocation. With the above code, we basically guarantee this to happen all the time since the cron invocation and the life time have the same basic unit (minutes). This is easy to improve on with some jitter. Replace the LIFE_TIME constant with this:

use constant LIFE_TIME => 5 * 60 + int(rand(60)); # 5-6 min life time

Of course, we can still run into the same gaps in process availability by chance. Since we started from the premise of a daemon, that is rather unlikely to be acceptable, so there is room for some improvement. We can shrink the gap to the level of seconds by introducing the notion of a standby process that takes over from the recently deceased daemon:

- Run once per minute (or once every couple).
  - Attempt to get run lock.
    - If succeeded to get run lock,
      then execute main loop until life time reached,
      then exit.
    - If failed to get run lock, then
      - Attempt to get standby lock.
        - If failed, exit.
        - If succeeded, then attempt to get run lock in a loop
          with short retry interval.

In other words, a second process stays in memory to replace the main process on short notice. The cron iteration time simply needs to be lower than the main process life time and we will have virtually no appreciable gaps in availability[4]. Thankfully, this kind of logic is already available from IPC::ConcurrencyLimit out of the box:

use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;
use IPC::ConcurrencyLimit::WithStandby;

# 5-6 min process life time
use constant LIFE_TIME => 5 * 60 + int(rand(60)); # in seconds

# Attempt to get main lock every 1/2 second
use constant MAIN_LOCK_INTERVAL => 1/2; # in seconds

# Keep retrying for ~3x lifetime of the worker process
use constant MAIN_LOCK_RETRIES => 1 + 3*int(LIFE_TIME / MAIN_LOCK_INTERVAL);

my $limit = IPC::ConcurrencyLimit::WithStandby->new(
    type              => 'Flock',
    path              => '/var/run/myapp',
    max_procs         => 1,
    standby_path      => '/var/run/myapp/standby',
    standby_max_procs => 1,
    interval          => MAIN_LOCK_INTERVAL,
    retries           => MAIN_LOCK_RETRIES,
);

my $lock_id = $limit->get_lock;
if (not $lock_id) {
    # Other process running.
    exit(0);
}
else {
    my $end_time = time() + LIFE_TIME;

    while (1) {
        process_work_unit();
        last if time() >= $end_time;
    }

    exit(0);
}

Only the setup of the lock object has changed in this modified example. All the standby lock logic is hidden within the IPC::ConcurrencyLimit::WithStandby interface. We did have to configure it a bit more thoroughly, however. IPC::ConcurrencyLimit::WithStandby uses two locks internally -- one for the actual worker process and one for the standby process that can take over the main process's responsibilities on short notice. What "short notice" means is defined by the interval parameter: the standby process re-attempts to get the main lock at the specified interval and retries this "retries" times before giving up.
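To make the two-lock idea concrete, here is a minimal sketch of the pseudo-code above using nothing but core Perl and flock. It is not the module's actual implementation; the function names, the option names run_path and standby_path, and the lock file names are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep);

# Try to take an exclusive, non-blocking flock on $path.
# Returns the open file handle (which holds the lock) or nothing.
sub try_lock {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "Can't open $path: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);
    close($fh);
    return;
}

# Returns a handle holding the run lock, or nothing if this process
# should simply exit. The lock is released when the handle is closed
# or the process ends.
sub get_run_lock_with_standby {
    my (%opt) = @_;

    # Fast path: no worker is running, so we become the worker.
    my $run = try_lock($opt{run_path});
    return $run if $run;

    # A worker exists. Become the standby, unless there already is one.
    my $standby = try_lock($opt{standby_path}) or return;

    # Poll for the run lock. The standby lock is released automatically
    # when $standby goes out of scope at the end of this function.
    for (1 .. $opt{retries}) {
        sleep($opt{interval});
        $run = try_lock($opt{run_path});
        return $run if $run;
    }
    return; # gave up waiting
}

my $dir  = tempdir(CLEANUP => 1); # stand-in for a directory like /var/run/myapp
my $lock = get_run_lock_with_standby(
    run_path     => "$dir/run.lock",
    standby_path => "$dir/standby.lock",
    interval     => 0.1,
    retries      => 3,
);
print $lock ? "running as worker\n" : "no free slot, exiting\n";
```

Like the module, the standby in this sketch gives up after a fixed number of retries.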

Don't be fooled into thinking that this means we can still end up with gaps in the time coverage beyond the length of one interval. When the main process is done, the standby process gets promoted and cron spawns a new standby process. This is reliable if there are no early crashes and the following relations hold:

S > L > C                              (1)

where C is the time interval with which cron spawns new processes, L is the maximum life-time of the main worker, and S is the minimum wait time of the standby process. The relation above is easy to show if you consider that a standby process needs to stick around long enough that it replaces the main process when it exits (thus S > L) while cron only has to spawn new standby processes often enough to replenish the standby process before the main process might exit (thus L > C). You can increase S and decrease C to your heart's content to allow for contingency in the real world of rarely (but occasionally) crashing worker processes[5].
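As a quick sanity check, relation (1) can be evaluated for a given configuration. The sketch below assumes that the standby wait time S is roughly interval times retries, which is a simplification that ignores the time spent in each lock attempt. The function and parameter names are hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if S > L > C holds for the given settings, 0 otherwise.
# All times are in seconds.
sub relation_1_holds {
    my (%t) = @_;
    my $standby_wait = $t{interval} * $t{retries};   # S (approximation)
    my $life_time    = $t{max_life_time};            # L
    my $cron         = $t{cron_interval};            # C
    return ($standby_wait > $life_time && $life_time > $cron) ? 1 : 0;
}

# Settings resembling the example above: 6 min maximum life time,
# half-second retry interval, roughly 3x the life time in retries,
# and a cron job that fires once per minute.
my %settings = (
    interval      => 0.5,
    retries       => 1 + 3 * int(360 / 0.5),
    max_life_time => 360,
    cron_interval => 60,
);
print relation_1_holds(%settings) ? "relation (1) holds\n"
                                  : "relation (1) is violated\n";
```

Running cron only every ten minutes with the same life time would violate L > C, and the function would report that.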

After this bit of consideration, let's look at another extension to our example. We can trivially support multiple main processes at the same time to extend capacity beyond one core. It only takes two small changes to our setup. The max_procs and standby_max_procs parameters can be increased to allow for more processes of this type to run simultaneously[6]. In order to satisfy relation (1) above, you will now have to increase the rate at which cron spawns new processes, too. Since that might not be possible, you can instead extend the life time of the standby and main processes or opt to spawn multiple processes from cron per iteration. This yields a modified form of relation (1) to assert full capacity:

S/k > L/k > C                          (2)

where k is understood to be the number of concurrent processes required to guarantee full capacity. If the cron is set up to spawn many processes at a time, but at a lower rate (e.g. spawn ten workers at a time, but do that only every ten minutes), then we still require relation (1) to hold true independently to guarantee that there is always a process available. This is easy to see in the extreme case: If you spawn a very large number of processes at a low rate, then they fill all available worker and standby process slots, and the remainder just quits. The workers and standby processes then die off before the next cron run replenishes the pools. A multi-worker fork version of the previous example follows.

use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;
use IPC::ConcurrencyLimit::WithStandby;
use POSIX qw(ceil);
use Time::HiRes qw(sleep);

# external settings
use constant LIFE_TIME => 5*60; # base life time in seconds
use constant MAX_WORKERS => 16; # max concurrent workers
use constant CRON_RATE => 60; # cron spawns more every 60s

# This asserts relation (2), but not necessarily relation (1).
use constant NCHILDREN => 2 + ceil(2 * MAX_WORKERS * CRON_RATE / LIFE_TIME);

# Attempt to get main lock every 1/2 second
use constant MAIN_LOCK_INTERVAL => 1/2; # in seconds

# Set the standby life time: Keep retrying for ~3x lifetime of the worker process
use constant MAIN_LOCK_RETRIES => 1 + int(3 * LIFE_TIME / MAIN_LOCK_INTERVAL);

# Fork the decided number of workers
my $is_child;
for (1 .. NCHILDREN) {
    $is_child = fork_child();
    last if $is_child;
    sleep(0.2); # no need to have artificial contention on the locks
}
exit(0) if not $is_child; # all children daemonized

my $limit = IPC::ConcurrencyLimit::WithStandby->new(
    type                => 'Flock',
    path                => '/var/run/myapp',
    max_procs           => MAX_WORKERS,
    standby_path        => '/var/run/myapp/standby',
    standby_max_procs   => MAX_WORKERS,
    interval            => MAIN_LOCK_INTERVAL,
    retries             => MAIN_LOCK_RETRIES,
    process_name_change => 1,
);

my $lock_id = $limit->get_lock;
if (not $lock_id) {
    # Other process running.
    exit(0);
}
else {
    my $end_time = time() + LIFE_TIME*(1+rand(0.1));

    while (1) {
        process_work_unit();
        last if time() >= $end_time;
    }

    exit(0);
}

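# mostly standard daemonization from the perlipc manpage
sub fork_child {
    use autodie; # chdir and open die on failure within this sub
    defined(my $pid = fork)
      or die "Can't fork: $!";
    return() if $pid; # parent: report "not a child"
    chdir '/';
    open STDIN, '/dev/null';
    open STDOUT, '>/dev/null';
    # treat both undef and -1 as failure of setsid
    die "Can't start a new session: $!"
      if (POSIX::setsid() // -1) == -1;
    open STDERR, '>&STDOUT';
    return 1; # child: detached from the terminal
}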
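To see what the NCHILDREN formula in the listing above works out to, here is the arithmetic for its settings as a small sketch. Treating n processes per cron run as an effective spawn interval of C/n is a simplification made here, not something the listing states, and reading the factor of two and the constant two as headroom is likewise an assumption. The helper names are made up.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Same formula as NCHILDREN in the listing above.
sub children_per_cron_run {
    my ($max_workers, $cron_rate, $life_time) = @_;
    return 2 + ceil(2 * $max_workers * $cron_rate / $life_time);
}

# The L/k > C part of relation (2), with C replaced by the effective
# interval between spawned processes (an assumption, see above).
sub capacity_relation_holds {
    my ($max_workers, $cron_rate, $life_time, $nchildren) = @_;
    my $effective_interval = $cron_rate / $nchildren;
    return ($life_time / $max_workers > $effective_interval) ? 1 : 0;
}

my ($workers, $cron, $life) = (16, 60, 300);
my $n = children_per_cron_run($workers, $cron, $life);
printf "children per cron run: %d\n", $n;
printf "L/k = %.2fs, effective spawn interval = %.2fs\n",
    $life / $workers, $cron / $n;
```

With a single process per cron run, L/k would be 18.75 seconds against a spawn interval of 60 seconds, so the pool of 16 workers could never be kept full. That is why the listing forks several children per run.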

Thanks to the pluggable IPC::ConcurrencyLimit::Lock locking back-ends, you can use the exact same technique to scale your application across multiple machines. All it takes is swapping out the lock type for a non-local lock. Do note, however, that distributed locking comes at a price. Some implementations only offer very coarse-grained locking (e.g. Apache ZooKeeper), some come at very high cost (e.g. the NFS locking back-end), and most of them have non-trivial complexity in edge cases. The new process_name_change option simply makes IPC::ConcurrencyLimit::WithStandby modify the process name of the standby processes to note that they are on standby. This helps when trying to tell apart stuck workers from workers on standby.
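For illustration, changing the process name in Perl boils down to assigning to the special variable $0. The exact name that the module sets is not shown in this article, so the helper and the labels below are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: tag the process name so that tools like ps and
# top show what state the process is in.
sub mark_process_state {
    my ($base_name, $state) = @_;
    my $name = "$base_name - $state";
    $0 = $name;
    return $name;
}

my $original_name = $0;
mark_process_state($original_name, 'standby'); # while waiting for a run lock
mark_process_state($original_name, 'worker');  # once promoted to worker
```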

As a final example of how the method can be adapted for many needs, we'll visit dynamic scaling. This is to say, changing the processing capacity (i.e. the number of processes) depending on the demand. All the ingredients were covered earlier. The only significant change is giving an active process some control over its own life time. Once it has passed the minimum life time required for full availability, it may choose to continue processing up to some maximum life time if there is high demand. By setting the maximum number of running jobs higher than the number expected from the cron spawn rate and the minimum life time, one obtains a system that will scale the number of workers up to what is required to satisfy demand.
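A minimal sketch of such a main loop might look as follows. How demand is measured depends entirely on the application, so the high_demand callback is a placeholder, as are the other option names. The clock is injectable only to keep the sketch easy to test.

```perl
use strict;
use warnings;

# Runs work units until the process decides to retire. Returns the
# number of work units processed.
sub run_worker {
    my (%opt) = @_;
    my $now   = $opt{clock} // sub { time() };
    my $start = $now->();
    my $units = 0;

    while (1) {
        $opt{process_work_unit}->();
        $units++;

        my $age = $now->() - $start;
        # Hard upper bound: always make room for a fresh process.
        last if $age >= $opt{max_life_time};
        # Past the minimum life time, stay only while demand is high.
        last if $age >= $opt{min_life_time} && !$opt{high_demand}->();
    }
    return $units;
}

my $backlog = 3; # pretend there are three queued jobs
my $done = run_worker(
    min_life_time     => 0,  # retire as soon as demand drops
    max_life_time     => 60,
    process_work_unit => sub { $backlog-- },
    high_demand       => sub { $backlog > 0 },
);
print "processed $done work units\n";
```

In a real service, the worker would obtain its lock first, exactly as in the earlier examples, and only then enter this loop.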

In this article, we've explored several advanced techniques around the pattern of replacing true daemons with cron- and lock-based multi-processing. While parts of the discussion went into some detail, the complexity of operating such a system is relatively low and depends on the exact requirements. The technique is easy to adapt to many situations and avoids many problems associated with writing daemons.

[1] For details, see previous article about Devel::TrackSIG.

[2] By the way, ours is driven by git-deploy.

[3] IPC::ConcurrencyLimit was covered in the first article.

[4] Remember the premise "replacing daemons in certain classes of services"? This is the main limitation. With this setup we still get holes in the time coverage of the order of a second or a fraction of a second, so it's not appropriate for low-latency, very-high-availability situations.

[5] Application of probability theory to quantify this is left as an exercise to the reader, but in practice, it's okay to be generous with the life time of the standby process and to choose the life time of the main process to be much larger than the cron interval.

[6] Writing the application to allow for concurrency is, once again, left as an exercise to the reader.
