
Highlander!

There can be only one! Referring to the one process doing a particular job at a time, of course.

When writing cron jobs that perform particular tasks at regular intervals, assumptions like "the run time will always be much shorter than the run interval" can be very dangerous in an environment that is scaling up rapidly. What runs for ten minutes out of every hour now can easily take longer than an hour a few months from now if your business grows quickly. Such assumptions can lead to anything from deadlocks to data corruption to servers dying under load.

Thankfully, there's an almost universal answer to this problem: Locking! Alas, locking in itself can be a very hard problem. For the above use case, a simple flock()-based solution commonly suffices. At Booking.com, we wrote and use the Perl module IPC::ConcurrencyLimit. By default, it uses a simple, machine-local flock() locking back-end:

use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;

run();
exit(0);

sub run {
    my $limit = IPC::ConcurrencyLimit->new(
        max_procs => 1,
        path      => '/var/run/myapp',
    );

    my $id = $limit->get_lock;
    if (not $id) {
        warn "Another process appears to be still running. Exiting.";
        exit(0);
    }
    else {
        do_work();
    }

    # lock released with $limit going out of scope here
}

This simple example assumes that the /var/run/myapp directory is writable by the current user and that warnings are appropriately logged. It shows the basic usage for a case such as the one above.
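Under the hood, the default back-end does little more than try to flock() a lock file in the given directory. The following is a simplified sketch of that core idea using only core Perl, for illustration; it is not the module's actual implementation:

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Try to take an exclusive, non-blocking lock on a file.
# Returns the open file handle on success (keep it around:
# the lock lives as long as the handle is open), or undef
# if another process already holds the lock.
sub try_lock {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT)
        or die "Cannot open lock file '$file': $!";
    if (flock($fh, LOCK_EX | LOCK_NB)) {
        return $fh;
    }
    close($fh);
    return undef;
}

my $fh = try_lock('/var/run/myapp/lock1');
if (not $fh) {
    warn "Another process appears to be still running. Exiting.";
    exit(0);
}
# ... do work; the lock is released when $fh is closed
# or the process exits.
```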

Other situations require a different setup. For example, max_procs could be set much higher to allow parallel execution while still capping concurrency so the system is not entirely overwhelmed. On the other hand, a distributed system might require that a particular task is performed by only one process at a time, globally across many machines. In that case, the machine-local flock() locking back-end is not sufficient. Luckily, the locking back-ends are pluggable: CPAN sports lock implementations that support locking via NFS shares, via MySQL GET_LOCK(), or via an HA pair of Redis servers, and an experimental implementation that uses Apache ZooKeeper for locking can be found on GitHub. When choosing a locking strategy, keep in mind that there is no silver bullet for locking problems and there will always be trade-offs involved.
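For the bounded-parallelism case, only the max_procs parameter changes. A sketch (the path is made up for illustration):

```perl
use 5.14.2;
use warnings;
use IPC::ConcurrencyLimit;

# Allow up to five workers of this type at the same time.
my $limit = IPC::ConcurrencyLimit->new(
    max_procs => 5,
    path      => '/var/run/myapp_workers',
);

my $id = $limit->get_lock;
if (not $id) {
    warn "Five workers already running. Exiting.";
    exit(0);
}
# $id is the number of the slot (1..5) this worker occupies
do_work();
```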

The MySQL locking back-end can be used for cross-machine locking as follows:

use 5.14.2;
use warnings;
use DBI;
use DBD::mysql;
use IPC::ConcurrencyLimit;
use IPC::ConcurrencyLimit::Lock::MySQL;

my $limit = IPC::ConcurrencyLimit->new(
    type      => 'MySQL',
    max_procs => 1,
    timeout   => 2,
    make_new_dbh => sub {
        # $database (and the rest of the DSN) defined elsewhere
        DBI->connect("DBI:mysql:database=$database;...")
    },
);

# as before:
my $id = $limit->get_lock;
if (not $id) {
    # fail
} else {
    # success, do work
}

We used the type parameter to switch the locking strategy, specified via max_procs that only one process of the given type may run anywhere, and set the lock-acquisition time-out to two seconds (the time-out is not specific to this locking back-end). Finally, we provided the MySQL back-end with a way to create database connections for the locks. Only a single GET_LOCK() lock can be held by any one MySQL connection, so it may be important to separate this connection from other uses of the database[1]. MySQL's GET_LOCK() semantics are not very friendly.
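The unfriendly part deserves an illustration. In the MySQL versions current at the time of writing, acquiring a second GET_LOCK() lock on a connection implicitly releases the first one, so a named lock taken elsewhere in your code through the same connection would silently drop the concurrency lock. A hypothetical DBI sketch of the pitfall (connection details made up for illustration):

```perl
use strict;
use warnings;
use DBI;

# Hypothetical credentials, for illustration only.
my $dbh = DBI->connect("DBI:mysql:database=mydb", 'user', 'secret');

my ($got_a) = $dbh->selectrow_array(q{SELECT GET_LOCK('lock_a', 2)});
# $got_a is 1: this connection now holds 'lock_a'.

my ($got_b) = $dbh->selectrow_array(q{SELECT GET_LOCK('lock_b', 2)});
# Acquiring 'lock_b' on the same connection silently
# released 'lock_a': another process can now grab it!
```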

The API for lock implementations is really rather simple. If your favourite strategy is not yet available as a back-end, then consider implementing it for others to use!
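Roughly speaking, a back-end is a class in the IPC::ConcurrencyLimit::Lock:: namespace whose constructor attempts to acquire one of the max_procs slots, returns undef on failure, and releases the lock when the object is destroyed. A hypothetical, deliberately trivial sketch of the shape of such a back-end (consult the IPC::ConcurrencyLimit::Lock documentation for the exact contract before writing a real one):

```perl
package IPC::ConcurrencyLimit::Lock::Hypothetical;
use strict;
use warnings;
use parent 'IPC::ConcurrencyLimit::Lock';

# Process-local slot table: purely illustrative. A real
# back-end would coordinate through some shared resource.
my %taken;

sub new {
    my ($class, $opt) = @_;
    my $max = $opt->{max_procs} || 1;
    for my $slot (1 .. $max) {
        next if $taken{$slot};
        $taken{$slot} = 1;
        return bless { id => $slot } => $class;
    }
    return undef;    # no free slot: lock not acquired
}

sub id { $_[0]->{id} }

sub DESTROY {
    my $self = shift;
    delete $taken{ $self->{id} } if defined $self->{id};
}

1;
```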

Next week, we'll consider advanced use cases of this tool which focus on implementing daemon-like functionality without all the drawbacks.

[1] Yes, this may also qualify as abuse.
