Evaluating MySQL Parallel Replication Part 4, Annex: Under the Hood

This is the annex to Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production.

There is no introduction or conclusion to this post, only landing sections: reading this post without its context will probably be very hard. You should start with the main post and come back here for more details.

Implementation Details of MariaDB Optimistic Parallel Replication

Rollbacks and Retries

When transactions T1 to T20 are run concurrently by optimistic parallel replication, if T19 blocks T2, T19 will be killed (rolled back) to unblock T2 (T2 must commit before T19). In the current implementation, T19 will be retried once T18 has completed. It looks like this could be optimized.

I thought that retrying T19 as soon as T2 completes could improve optimistic replication speed. Kristian Nielsen, the implementer of parallel replication in MariaDB, was kind enough to implement a patch with more aggressive retries. However, with quicker retries, I got slower results than with delayed retries. So it looks like once a conflict is detected (T19 blocks T2), the probability of another conflict is high, and the gain from retrying T19 earlier is outweighed by the cost of additional rollbacks of T19.
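The two retry policies compared above can be illustrated with a small model. This is only a sketch and not MariaDB code: the function name, the policy names and the representation of transactions as integers in commit order are all assumptions made for illustration.

```python
def retry_trigger(killed: int, blocked: int, policy: str) -> int:
    """Return the transaction whose completion triggers the retry of `killed`.

    Hypothetical model: transactions are numbered in commit order, and
    `killed` was rolled back because it blocked the earlier `blocked`.
    """
    if not blocked < killed:
        raise ValueError("the blocked transaction must precede the killed one")
    if policy == "delayed":
        # Behaviour described in the post: T19 is retried once T18 has completed.
        return killed - 1
    if policy == "early":
        # Experimental patch: T19 is retried as soon as T2 completes.
        return blocked
    raise ValueError(f"unknown policy: {policy}")


print(retry_trigger(19, 2, "delayed"))  # 18
print(retry_trigger(19, 2, "early"))    # 2
```

With the early policy, T19 restarts while T3 to T18 may still be running, which leaves more room for T19 to conflict again and be rolled back a second time.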

DML vs DDL and Non-Transactional Storage Engines

The assumption for optimistic parallel replication to work is that a transaction that causes a conflict can be killed and retried. This is the case for InnoDB DML (Data Manipulation Language: INSERT, UPDATE, DELETE, ...) but it is not the case with MyISAM.

As a transaction involving a MyISAM table (or another non-transactional storage engine) cannot be rolled back, it is not safe to run such transactions optimistically. When such a transaction enters the optimistic parallel replication pipeline, the replication applier will wait for all previous transactions to complete before starting the transaction that cannot be rolled back. The following transactions can still be run optimistically if they exclusively use a transactional storage engine (that is, if they can be rolled back). This means that DML that cannot be rolled back acts as a pre-barrier in the parallel replication pipeline.

In MariaDB, DDL statements (Data Definition Language: [CREATE | ALTER | TRUNCATE | DROP | ...] TABLE, ...) are also (still) impossible to roll back, so they also act as a pre-barrier in the parallel replication pipeline. Moreover, a DDL statement prevents all following transactions from being applied optimistically, because it is not safe to run a DML at the same time as a DDL on the same table. So DDL acts not only as a pre-barrier but also as a post-barrier.
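The pre-barrier and post-barrier rules described above can be modelled in a few lines. This is a simplified sketch, not the MariaDB implementation: the transaction kinds, the function name and the list-based pipeline are assumptions. It also assumes in-order commit, so waiting for the immediate predecessor to complete implies that all earlier transactions have completed.

```python
from typing import List, Optional

TX_DML = "tx_dml"          # e.g. InnoDB DML: can be killed and retried
NON_TX_DML = "non_tx_dml"  # e.g. MyISAM DML: cannot be rolled back
DDL = "ddl"                # cannot be rolled back in MariaDB


def start_constraints(kinds: List[str]) -> List[Optional[int]]:
    """For each transaction (in commit order), return the index of the
    transaction that must have completed before it may start, or None
    if it can start optimistically right away."""
    constraints: List[Optional[int]] = []
    last_ddl: Optional[int] = None
    for i, kind in enumerate(kinds):
        if kind in (NON_TX_DML, DDL):
            # Pre-barrier: wait for all previous transactions.
            wait_for = i - 1 if i > 0 else None
        elif kind == TX_DML:
            # Post-barrier: nothing starts before the latest DDL completes.
            wait_for = last_ddl
        else:
            raise ValueError(f"unknown transaction kind: {kind}")
        constraints.append(wait_for)
        if kind == DDL:
            last_ddl = i
    return constraints


pipeline = [TX_DML, TX_DML, NON_TX_DML, TX_DML, DDL, TX_DML, TX_DML]
print(start_constraints(pipeline))  # [None, None, 1, None, 3, 4, 4]
```

In this example, the transactional DML following the MyISAM DML (index 3) can still start immediately, while the two transactions following the DDL (indexes 5 and 6) must wait for the DDL at index 4 to complete.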

Different Optimistic Parallel Replication Modes

MariaDB 10.1 optimistic parallel replication can be run with two slave_parallel_mode values: optimistic and aggressive. In the optimistic mode, some heuristics are used to avoid needless conflicts. In the aggressive mode, those heuristics are disabled.

One of the heuristics of the optimistic mode is the following: if a transaction experienced a row-lock wait on the master, it will not be run in parallel on the slave. The behavior is unclear when intermediate masters are used:

  • An intermediate master with slave_parallel_mode=none (single threaded) will not have any row-lock waits. So it looks like for a slave of such an intermediate master, the optimistic mode would behave the same way as the aggressive mode.
  • An intermediate master with slave_parallel_mode=minimal (slave group committing) will have a row-lock wait for each group commit. So it looks like for a slave of such an intermediate master, the optimistic mode would behave the same as the conservative mode.
  • An intermediate master with slave_parallel_mode=conservative should generate very few row-lock waits (only for conflicts that will generate retries). So it looks like for a slave of such an intermediate master, the optimistic mode will behave mostly the same as the aggressive mode.
  • The number of row-lock waits is hard to predict on an intermediate master in optimistic or aggressive mode. So the behavior of a slave of such an intermediate master is hard to predict.
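The row-lock wait heuristic described above can be written as a small decision function. This sketch models only that single heuristic (the optimistic mode uses others that are not covered here), and the function and parameter names are hypothetical, not MariaDB identifiers.

```python
def may_run_in_parallel(slave_parallel_mode: str,
                        had_row_lock_wait_on_master: bool) -> bool:
    """Model of one heuristic: in optimistic mode, a transaction that had a
    row-lock wait on the master is not run in parallel on the slave; in
    aggressive mode, the heuristic is disabled."""
    if slave_parallel_mode == "aggressive":
        return True
    if slave_parallel_mode == "optimistic":
        return not had_row_lock_wait_on_master
    raise ValueError(f"not an optimistic mode: {slave_parallel_mode}")


# Downstream of a single-threaded intermediate master, no transaction has a
# row-lock wait, so both modes take the same decision.
print(may_run_in_parallel("optimistic", False))  # True
print(may_run_in_parallel("aggressive", False))  # True
# With a row-lock wait recorded on the master, only aggressive runs in parallel.
print(may_run_in_parallel("optimistic", True))   # False
```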

As we are doing tests on a slave of an intermediate master, the optimistic mode is not very interesting to test. It would generate results similar to the aggressive mode if the intermediate master was running in single-threaded or conservative mode, or similar to the conservative mode if the intermediate master was running in minimal mode. Without a true master running MariaDB 10.1, the only tests that we think make sense are with slave_parallel_mode=aggressive.

This is a good opportunity to remind readers that intermediate masters are bad for parallel replication. As shown in Part 1, intermediate masters do a poor job of transmitting parallelism information from their master to their slaves. The solution presented in the previous post still applies: use Binlog Servers.

Environments

As in the previous posts (Part 1, Part 2 and Part 3), we are using the same four environments. Each environment is composed of five servers. For slave_parallel_mode=none and slave_parallel_mode=conservative, only four of the five servers are needed and are organized as below:

+---+     +---+     +---+     +---+
| A | --> | B | --> | C | --> | D |
+---+     +---+     +---+     +---+

The A to C servers are strictly the same as before. The D server has the same hardware specification as before but it is now running MariaDB 10.1.8 [1]. This means that the conservative results will use the same parallelism information (group commit) as for the tests from Part 3 (we are re-using the same binary logs as the previous tests).

For optimistic parallel replication to work, a MariaDB 10.1 slave must be connected to a MariaDB 10.1 master [2], hence the introduction of a fifth (E) server. For slave_parallel_mode=aggressive, D is replicating from E as shown below:

+---+     +---+     +---+     +---+     +---+
| A | --> | B | --> | C | --> | E | --> | D |
+---+     +---+     +---+     +---+     +---+

The hardware specifications of E are not important because it is only serving binary logs. It was built as a clone of D that was upgraded to MariaDB 10.1. Replication was then started from C with slave_parallel_mode=none. This way, we produced 10.1 binary logs so slave_parallel_mode=aggressive will work on D.

The full test methodology is the same as for the previous tests and can be found in Part 3. The server and database configurations are mostly the same as in the previous tests with the following modifications:

Property                   E1               E2               E3                E4
InnoDB Buffer Pool Size    71 GB            162 GB           71 (from 76) GB   57 (from 76) GB
InnoDB Log Size            64 (from 1) GB   32 (from 1) GB   16 (from 4) GB    16 (from 1) GB

The motivations for the above changes are the following:

  • The InnoDB Buffer Pool Size was reduced for E3 and E4 because we did not have enough RAM to increase slave_parallel_threads to the values we wanted to test (more threads need more available RAM).
  • The InnoDB Log Size was increased because checkpointing was a bottleneck during our tests [3].

Results

In the main post, speedup graphs are presented for each of the four environments. Here, the underlying data for those graphs is presented.

The SB, HD and ND notations are explained in the main post.

The first line of the table below shows the time taken for the single-threaded execution with slave_parallel_mode (SPM) set to none. Then, for slave_parallel_threads (SPT) values of 5, 10, 20 and 40, we have results with both non-optimistic (slave_parallel_mode=conservative) and optimistic (slave_parallel_mode=aggressive) executions. Then, for slave_parallel_threads values of 80, 160, 320, 640, 1280, 2560 and 5120, we have results only for optimistic executions. Note that we cannot have meaningful results for non-optimistic runs with slave_parallel_threads greater than 40 because the maximum group size on C was 35 (see Part 3 for more details).

The times presented below are in the format hours:minutes.seconds and they represent the time needed to process 24 hours of transactions. The number in parentheses is the speedup relative to the single-threaded run.

Execution Times and Speedups

SPT   SPM           E1 SB-HD         E1 SB-ND         E2 SB-HD         E2 SB-ND         E3 SB-HD         E4 SB-HD
-     none          7:36.09          4:01.20          3:09.34          1:24.09          10:56.20         7:59.34
5     conservative  5:08.52 (1.48)   3:52.54 (1.04)   1:42.23 (1.85)   1:17.09 (1.09)   9:14.35 (1.18)   5:41.16 (1.41)
5     aggressive    4:56.02 (1.54)   3:40.33 (1.09)   1:39.51 (1.90)   1:14.41 (1.13)   9:16.49 (1.18)   6:32.45 (1.22)
10    conservative  4:29.24 (1.69)   3:37.36 (1.11)   1:27.04 (2.18)   1:12.57 (1.15)   8:49.29 (1.24)   5:25.06 (1.48)
10    aggressive    4:12.49 (1.80)   3:14.59 (1.24)   1:23.23 (2.27)   1:07.14 (1.25)   8:37.28 (1.27)   5:58.17 (1.34)
20    conservative  4:06.02 (1.85)   3:24.45 (1.18)   1:20.11 (2.36)   1:12.32 (1.16)   8:32.32 (1.28)   5:14.20 (1.53)
20    aggressive    3:33.46 (2.13)   2:51.09 (1.41)   1:10.53 (2.67)   0:58.00 (1.45)   8:06.49 (1.35)   5:19.40 (1.50)
40    conservative  4:01.18 (1.89)   3:21.11 (1.20)   1:18.11 (2.42)   1:11.19 (1.18)   8:26.01 (1.30)   5:09.45 (1.55)
40    aggressive    3:11.19 (2.38)   2:27.48 (1.63)   1:02.15 (3.05)   0:50.00 (1.68)   7:34.28 (1.44)   4:18.04 (1.86)
80    aggressive    2:55.15 (2.60)   2:11.27 (1.84)   0:57.23 (3.30)   0:43.48 (1.92)   7:11.48 (1.52)   3:20.43 (2.39)
160   aggressive    2:42.22 (2.81)   2:02.56 (1.96)   0:56.24 (3.36)   0:41.14 (2.04)   6:42.28 (1.63)   2:44.52 (2.91)
320   aggressive    2:41.08 (2.83)   1:57.40 (2.05)   0:59.44 (3.17)   0:43.33 (1.93)   6:14.19 (1.75)   2:22.48 (3.36)
640   aggressive    2:42.52 (2.80)   1:57.48 (2.05)   1:09.31 (2.73)   0:54.56 (1.53)   5:32.46 (1.97)   2:06.50 (3.78)
1280  aggressive    2:43.00 (2.80)   2:01.12 (1.99)   1:33.47 (2.02)   1:23.37 (1.01)   5:05.29 (2.15)   2:10.01 (3.69)
2560  aggressive    2:46.21 (2.74)   2:04.44 (1.93)   2:28.25 (1.28)   2:21.53 (0.59)   4:46.43 (2.29)   2:16.07 (3.52)
5120  aggressive    2:45.39 (2.75)   2:07.18 (1.90)   4:54.18 (0.64)   4:50.55 (0.29)   4:49.34 (2.27)   2:26.09 (3.28)
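
The speedups in the table can be recomputed from the times. The sketch below parses the hours:minutes.seconds format and divides the single-threaded time by the time of a parallel run. The function names are illustrative, and rounding to two decimals is an assumption that matches the precision shown in the table.

```python
def to_seconds(value: str) -> int:
    """Convert a time in hours:minutes.seconds format (e.g. '7:36.09') to seconds."""
    hours, rest = value.split(":")
    minutes, seconds = rest.split(".")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def speedup(single_threaded: str, parallel: str) -> float:
    """Speedup of a parallel run relative to the single-threaded run."""
    return round(to_seconds(single_threaded) / to_seconds(parallel), 2)


# E1 SB-HD, slave_parallel_threads=5, conservative.
print(speedup("7:36.09", "5:08.52"))  # 1.48
# E2 SB-ND, slave_parallel_threads=5120, aggressive: a slowdown.
print(speedup("1:24.09", "4:50.55"))  # 0.29
```

A value below 1 means that the parallel run was slower than the single-threaded run.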

Graphs during Tests

If you spot something we might have missed in the graphs below, please post a comment. Those graphs include the number of commits per second, CPU stats, Read IOPS (RIOPS) and the percentage of Retried Transactions (RTP) for all tests.

Graphs # 1a: E1 Stats - Slave with Binary Logs - High Durability (panels: Commits, CPU, RIOPS, RTP)

Graphs # 1b: E1 Stats - Slave with Binary Logs - Relaxed Durability (panels: Commits, CPU, RIOPS, RTP)

Graphs # 2a: E2 Stats - Slave with Binary Logs - High Durability (panels: Commits, CPU, RIOPS, RTP)

Graphs # 2b: E2 Stats - Slave with Binary Logs - Relaxed Durability (panels: Commits, CPU, RIOPS, RTP)

Graphs # 3a: E3 Stats - Slave with Binary Logs - High Durability (panels: Commits, CPU, RIOPS, RTP)

Graphs # 4a: E4 Stats - Slave with Binary Logs - High Durability (panels: Commits, CPU, RIOPS, RTP)

[1] At the time of the publication of this post, the latest release of MariaDB 10.1 is 10.1.17. Our tests were done with MariaDB 10.1.8 because they were run a long time ago (I am a little embarrassed to be that late in my blog post editing).

[2] In the implementation of optimistic parallel replication in MariaDB 10.1, the master is responsible for flagging DDL and non-transactional DML and for passing this information to slaves via the binary logs. This is why a MariaDB 10.1 master is needed to enable optimistic parallel replication on a slave. This also means that for optimistic parallel replication to work, master and slaves must have compatible storage engines for DML: if a DML is transactional on the master, it must be transactional on the slave. So a master using InnoDB and a slave using MyISAM will not work.

[3] Because the InnoDB Log Size was too small in our previous tests, those tests were run in non-optimal conditions. The results presented in this post should be considered more accurate.