MySQL Parallel Replication Pitfalls: Group Commit Sync Delay

Replication lag is one of the most common problems affecting asynchronous MySQL replication. It can happen for several reasons, the two most common being:

  1. Insufficient disk I/O capacity on the slave
  2. Large transactions that take a long time to apply

These two have obvious solutions.

Point 1 has to be solved at the hardware or file system level (or, more dangerously, by relaxing flushing constraints at the MySQL level via innodb_flush_log_at_trx_commit, which means replicas have to be re-initialised after a crash or an unclean shutdown).
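For completeness, the trade-off looks like this in my.cnf (shown only to illustrate the risk; the values and comments summarise the standard InnoDB behaviour, and 1 is the safe default):

```ini
# my.cnf on the replica -- durability vs. I/O trade-off
# 1 (default): flush and sync the redo log at every commit -- crash-safe
# 2: write at commit, sync roughly once per second -- up to ~1s of
#    transactions can be lost on an OS crash or power failure, after
#    which the replica must be re-initialised from the master
innodb_flush_log_at_trx_commit = 1
```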

Point 2 requires a rethink of application design at requirements level to keep the transactions fast and short.

But in this article, we will focus on the third most common scenario, caused by the performance constraint of running the replication apply process in a single thread. In this case, transactions are fast and disk I/O is plentiful, yet the single replication thread still cannot apply transactions fast enough to keep up with a master that processes them with high concurrency.

MySQL Parallel Replication

MySQL has a solution for this – parallel replication. To make parallel replication effective, we have to ensure that enough non-conflicting transactions are committed in the same commit group for the replication threads to process in parallel. To achieve this, however, we have to introduce a delay between when a transaction commits and when it is written to the binary log for replication. Without the delay, every transaction would commit and be written to the binary log separately. The delay allows multiple committed transactions, provided they didn't touch the same rows, to be written to the binary log as non-conflicting members of the same commit group.

If transactions do touch the same rows, MySQL will close the current commit group and write the conflicting transaction into the next one.

There are two settings used to configure the behaviour of writing transactions to replication logs into concurrently executable commit groups:

binlog_group_commit_sync_delay
binlog_group_commit_sync_no_delay_count

A commit group will contain at most binlog_group_commit_sync_no_delay_count transactions. If this many accumulate before the delay specified by binlog_group_commit_sync_delay has elapsed, the group is written to the binary log immediately and buffering of the next group begins.
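On the master, the two settings might be sketched like this (the values are illustrative placeholders, not recommendations; they must be derived from your own workload as discussed below):

```ini
# my.cnf on the master -- group commit sizing (placeholder values)
# Close the commit group early once this many transactions have accumulated:
binlog_group_commit_sync_no_delay_count = 8
# Otherwise wait at most this long (in microseconds) before flushing:
binlog_group_commit_sync_delay = 4000
```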

The problem is the delay itself. It can be up to 1 second and is specified in microseconds. We cannot simply set it to 1,000,000 microseconds, because that would delay every transaction by up to 1 second. On a mostly idle system processing fewer than one transaction per second, we don't want to introduce a 1-second delay for no benefit.

Tuning MySQL Parallel Replication

So how should we tune this? First of all, it should be stressed that parallel replication should be considered only after the first two scenarios listed above have been assessed and rectified or determined not to be the root cause of replication lag.

We have to consider how many threads we want our replicas to use. As an absolute maximum, it should be at most the number of available CPU cores on the replica. A reasonable number is usually half that, because many other threads are also at work: various I/O threads, replication log fetching and parsing threads, and threads serving the user-facing workload handled by the replica. This is configured on the replica using the slave_parallel_threads setting (the MariaDB name; in MySQL the equivalent is slave_parallel_workers).
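Assuming MySQL 5.7 or 8.0, commit-group-based parallelism on the applier side also requires the LOGICAL_CLOCK scheduler. A sketch of the replica-side settings, using MySQL naming (slave_parallel_threads is the MariaDB equivalent of slave_parallel_workers) and illustrative values:

```ini
# my.cnf on the replica (MySQL naming; illustrative values)
slave_parallel_type         = LOGICAL_CLOCK  # parallelise by commit group
slave_parallel_workers      = 8              # e.g. half the CPU cores
slave_preserve_commit_order = ON             # commit in the master's order
```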

The slave_parallel_threads setting should be adjusted to the minimum number required for the replica to keep up with the peak daily workload.

In turn, binlog_group_commit_sync_no_delay_count should be set to the same value on the master.

binlog_group_commit_sync_delay should be set to the minimum required to allow for binlog_group_commit_sync_no_delay_count transactions to accumulate during peak daily throughput on the master.
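As a rough sanity check on that minimum, the delay can be estimated from peak throughput: accumulating N transactions at a rate of R transactions per second takes N/R seconds. A minimal sketch of that arithmetic (the example numbers are illustrative assumptions, not recommendations; measure your own peak throughput with a monitoring tool):

```python
# Estimate the binlog_group_commit_sync_delay (in microseconds) needed for
# binlog_group_commit_sync_no_delay_count transactions to accumulate at a
# given peak commit rate. A starting point for tuning, not a final value.

def estimate_sync_delay_us(no_delay_count: int, peak_tps: float) -> int:
    """Time for `no_delay_count` commits to accumulate at `peak_tps` commits/sec."""
    return int(no_delay_count / peak_tps * 1_000_000)

# Example: a commit group sized for 8 applier threads, with the master
# peaking at 2,000 transactions per second:
delay = estimate_sync_delay_us(8, 2000)
print(delay)  # 4000 microseconds, i.e. 4 ms of added commit latency at peak
```

At lower throughput the no_delay_count threshold is never reached, so off-peak transactions pay the full configured delay – which is why it should be kept to this minimum.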

Establishing and tuning these numbers requires a good monitoring tool such as Shattered Silicon Monitoring (SSM) or Percona Monitoring and Management (PMM). It should also be noted that as the volume and nature of the transactional workload change over time, these values must be recalibrated.