11.3. Managing Multiple-Standby Servers

This section describes how streaming replication works with multiple standby servers.

11.3.1. sync_priority and sync_state

The primary server assigns the sync_priority and sync_state attributes to all managed standby servers, and treats each standby server according to its respective values. (The primary server assigns these values even if it manages just one standby server; this was not mentioned in the previous section.)

The sync_priority attribute indicates the priority of the standby server in synchronous mode. The lower the value, the higher the priority. The special value 0 means that the standby server is ‘in asynchronous mode’. The priorities of the standby servers are assigned in the order listed in the primary server’s configuration parameter synchronous_standby_names.

For example, in the following configuration, the priorities of standby1 and standby2 are 1 and 2, respectively.

synchronous_standby_names = 'standby1, standby2'

(Standby servers that are not listed in this parameter are in asynchronous mode and have a priority of 0.)

sync_state is the state of the standby server. The sync_state attribute indicates the state of the standby server. It can be one of the following values:

  • sync: The standby server is in synchronous mode and is the highest priority standby server that is currently working.

  • potential: The standby server is in synchronous mode and is a lower priority standby server that is currently working. If the current sync standby server fails, this standby server will be promoted to sync state.

  • async: The standby server is in asynchronous mode. (It will never be in ‘sync’ or ‘potential’ mode.)

  • quorum: The standby servers are in quorum mode. See Section 11.3.2.1 for detail.

The priority and state of the standby servers can be shown by issuing the following query:

testdb=# SELECT application_name AS host,
         sync_priority, sync_state FROM pg_stat_replication;
   host   | sync_priority | sync_state
----------+---------------+------------
 standby1 |             1 | sync
 standby2 |             2 | potential
(2 rows)

11.3.2. How the Primary Manages Multiple-standbys

The primary server waits for ACK responses from the synchronous standby server alone. In other words, the primary server confirms only the synchronous standby’s writing and flushing of WAL data. Streaming replication, therefore, ensures that only the synchronous standby is in a consistent and synchronous state with the primary.

Figure 11.4 shows the case in which the ACK response of the potential standby has been returned earlier than that of the primary standby. In this case, the primary server does not complete the commit action of the current transaction and continues to wait for the primary’s ACK response. When the primary’s response is received, the backend process releases the latch and completes the current transaction processing.

Figure 11.4. Managing multiple standby servers.

The sync_state of standby1 and standby2 are ‘sync’ and ‘potential’, respectively.

  1. The primary’s backend process continues to wait for an ACK response from the synchronous standby server, even though it has received an ACK response from the potential standby server.
  2. After receiving the ACK response from the synchronous standby server, the primary’s backend process releases the latch and completes the current transaction processing.

In the opposite case (i.e., the primary’s ACK response has been returned earlier than the potential’s), the primary server immediately completes the commit action of the current transaction without ensuring if the potential standby writes and flushes WAL data or not.

11.3.2.1. Quorum-Based Synchronous Replication

In version 9.6, quorum-based synchronous replication was introduced. This feature allows transactions in PostgreSQL to be considered committed once a subset (a quorum) of synchronous standby servers acknowledges them. It enhances flexibility in RDBMS configuration design and administration.

Quorum-based synchronous replication has two modes: ANY and FIRST.

ANY Mode

The format of synchronous_standby_names for ANY mode is:

ANY num_sync ( standby_name [, ...] )

In ANY mode, the primary server completes the commit action of the current transaction any ’num_sync’ standby servers in the list return ACK responses.

For example, the following setting allows the primary server to complete the commit action of the current transaction as soon as any two standby servers return ACK responses, even if the rest of the standby servers do not return responses.

synchronous_standby_names = 'ANY 2 (standby1, standby2, standby3)'

Figure 11.5 illustrates the behavior of the ANY Mode setting:

Figure 11.5. Behavior of the ANY Mode Setting.

The following is the state of sync_priority and sync_state for the above setting:

testdb=# SELECT application_name AS host,
        sync_priority, sync_state FROM pg_stat_replication;
   host   | sync_priority | sync_state
----------+---------------+------------
 standby1 |             1 | quorum
 standby2 |             1 | quorum
 standby3 |             1 | quorum
(3 rows)
FIRST Mode

The format of synchronous_standby_names for FIRST mode is:

[FIRST] num_sync ( standby_name [, ...] )

In FIRST mode, the primary server completes the commit action of the current transaction after the first ’num_sync’ standby servers in the list return ACK responses.

For example, the following setting allows the primary server to wait for the commit action of the current transaction until the first two standby servers, i.e., standby1 and standby2, return ACK responses, even if standby3 has already returned ACK response:

synchronous_standby_names = 'FIRST 2 (standby1, standby2, standby3)'

Figure 11.6 illustrates the behavior of the FIRST Mode setting:

Figure 11.6. Behavior of the FIRST Mode Setting.

The following is the state of sync_priority and sync_state for the above setting:

SELECT application_name AS host,
        sync_priority, sync_state FROM pg_stat_replication;
   host   | sync_priority | sync_state
----------+---------------+------------
 standby3 |             3 | potential
 standby2 |             2 | sync
 standby1 |             1 | sync
(3 rows)

11.3.3. Behavior When a Failure Occurs

Once again, let’s see how the primary server behaves when a standby server has failed.

When either a potential or an asynchronous standby server has failed, the primary server terminates the walsender process connected to the failed standby and continues all processing. In other words, transaction processing on the primary server would not be affected by the failure of either type of standby server.

When a synchronous standby server has failed, the primary server terminates the walsender process connected to the failed standby, and replaces the synchronous standby with the highest priority potential standby. See Figure 11.7. In contrast to the failure described above, query processing on the primary server will be paused from the point of failure to the replacement of the synchronous standby. (Therefore, failure detection of the standby server is a very important function to increase the availability of the replication system. Failure detection will be described in the next section.)

Figure 11.7. Replacing of synchronous standby server.

In any case, if one or more standby servers are running in synchronous mode, the primary server keeps only one synchronous standby server at all times, and the synchronous standby server is always in a consistent and synchronous state with the primary.