10.2. How Point-in-Time Recovery Works

Figure 10.3 illustrates the basic concept of PITR.

In PITR mode, PostgreSQL replays the WAL data from the archive logs on the base backup, starting at the REDO point created by pf_backup_start and continuing up to the specified recovery point. This recovery point in PostgreSQL is referred to as the recovery target.

The process of PITR operates as follows:

Suppose a mistake occurred at 12:05 GMT on 1 January 2024. The database cluster should be removed, and a new one restored using the base backup created before that time.

To begin, configure the command for the restore_command parameter and set the time for the recovery_target_time parameter to the point of the error (in this case, 12:05 GMT) in the postgresql.conf file (or recovery.conf for versions 11 or earlier).

# Place archive logs under /mnt/server/archivedir directory.
restore_command = 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = "2024-1-1 12:05 GMT"

When PostgreSQL starts up, it enters into PITR mode if there is a ‘backup_label’ file and a ‘recovery.signal’ file (versions 11 or earlier, ‘recovery.conf’) in the database cluster.

recovery.conf / recovery.signal

The recovery.conf file was removed in version 12, and all recovery-related parameters are now written in postgresql.conf. For detailed information, refer to the official document.

In versions 12 and later, restoring a server from a base backup requires creating an empty file named recovery.signal in the database cluster directory.

$ touch /usr/local/pgsql/data/recovery.signal

The PITR (Point-in-Time Recovery) process is almost the same as the normal recovery process described in Section 9.8. The only two differences are:

Where are WAL segments/Archive logs read from?
- Normal recovery mode – from the pg_wal subdirectory (in versions 9.6 or earlier, pg_xlog subdirectory) under the base directory.
- PITR mode – from an archival directory set in the archive_command parameter.
Where is the checkpoint location read from?
- Normal recovery mode – from the pg_control file.
- PITR mode – from a backup_label file.

The outline of PITR process is as follows:

PostgreSQL reads the value of ‘CHECKPOINT LOCATION’ from the backup_label file using the internal function read_backup_label() to find the REDO point.
PostgreSQL reads some values of parameters from the postgresql.conf (versions 11 or earlier, recovery.conf), such as restore_command and recovery_target_time.
PostgreSQL starts replaying WAL data from the REDO point, which can be easily obtained from the value of ‘CHECKPOINT LOCATION’. The WAL data are read from archive logs that are copied from the archival area to a temporary area by executing the command written in the restore_command parameter. (The copied log files in the temporary area are removed after use.)
In this example, PostgreSQL reads and replays WAL data from the REDO point to the one before the timestamp ‘2024-1-1 12:05:00’ because the recovery_target_time parameter is set to this timestamp. If a recovery target is not set to the postgresql.conf (versions 11 or earlier, recovery.conf), PostgreSQL will replay until the end of the archiving logs.
When the recovery process completes, a timeline history file, such as ‘00000002.history’, is created in the pg_wal subdirectory (in versions 9.6 or earlier, pg_xlog subdirectory).
If archiving log feature is enabled, the same named file is also created in the archival directory. The contents and role of this file are described in the following sections.

The records of commit and abort actions contain the timestamp at which each action has done (XLOG data portion of both actions are defined in xl_xact_commit and xl_xact_abort respectively).

xl_xact_commit

typedef struct xl_xact_commit
{
        TimestampTz	xact_time;          /* time of commit */
        uint32          xinfo;              /* info flags */
        int            	nrels;              /* number of RelFileNodes */
        int            	nsubxacts;          /* number of subtransaction XIDs */
        int            	nmsgs;              /* number of shared inval msgs */
        Oid            	dbId;               /* MyDatabaseId */
        Oid            	tsId;               /* MyDatabaseTableSpace */
        /* Array of RelFileNode(s) to drop at commit */
        RelFileNode     xnodes[1];          /* VARIABLE LENGTH ARRAY */
        /* ARRAY OF COMMITTED SUBTRANSACTION XIDs FOLLOWS */
        /* ARRAY OF SHARED INVALIDATION MESSAGES FOLLOWS */
} xl_xact_commit;

xl_xact_abort

typedef struct xl_xact_abort
{
        TimestampTz     xact_time;          /* time of abort */
        int            	nrels;              /* number of RelFileNodes */
        int             nsubxacts;          /* number of subtransaction XIDs */
        /* Array of RelFileNode(s) to drop at abort */
        RelFileNode     xnodes[1];          /* VARIABLE LENGTH ARRAY */
        /* ARRAY OF ABORTED SUBTRANSACTION XIDs FOLLOWS */
} xl_xact_abort;

Therefore, if a target time is set to the recovery_target_time parameter, PostgreSQL may select whether to continue recovery or not, whenever it replays XLOG record of either commit or abort action. When XLOG record of each action is replayed, PostgreSQL compares the target time and each timestamp written in the record, and if the timestamp exceed the target time, PITR process will be finished.

Info

The function read_backup_label() is defined in src/backend/access/transam/xlog.c.

The structure xl_xact_commit and xl_xact_abort are defined in src/include/access/xact.h.

Why can we use common archiving tools to make a base backup?

The recovery process is a process of restoring a database cluster to a consistent state, even if the cluster is inconsistent. PITR is based on the recovery process, so it can recover a database cluster even if the base backup is a bunch of inconsistent files. This is why we can use common archiving tools without the need for a file system snapshot capability or a special tool.

10.2. How Point-in-Time Recovery Works

Figure 10.3. Basic concept of PITR.