8.3. Buffer Manager Locks

The buffer manager uses many locks for a variety of purposes. This section describes the locks that are necessary for the explanations in the subsequent sections.

Note

The locks described in this section are part of the buffer manager's synchronization mechanism. They are unrelated to SQL statements or SQL options.

8.3.1. Buffer Table Locks

BufMappingLock protects the data integrity of the entire buffer table. It is a lightweight lock that can be used in both shared and exclusive modes. When searching for an entry in the buffer table, a backend process holds a shared BufMappingLock. When inserting or deleting entries, a backend process holds an exclusive lock.

The BufMappingLock is split into partitions to reduce contention in the buffer table (the default is 128 partitions). Each BufMappingLock partition guards a portion of the corresponding hash bucket slots.

Figure 8.7 shows a typical example of the effect of splitting BufMappingLock. Two backend processes can simultaneously hold their respective BufMappingLock partitions in exclusive mode to insert new data entries. If BufMappingLock were a single system-wide lock, one of the two processes would have to wait for the other to finish.

Figure 8.7. Two processes simultaneously acquire the respective partitions of BufMappingLock in exclusive mode to insert new data entries.
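
The partitioning idea can be sketched in C as follows. This is a minimal illustration and not PostgreSQL's actual code: the names NUM_PARTITIONS, partition_for_hash, and same_partition are hypothetical, and the modulo mapping from a hash code to a partition is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of lock partitions; 128 matches the default cited in the text. */
#define NUM_PARTITIONS 128

static_assert(NUM_PARTITIONS > 0, "at least one partition is required");

/*
 * Map the hash code of a buffer tag to a lock partition.
 * Assumption for this sketch: a simple modulo mapping.
 */
static inline uint32_t
partition_for_hash(uint32_t hashcode)
{
    return hashcode % NUM_PARTITIONS;
}

/*
 * Two operations contend for the same partition lock only if their
 * hash codes map to the same partition.
 */
static inline int
same_partition(uint32_t hash_a, uint32_t hash_b)
{
    return partition_for_hash(hash_a) == partition_for_hash(hash_b);
}
```

Under this mapping, two backends inserting entries whose hash codes fall into different partitions never wait for each other, which is the situation shown in Figure 8.7.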

The buffer table requires many other locks. For example, the buffer table internally uses a spinlock to delete an entry. However, descriptions of these other locks are omitted because they are not needed for the explanations in this document.

Info

Until PostgreSQL version 9.4, BufMappingLock was split into 16 separate locks by default.

8.3.2. Locks for Each Buffer Descriptor

In versions 9.5 or earlier, each buffer descriptor used two lightweight locks, content_lock and io_in_progress_lock, to control access to the stored page in the corresponding buffer pool slot. A spinlock (buf_hdr_lock) was used when the values of its own fields (i.e., usage_count, refcount, flags) were checked or changed.

In version 9.6, buffer access methods were improved. The io_in_progress_lock and the spinlock (buf_hdr_lock) were removed. Instead of using these locks, versions 9.6 or later use CPU atomic operations to inspect and change the values of the descriptor's fields.

Atomic Operations

An atomic operation is an operation that executes without interruption, ensuring it completes as a single, indivisible unit. These operations are commonly used for tasks such as updating counters without locks or implementing lock-free data structures.
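
As a minimal sketch of this idea, the following C11 code keeps a reference count and a flag bit in one atomic word and updates them without a lock. The layout (low 18 bits for the count, one bit above it for the flag) and the names demo_state, demo_pin, demo_unpin, demo_mark_dirty, and DEMO_DIRTY are assumptions chosen for illustration; they are not PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 18 bits hold a reference count, bit 18 is a flag. */
#define DEMO_REFCOUNT_ONE   1u
#define DEMO_REFCOUNT_MASK  ((1u << 18) - 1)
#define DEMO_DIRTY          (1u << 18)

typedef struct
{
    _Atomic uint32_t state;     /* packed count and flag */
} demo_state;

/* Increment the count with one atomic add; returns the new count. */
static inline uint32_t
demo_pin(demo_state *s)
{
    uint32_t old = atomic_fetch_add(&s->state, DEMO_REFCOUNT_ONE);

    return (old + DEMO_REFCOUNT_ONE) & DEMO_REFCOUNT_MASK;
}

/* Decrement the count with one atomic subtract; returns the new count. */
static inline uint32_t
demo_unpin(demo_state *s)
{
    uint32_t old = atomic_fetch_sub(&s->state, DEMO_REFCOUNT_ONE);

    /* Sanity check on the previous value: it must have been pinned. */
    assert((old & DEMO_REFCOUNT_MASK) > 0);
    return (old - DEMO_REFCOUNT_ONE) & DEMO_REFCOUNT_MASK;
}

/* Set the flag bit with a single atomic OR; the count is untouched. */
static inline void
demo_mark_dirty(demo_state *s)
{
    atomic_fetch_or(&s->state, DEMO_DIRTY);
}

/* Read the flag bit with an atomic load. */
static inline int
demo_is_dirty(demo_state *s)
{
    return (atomic_load(&s->state) & DEMO_DIRTY) != 0;
}
```

Because each update is a single atomic read-modify-write, concurrent callers cannot lose each other's changes even though no lock is taken.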

8.3.2.1. content_lock

The content_lock is a typical lock that enforces access restrictions. It can be used in shared and exclusive modes.

When reading a page, a backend process acquires a shared content_lock of the buffer descriptor that stores the page.

An exclusive content_lock is acquired when doing one of the following:

  • Inserting rows (i.e., tuples) into the stored page or changing the t_xmin/t_xmax fields of tuples within the stored page. (t_xmin and t_xmax are described in Section 5.2; in short, these fields of the associated tuples are changed when rows are deleted or updated.)

  • Physically removing tuples or compacting free space on the stored page. (This is performed by vacuum processing and HOT, which are described in Chapters 6 and 7, respectively).

  • Freezing tuples within the stored page. (Freezing is described in Section 5.10.1 and Section 6.3.)

The official README file provides more details.
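
The rules above can be summarized in a small C sketch. The enum and function names (lock_mode, page_op, required_mode, modes_compatible) are hypothetical and exist only for this illustration; the real content_lock is a PostgreSQL lightweight lock and is not implemented this way.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_SHARED, MODE_EXCLUSIVE } lock_mode;

/* Page operations named in the text above (hypothetical identifiers). */
typedef enum
{
    OP_READ_PAGE,
    OP_INSERT_TUPLE,
    OP_CHANGE_XMIN_XMAX,
    OP_REMOVE_OR_COMPACT,
    OP_FREEZE_TUPLE
} page_op;

/* Which content_lock mode does an operation need? */
static inline lock_mode
required_mode(page_op op)
{
    switch (op)
    {
        case OP_READ_PAGE:
            return MODE_SHARED;
        case OP_INSERT_TUPLE:
        case OP_CHANGE_XMIN_XMAX:
        case OP_REMOVE_OR_COMPACT:
        case OP_FREEZE_TUPLE:
            return MODE_EXCLUSIVE;
    }
    assert(!"unknown page_op");
    return MODE_EXCLUSIVE;
}

/*
 * Can a lock be granted in 'requested' mode while another process
 * already holds it in 'held' mode? Only shared/shared is compatible.
 */
static inline bool
modes_compatible(lock_mode held, lock_mode requested)
{
    return held == MODE_SHARED && requested == MODE_SHARED;
}
```

In this model, any number of readers can share a page, while a process that modifies the page excludes both readers and other writers.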

8.3.2.2. io_in_progress_lock (versions 9.5 or earlier)

In versions 9.5 or earlier, the io_in_progress_lock was used to wait for I/O on a buffer to complete. When a PostgreSQL process loaded page data from storage or wrote page data to storage, it held an exclusive io_in_progress_lock of the corresponding descriptor while accessing the storage.

8.3.2.3. spinlock (versions 9.5 or earlier)

A spinlock was used whenever the flags or other fields (such as refcount and usage_count) were checked or changed. Two specific examples of spinlock usage are given below:

  • (1) Pinning a buffer descriptor:

    1. Acquire a spinlock of the buffer descriptor.
    2. Increase the values of its refcount and usage_count by 1.
    3. Release the spinlock.
      LockBufHdr(bufferdesc);    /* Acquire a spinlock */
      bufferdesc->refcount++;
      bufferdesc->usage_count++;
      UnlockBufHdr(bufferdesc); /* Release the spinlock */
  • (2) Setting the dirty bit to ‘1’:

    1. Acquire a spinlock of the buffer descriptor.
    2. Set the dirty bit to ‘1’ using a bitwise operation.
    3. Release the spinlock.
      #define BM_DIRTY             (1 << 0)    /* data needs writing */
      #define BM_VALID             (1 << 1)    /* data is valid */
      #define BM_TAG_VALID         (1 << 2)    /* tag is assigned */
      #define BM_IO_IN_PROGRESS    (1 << 3)    /* read or write in progress */
      #define BM_JUST_DIRTIED      (1 << 5)    /* dirtied since write started */
      
      LockBufHdr(bufferdesc);
      bufferdesc->flags |= BM_DIRTY;
      UnlockBufHdr(bufferdesc);

Changing other bits is performed in the same manner.
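
The pattern shown in the two examples, together with clearing a bit, can be put into a self-contained sketch. The spin_desc structure and the spin_* functions below are simplified, hypothetical stand-ins for the real buffer descriptor and for LockBufHdr/UnlockBufHdr. Building the spinlock on C11 atomic_flag is an assumption of this sketch and not how PostgreSQL implements its spinlocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_BM_DIRTY   (1u << 0)   /* data needs writing */
#define SPIN_BM_VALID   (1u << 1)   /* data is valid */

typedef struct
{
    atomic_flag lock;           /* spinlock protecting the fields below */
    uint32_t    flags;
    uint32_t    usage_count;
    uint32_t    refcount;
} spin_desc;

static inline void
spin_init(spin_desc *d)
{
    atomic_flag_clear(&d->lock);
    d->flags = 0;
    d->usage_count = 0;
    d->refcount = 0;
}

/* Busy-wait until the flag is acquired. */
static inline void
spin_lock_hdr(spin_desc *d)
{
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ;
}

static inline void
spin_unlock_hdr(spin_desc *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Example (1): pin the descriptor. */
static inline void
spin_pin(spin_desc *d)
{
    spin_lock_hdr(d);
    d->refcount++;
    d->usage_count++;
    spin_unlock_hdr(d);
}

/* Release one pin; the descriptor must currently be pinned. */
static inline void
spin_unpin(spin_desc *d)
{
    spin_lock_hdr(d);
    assert(d->refcount > 0);
    d->refcount--;
    spin_unlock_hdr(d);
}

/* Example (2), generalized: set any flag bit. */
static inline void
spin_set_flag(spin_desc *d, uint32_t bit)
{
    spin_lock_hdr(d);
    d->flags |= bit;
    spin_unlock_hdr(d);
}

/* Clearing a bit follows the same lock, modify, unlock pattern. */
static inline void
spin_clear_flag(spin_desc *d, uint32_t bit)
{
    spin_lock_hdr(d);
    d->flags &= ~bit;
    spin_unlock_hdr(d);
}
```

Every field access is bracketed by the same acquire and release pair, so a reader of the flags never observes a half-finished update.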