6.1. Outline of Concurrent VACUUM

Vacuum processing performs the following tasks for specified tables or all tables in the database:

Removing dead tuples
- Remove dead tuples and defragment live tuples for each page.
- Remove index tuples that point to dead tuples.
Freezing old txids
- Freeze old txids of tuples if necessary.
- Update frozen txid related system catalogs (pg_database and pg_class).
- Remove unnecessary parts of the clog if possible.
Others
- Update the FSM and VM of processed tables.
- Update several statistics (pg_stat_all_tables, etc).

It is assumed that readers are familiar with the following terms: dead tuples, freezing txid, FSM, and the clog; if you are not, refer to Chapter 5. VM is introduced in Section 6.2.

The following pseudocode describes vacuum processing.

Pseudocode: Concurrent VACUUM

       // Phase 1: initializing

(1)    FOR each table
(2)      Acquire a ShareUpdateExclusiveLock lock for the target table

         /* The first block */

         // Phase 2: Scan Heap
(3)      Scan all pages to get all dead tuples, and freeze old tuples if necessary
         // Phase 3: Vacuuming Indexes
(4)      Remove the index tuples that point to the respective dead tuples, if any exist

         /* The second block */

         // Phase 4: Vacuuming Heap
(5)      FOR each page of the table
(6)         Remove the dead tuples, and reallocate the live tuples in the page
(7)         Update FSM and VM
         END FOR

         /* The third block */

         // Phase 5: Cleaning up indexes
(8)      Clean up indexes
         // Phase 6: Truncating heap
(9)      Truncate the last page if possible
(10)     Update both the statistics and system catalogs of the target table

         Release the ShareUpdateExclusiveLock lock
      END FOR

      /* Post-processing */

      // Phase 7: Final Cleaning
(11)  Update statistics and system catalogs
(12)  Remove both unnecessary files and pages of the clog if possible

(1) Get each table from the specified tables.
(2) Acquire a ShareUpdateExclusiveLock lock for the table. This lock allows other transactions to read from the table.
(3) Scan all pages to get all dead tuples, and freeze old tuples if necessary.
(4) Remove the index tuples that point to the respective dead tuples, if any exist.
(5) Do the following tasks, steps (6) and (7), for each page of the table.
(6) Remove the dead tuples and reallocate the live tuples in the page.
(7) Update both the respective FSM and VM of the target table.
(8) Clean up the indexes using the index_vacuum_cleanup() function.
(9) Truncate the last page if it does not contain any tuples.
(10) Update both the statistics and the system catalogs related to vacuum processing for the target table.
(11) Update both the statistics and the system catalogs related to vacuum processing.
(12) Remove both unnecessary files and pages of the clog if possible.

PostgreSQL internally divides the vacuum process into seven distinct phases. However, to make the overall vacuum process easier to understand, this document explains it using 3+1 simplified blocks introduced for clarity.

The accompanying pseudocode illustrates how these seven phases correspond to the 3+1 blocks.

The following subsections outline these blocks.

PARALLEL option

The VACUUM command has supported the PARALLEL option since version 13. If this option is set and the table has multiple indexes, the vacuuming indexes and cleaning up indexes phases are processed in parallel.

Note that this feature is only valid for the VACUUM command and is not supported by autovacuum.

Phases of Vacuum Processing

The current phase of an active vacuum process can be identified by examining the phase column in the pg_stat_progress_vacuum view.

testdb=# SELECT datname, relid, phase FROM pg_stat_progress_vacuum;
 datname | relid |     phase
---------+-------+---------------
 testdb  | 16415 | scanning heap
(1 row)
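As a rough reading aid, the phase names shown in this view can be mapped onto the simplified blocks used in this chapter. The sketch below is illustrative Python, not PostgreSQL code: the phase strings are the ones reported by pg_stat_progress_vacuum, the grouping simply follows the pseudocode above, and the names PHASE_TO_BLOCK and block_of are made up for this example.

```python
# Map each phase reported by pg_stat_progress_vacuum to the simplified
# block used in this chapter. The grouping follows the pseudocode above;
# PHASE_TO_BLOCK and block_of() are illustrative names, not PostgreSQL APIs.
PHASE_TO_BLOCK = {
    "initializing": "setup (before the first block)",
    "scanning heap": "first block",
    "vacuuming indexes": "first block",
    "vacuuming heap": "second block",
    "cleaning up indexes": "third block",
    "truncating heap": "third block",
    "performing final cleanup": "post-processing",
}


def block_of(phase: str) -> str:
    """Return the simplified block for a phase name, or 'unknown'."""
    return PHASE_TO_BLOCK.get(phase, "unknown")


# The row in the sample output above is in the first block.
print(block_of("scanning heap"))
```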

6.1.1. First Block

This block performs freeze processing and removes index tuples that point to dead tuples.

First, PostgreSQL scans the target table to build a list of dead tuples and freezes old tuples if possible. The list is stored in local memory whose size is limited by maintenance_work_mem. Freeze processing is described in Section 6.3.

After scanning, PostgreSQL removes index tuples by referring to the dead tuple list. Figure 6.1 shows an example of removing an index tuple that points to a dead tuple.

If the dead tuple list fills maintenance_work_mem before the scan is complete, PostgreSQL proceeds to the next tasks, i.e., steps (4) to (7). Then it returns to step (3) and scans the remaining pages.
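The interplay between the bounded dead tuple list and steps (3) to (7) can be sketched with a toy model. This is a simplified illustration in Python, not PostgreSQL's implementation: the function toy_vacuum, the dictionary-based pages, and the capacity argument (standing in for the limit imposed by maintenance_work_mem) are all assumptions made for this example, and the toy may flush in the middle of a page, which is a simplification.

```python
from typing import Dict, List, Set, Tuple

TID = Tuple[int, int]  # (page number, line pointer number)


def toy_vacuum(pages: List[Dict[int, bool]],
               index: Set[TID],
               capacity: int) -> int:
    """Toy model of steps (3) to (7) with a bounded dead-tuple list.

    pages[p][lp] is True when the tuple at line pointer lp is dead.
    index holds the TIDs that index tuples point to.
    capacity stands in for the limit imposed by maintenance_work_mem.
    Returns the number of index/heap vacuum cycles that were needed.
    """
    dead: List[TID] = []
    cycles = 0

    def flush() -> None:
        nonlocal cycles
        if not dead:
            return
        # Step (4): remove index tuples that point to dead tuples.
        index.difference_update(dead)
        # Steps (5)-(7): remove the dead heap tuples themselves.
        for page_no, lp in dead:
            del pages[page_no][lp]
        dead.clear()
        cycles += 1

    # Step (3): scan the heap page by page.
    for page_no, page in enumerate(pages):
        # sorted() takes a snapshot, so flush() may safely delete entries.
        for lp, is_dead in sorted(page.items()):
            if is_dead:
                if len(dead) == capacity:
                    flush()  # list is full: vacuum, then resume the scan
                dead.append((page_no, lp))
    flush()  # handle whatever is left when the scan ends
    return cycles


# Four dead tuples and room for only two TIDs: two cycles are needed.
pages = [{1: False, 2: True, 3: True}, {1: True, 2: False}, {1: True}]
index = {(p, lp) for p, page in enumerate(pages) for lp in page}
print(toy_vacuum(pages, index, capacity=2))
```

With a larger capacity the same table is handled in a single cycle, which is why a larger maintenance_work_mem can reduce the number of index passes.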

6.1.2. Second Block

This block removes dead tuples and updates both the FSM and VM on a page-by-page basis. Figure 6.2 shows an example:

Assume that the table contains three pages. We focus on the 0th page (i.e., the first page). This page has three tuples. Tuple_2 is a dead tuple (Figure 6.2(1)). In this case, PostgreSQL removes Tuple_2 and reorders the remaining tuples to repair fragmentation. Then, it updates both the FSM and VM of this page (Figure 6.2(2)). PostgreSQL continues this process until the last page.

Note that unnecessary line pointers are not removed; they are kept so that they can be reused in the future. If line pointers were removed, all index tuples of the associated indexes would have to be updated.
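The reason for keeping line pointers can be illustrated with a small model of a page. The following Python sketch is a toy under stated assumptions, not PostgreSQL's page layout: the class ToyPage and its methods are hypothetical, tuples are plain strings, and line pointers are numbered from 0 here for brevity.

```python
from typing import List, Optional


class ToyPage:
    """Toy heap page: a line pointer array plus a packed tuple area."""

    def __init__(self, tuples: List[str]) -> None:
        self.data: List[str] = list(tuples)  # tuple area
        # Line pointer i holds the position of its tuple in self.data.
        self.line_pointers: List[Optional[int]] = list(range(len(tuples)))

    def fetch(self, lp: int) -> Optional[str]:
        """Follow line pointer lp, as an index scan would do with a TID."""
        pos = self.line_pointers[lp]
        return None if pos is None else self.data[pos]

    def vacuum(self, dead: List[int]) -> None:
        """Drop the tuples behind the dead line pointers and compact the rest."""
        dead_set = set(dead)
        new_data: List[str] = []
        for lp, pos in enumerate(self.line_pointers):
            if pos is None:
                continue
            if lp in dead_set:
                self.line_pointers[lp] = None  # kept, but marked unused
            else:
                self.line_pointers[lp] = len(new_data)
                new_data.append(self.data[pos])
        self.data = new_data  # no gaps remain in the tuple area

    def insert(self, value: str) -> int:
        """Store a tuple, reusing an unused line pointer when one exists."""
        self.data.append(value)
        pos = len(self.data) - 1
        for lp, old in enumerate(self.line_pointers):
            if old is None:
                self.line_pointers[lp] = pos
                return lp
        self.line_pointers.append(pos)
        return len(self.line_pointers) - 1


page = ToyPage(["Tuple_1", "Tuple_2", "Tuple_3"])
page.vacuum([1])               # Tuple_2 is dead
print(page.fetch(2))           # Tuple_3 is still reached via line pointer 2
print(page.insert("Tuple_4"))  # the unused line pointer is reused
```

Because Tuple_3 keeps its line pointer number even though its data moved, an index entry that refers to it stays valid without any index update.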

6.1.3. Third Block

The third block cleans up the indexes after index tuples have been deleted, and also updates the statistics and system catalogs related to vacuum processing for each target table.

Additionally, if the last pages contain no tuples, they are truncated from the table file.

To make this easier to understand, Figure 6.3 shows a slightly exaggerated example. In the figure, there are no tuples on the 1st, 3rd, and 4th pages after vacuuming the heap.

During the truncating heap phase, the 4th and 3rd pages are removed from the table file, reducing its size by 16KB (= 8KB × 2 pages).

Although the 1st page also contains no tuples, it is not removed¹.
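The truncation rule in this example can be written down in a few lines. The sketch below is a toy in Python rather than PostgreSQL code: the function name truncate_trailing_empty_pages and the per-page tuple counts are assumptions chosen to match the figure's description, with pages numbered from 0.

```python
from typing import List, Tuple

PAGE_SIZE = 8192  # bytes; the 8KB page size used in the example above


def truncate_trailing_empty_pages(
        tuples_per_page: List[int]) -> Tuple[List[int], int]:
    """Drop empty pages from the end of the table only.

    Returns the remaining pages and the number of bytes freed.
    """
    keep = len(tuples_per_page)
    while keep > 0 and tuples_per_page[keep - 1] == 0:
        keep -= 1
    freed = (len(tuples_per_page) - keep) * PAGE_SIZE
    return tuples_per_page[:keep], freed


# Pages 0..4; the 1st, 3rd, and 4th pages are empty (tuple counts are made up).
remaining, freed = truncate_trailing_empty_pages([3, 0, 2, 0, 0])
print(remaining, freed)  # [3, 0, 2] 16384
```

The empty 1st page survives because only a run of empty pages at the end of the file can be cut off.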

6.1.4. Post-processing

When vacuum processing is complete, PostgreSQL updates all the statistics and system catalogs related to vacuum processing. It also removes unnecessary parts of the clog if possible (Section 6.4).

Ring Buffer

Vacuum processing uses a ring buffer, described in Section 8.5. Therefore, processed pages are not cached in the shared buffers.

¹ To remove such pages, use the VACUUM FULL command, as explained in Section 6.6.