1.2. Physical Structure of Database Cluster

A database cluster is basically a single directory, referred to as base directory. It contains some subdirectories and many files. When you execute the initdb utility to initialize a new database cluster, a base directory will be created under the specified directory. The path of the base directory is usually set to the environment variable PGDATA.

Figure 1.2 shows an example of database cluster in PostgreSQL. A database is a subdirectory under the base subdirectory, and each of the tables and indexes is (at least) one file stored under the subdirectory of the database to which it belongs. There are several subdirectories containing particular data and configuration files.

While PostgreSQL supports tablespaces, the meaning of the term is different from other RDBMSs. A tablespace in PostgreSQL is a single directory that contains some data outside of the base directory.

Figure 1.2. An example of database cluster.

In the following subsections, the layout of a database cluster, databases, files associated with tables and indexes, and tablespaces in PostgreSQL are described.

1.2.1. Layout of a Database Cluster

The layout of database cluster has been described in the official document. Main files and subdirectories in a part of the document have been listed in Table 1.1:

table 1.1: Layout of files and subdirectories under the base directory (From the official document)
files description
PG_VERSION A file containing the major version number of PostgreSQL.
current_logfiles A file recording the log file(s) currently written to by the logging collector.
pg_hba.conf A file to control PostgreSQL's client authentication.
pg_ident.conf A file to control PostgreSQL's user name mapping.
postgresql.conf A file to set configuration parameters.
postgresql.auto.conf A file used for storing configuration parameters that are set in ALTER SYSTEM. (versions 9.4 or later)
postmaster.opts A file recording the command line options the server was last started with.
subdirectories description
base/ Subdirectory containing per-database subdirectories.
global/ Subdirectory containing cluster-wide tables, such as pg_database and pg_control.
pg_commit_ts/ Subdirectory containing transaction commit timestamp data. (versions 9.5 or later)
pg_clog/ (versions 9.6 or earlier) Subdirectory containing transaction commit state data. It is renamed to pg_xact in version 10.
pg_dynshmem/ Subdirectory containing files used by the dynamic shared memory subsystem. (versions 9.4 or later)
pg_logical/ Subdirectory containing status data for logical decoding. (versions 9.4 or later)
pg_multixact/ Subdirectory containing multitransaction status data. (used for shared row locks)
pg_notify/ Subdirectory containing LISTEN/NOTIFY status data.
pg_repslot/ Subdirectory containing replication slot data. Replication Slots will be described in Section 11.4. (versions 9.4 or later)
pg_serial/ Subdirectory containing information about committed serializable transactions. (versions 9.1 or later)
pg_snapshots/ Subdirectory containing exported snapshots. The PostgreSQL's function pg_export_snapshot creates a snapshot information file in this subdirectory. (versions 9.2 or later)
pg_stat/ Subdirectory containing permanent files for the statistics subsystem.
pg_stat_tmp/ Subdirectory containing temporary files for the statistics subsystem.
pg_subtrans/ Subdirectory containing subtransaction status data.
pg_tblspc/ Subdirectory containing symbolic links to tablespaces.
pg_twophase/ Subdirectory containing state files for prepared transactions.
pg_wal/ (versions 10 or later) Subdirectory containing WAL (Write Ahead Logging) segment files. It is renamed from pg_xlog in Version 10. WAL will be described in Chapter 9.
pg_xact/ (versions 10 or later) Subdirectory containing transaction commit state data. It is renamed from pg_clog in Version 10. CLOG will be described in Section 5.4.
pg_xlog/ (versions 9.6 or earlier) Subdirectory containing WAL (Write Ahead Logging) segment files. It is renamed to pg_wal in Version 10.

1.2.2. Layout of Databases

A database is a subdirectory under the base subdirectory. The database directory names are identical to the respective OIDs. For example, when the OID of the database ‘sampledb’ is 16384, its subdirectory name is 16384.

$ cd $PGDATA
$ ls -ld base/16384
drwx------  213 postgres postgres  7242  8 26 16:33 16384

1.2.3. Layout of Files Associated with Tables and Indexes

Each table or index whose size is less than 1GB is stored in a single file under the database directory to which it belongs. Tables and indexes are internally managed by individual OIDs, while their data files are managed by the variable, relfilenode. The relfilenode values of tables and indexes basically but not always match the respective OIDs, the details are described below.

For example, let’s show the OID and relfilenode of the table ‘sampletbl’:

sampledb=# SELECT relname, oid, relfilenode FROM pg_class WHERE relname = 'sampletbl';
  relname  |  oid  | relfilenode
-----------+-------+-------------
 sampletbl | 18740 |       18740
(1 row)

The oid and relfilenode values are equal in this case. Additionally, the data file path of the table ‘sampletbl’ is ‘base/16384/18740’.

$ cd $PGDATA
$ ls -la base/16384/18740
-rw------- 1 postgres postgres 8192 Apr 21 10:21 base/16384/18740

The relfilenode values of tables and indexes can be changed by issuing certain commands, such as TRUNCATE, REINDEX, CLUSTER. For example, if we truncate the table ‘sampletbl’, PostgreSQL will assign a new relfilenode (18812) to the table, removes the old data file (18740), and creates a new one (18812).

sampledb=# TRUNCATE sampletbl;
TRUNCATE TABLE

sampledb=# SELECT relname, oid, relfilenode FROM pg_class WHERE relname = 'sampletbl';
  relname  |  oid  | relfilenode
-----------+-------+-------------
 sampletbl | 18740 |       18812
(1 row)
Info

In versions 9.0 or later, the built-in function pg_relation_filepath() is useful as this function returns the file path name of the relation with the specified OID or name.

sampledb=# SELECT pg_relation_filepath('sampletbl');
 pg_relation_filepath
----------------------
 base/16384/18812
(1 row)

When the file size of tables and indexes exceeds 1GB, PostgreSQL creates a new file named like relfilenode.1 and uses it. If the new file is filled up, PostgreSQL will create another new file named like relfilenode.2 and so on.

$ cd $PGDATA
$ ls -la -h base/16384/19427*
-rw------- 1 postgres postgres 1.0G  Apr  21 11:16 data/base/16384/19427
-rw------- 1 postgres postgres  45M  Apr  21 11:20 data/base/16384/19427.1
Info

The maximum file size of tables and indexes can be changed using the configuration, option “–with-segsize” when building PostgreSQL.

By examining the database subdirectories, it becomes apparent that each table has two associated files, suffixed with ‘_fsm’ and ‘_vm’. These are referred to as the free space map and visibility map, respectively.

The free space map stores information about the free space capacity on each page within the table file, and the visibility map stores information about the visibility of each page within the table file. (More details can be found in Sections 5.3.4 and 6.2.)

Indexes only have individual free space maps and do not have visibility map.

A specific example is shown below:

$ cd $PGDATA
$ ls -la base/16384/18751*
-rw------- 1 postgres postgres  8192 Apr 21 10:21 base/16384/18751
-rw------- 1 postgres postgres 24576 Apr 21 10:18 base/16384/18751_fsm
-rw------- 1 postgres postgres  8192 Apr 21 10:18 base/16384/18751_vm

The free space map and visibility map may also be internally referred to as the forks of each relation. the free space map is the first fork of the table/index data file (the fork number is 1), the visibility map the second fork of the table’s data file (the fork number is 2). The fork number of the data file is 0.

1.2.4. Tablespaces

A tablespace in PostgreSQL is an additional data area outside the base directory. This functionality was implemented in version 8.0.

Figure 1.3 shows the internal layout of a tablespace and its relationship with the main data area.

Figure 1.3. A Tablespace in the Database Cluster.

A tablespace is created within the directory specified during the execution of the CREATE TABLESPACE statement. Under that directory, a version-specific subdirectory (e.g., PG_14_202011044) will be created. The naming convention for the version-specific subdirectory is shown below.

PG _ 'Major version' _ 'Catalogue version number'

For example, creating a tablespace named ’new_tblspc’ at ‘/home/postgres/tblspc’, with an OID of 16386, a subdirectory named ‘PG_14_202011044’ will be created under the tablespace.

$ ls -l /home/postgres/tblspc/
total 4
drwx------ 2 postgres postgres 4096 Apr 21 10:08 PG_14_202011044

The tablespace directory is addressed by a symbolic link from the pg_tblspc subdirectory. The link name is the same as the OID value of tablespace.

$ ls -l $PGDATA/pg_tblspc/
total 0
lrwxrwxrwx 1 postgres postgres 21 Apr 21 10:08 16386 -> /home/postgres/tblspc

When a new database (OID 16387) is created within the tablespace, its directory is placed under the version-specific subdirectory:

$ ls -l /home/postgres/tblspc/PG_14_202011044/
total 4
drwx------ 2 postgres postgres 4096 Apr 21 10:10 16387

If a new table is created within a database located in the base directory, a new directory is first created under the version-specific subdirectory, named according to the OID of the database. The new table file is then placed in this newly created directory.

sampledb=# CREATE TABLE newtbl (.....) TABLESPACE new_tblspc;

sampledb=# SELECT pg_relation_filepath('newtbl');
             pg_relation_filepath
---------------------------------------------
 pg_tblspc/16386/PG_14_202011044/16384/18894