4.2. How the Postgres_fdw Extension Performs
The postgres_fdw extension is a special module that is officially maintained by the PostgreSQL Global Development Group. Its source code is included in the PostgreSQL source code tree.
postgres_fdw is gradually improved. Table 4.1 presents the release notes related to postgres_fdw from the official document.
Version | Description | 9.3 |
|
---|---|
9.6 |
|
10 |
|
11 |
|
12 |
|
14 |
|
15 |
|
16 |
|
17 |
|
Since the previous section describes how postgres_fdw processes a single-table query, the following subsection describes how it processes a multi-table query, sort operation, and aggregate functions.
This subsection focuses on the SELECT statement. However, postgres_fdw can also process other DML (INSERT, UPDATE, and DELETE) statements.
Note: PostgreSQL’s FDW does not detect deadlock
The postgres_fdw extension and the FDW feature do not support the distributed lock manager and the distributed deadlock detection feature. Therefore, a deadlock can easily be generated.
For example, if Client_A updates a local table ’tbl_local’ and a foreign table ’tbl_remote’, and Client_B updates ’tbl_remote’ and ’tbl_local’, then these two transactions are in deadlock, but PostgreSQL cannot detect it. Therefore, these transactions cannot be committed.
localdb=# -- Client A
localdb=# BEGIN;
BEGIN
localdb=# UPDATE tbl_local SET data = 0 WHERE id = 1;
UPDATE 1
localdb=# UPDATE tbl_remote SET data = 0 WHERE id = 1;
UPDATE 1
localdb=# -- Client B
localdb=# BEGIN;
BEGIN
localdb=# UPDATE tbl_remote SET data = 0 WHERE id = 1;
UPDATE 1
localdb=# UPDATE tbl_local SET data = 0 WHERE id = 1;
UPDATE 1
4.2.1. Multi-Table Query
To execute a multi-table query, postgres_fdw fetches each foreign table using a single-table SELECT statement and then join them on the local server.
In versions 9.5 or earlier, even if the foreign tables are stored in the same remote server, postgres_fdw fetches them individually and joins them.
In versions 9.6 or later, postgres_fdw has been improved and can execute the remote join operation on the remote server when the foreign tables are on the same server and the use_remote_estimate option is enabled.
The execution details are described as follows.
4.2.1.1. Versions 9.5 or earlier
Let us explore how PostgreSQL processes the following query that joins two foreign tables: ’tbl_a’ and ’tbl_b'.
localdb=# SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200;
The result of the EXPLAIN command of the query is shown below:
|
|
The result shows that the executor selects the merge join and is processed as the following steps:
- Line 8: The executor fetches the table ’tbl_a’ using the foreign table scan.
- Line 6: The executor sorts the fetched rows of ’tbl_a’ on the local server.
- Line 11: The executor fetches the table ’tbl_b’ using the foreign table scan.
- Line 9: The executor sorts the fetched rows of ’tbl_b’ on the local server.
- Line 4: The executor carries out a merge join operation on the local server.
The following describes how the executor fetches the rows (Figure 4.6).
-
(5-1) Start the remote transaction.
-
(5-2) Declare the cursor c1, the SELECT statement of which is shown below:
SELECT id,data FROM public.tbl_a WHERE (id < 200)
-
(5-3) Execute FETCH commands to obtain the result of the cursor 1.
-
(5-4) Declare the cursor c2, whose SELECT statement is shown below:
Note that the WHERE clause of the original two-table query is “tbl_a.id = tbl_b.id AND tbl_a.id < 200”. Therefore, a WHERE clause “tbl_b.id < 200” can be logically added to the SELECT statement as shown previously. However, postgres_fdw cannot perform this inference; therefore, the executor has to execute the SELECT statement without any WHERE clauses and has to fetch all rows of the foreign table ’tbl_b’.SELECT id,data FROM public.tbl_b
This process is inefficient because unnecessary rows must be read from the remote server via the network. Furthermore, the received rows must be sorted to execute the merge join. -
(5-5) Execute FETCH commands to obtain the result of the cursor 2.
-
(5-6) Close the cursor c1.
-
(5-7) Close the cursor c2.
-
(5-8) Commit the transaction.
After receiving the rows, the executor sorts both received rows of ’tbl_a’ and ’tbl_b’, and then executes a merge join operation with the sorted rows.
4.2.1.2. Versions 9.6 or later
If the use_remote_estimate option is ‘on’ (the default is ‘off’), postgres_fdw sends several EXPLAIN commands to obtain the costs of all plans related to the foreign tables.
To send the EXPLAIN commands, postgres_fdw sends both the EXPLAIN command of each single-table query and the EXPLAIN commands of the SELECT statements to execute remote join operations. In this example, the following seven EXPLAIN commands are sent to the remote server to obtain the estimated costs of each SELECT statement. The planner then selects the cheapest plan.
(1) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200))
(2) EXPLAIN SELECT id, data FROM public.tbl_b
(3) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
(4) EXPLAIN SELECT id, data FROM public.tbl_a WHERE ((((SELECT null::integer)::integer) = id)) AND ((id < 200))
(5) EXPLAIN SELECT id, data FROM public.tbl_b ORDER BY id ASC NULLS LAST
(6) EXPLAIN SELECT id, data FROM public.tbl_b WHERE ((((SELECT null::integer)::integer) = id))
(7) EXPLAIN SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200))))
Let us execute the EXPLAIN command on the local server to observe what plan is selected by the planner.
localdb=# EXPLAIN SELECT * FROM tbl_a AS a, tbl_b AS b WHERE a.id = b.id AND a.id < 200;
QUERY PLAN
-----------------------------------------------------------
Foreign Scan (cost=134.35..244.45 rows=80 width=16)
Relations: (public.tbl_a a) INNER JOIN (public.tbl_b b)
(2 rows)
The result shows that the planner selects the inner join query that is processed on the remote server, which is very efficient.
The following describes how postgres_fdw is performed (Figure 4.7).
- (3-1) Start the remote transaction.
- (3-2) Execute the EXPLAIN commands for estimating the cost of each plan path.
In this example, seven EXPLAIN commands are executed. Then, the planner selects the cheapest cost of the SELECT queries using the results of the executed EXPLAIN commands.
-
(5-1) Declare the cursor c1, whose SELECT statement is shown below:
SELECT r1.id, r1.data, r2.id, r2.data FROM (public.tbl_a r1 INNER JOIN public.tbl_b r2 ON (((r1.id = r2.id)) AND ((r1.id < 200))))
-
(5-2) Receive the result from the remote server.
-
(5-3) Close the cursor c1.
-
(5-4) Commit the transaction.
Note that if the use_remote_estimate option is off (by default), a remote-join query is rarely selected because the costs are estimated using a very large embedded value.
4.2.2. Sort Operations
4.2.2.1. Versions 9.5 or earlier
The sort operation, such as ORDER BY, is processed on the local server. This means that the local server fetches all the target rows from the remote server before the sort operation. Let us explore how a simple query that includes an ORDER BY clause is processed using the EXPLAIN command.
|
|
-
Line 6: The executor sends the following query to the remote server, and then fetches the query result.
SELECT id, data FROM public.tbl_a WHERE ((id < 200))
-
Line 4: The executor sorts the fetched rows of ’tbl_a’ on the local server.
4.2.2.2. Versions 9.6 or later
The postgres_fdw can execute the SELECT statements with an ORDER BY clause on the remote server if possible.
|
|
- Line 4: The executor sends the following query that has an ORDER BY clause to the remote server, and then fetches the query result, which is already sorted.
SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
This improvement has reduced the workload of the local server.
4.2.3. Aggregate Functions
4.2.3.1. Versions 9.6 or earlier
Similar to the sort operation mentioned in the previous subsection, aggregate functions such as AVG() and COUNT() are processed on the local server as the following steps:
|
|
-
Line 5: The executor sends the following query to the remote server, and then fetches the query result.
SELECT id, data FROM public.tbl_a WHERE ((id < 200))
-
Line 4: The executor computes the average of the fetched rows of ’tbl_a’ on the local server.
This process is costly because sending a large number of rows consumes heavy network traffic and takes a long time.
4.2.3.2. Versions 10 or later
The postgres_fdw executes the SELECT statement with the aggregate function on the remote server if possible.
|
|
- Line 4: The executor sends the following query that contains an AVG() function to the remote server, and then fetches the query result.
SELECT avg(data) FROM public.tbl_a WHERE ((id < 200))
This process is obviously efficient because the remote server calculates the average and sends only one row as the result.
Similar to the given example, push-down is an operation where the local server allows the remote server to process some operations, such as aggregate procedures.