4.1. Overview

To utilize the FDW feature, the appropriate extension must be installed, and setup commands such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING. (For details, refer to the official document.)

After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.

Figure 4.2 briefly describes how FDW performs in PostgreSQL.

Figure 4.2. How FDWs perform.
  1. The analyzer/analyser creates the query tree of the input SQL.
  2. The planner (or executor) connects to the remote server.
  3. If the use_remote_estimate option is ‘on’ (the default is ‘off’), the planner executes EXPLAIN commands to estimate the cost of each plan path.
  4. The planner creates a plain text SQL statement from the plan tree which is internally called deparesing.
  5. The executor sends the plain text SQL statement to the remote server and receives the result.

The executor then processes the received data if necessary. For example, if a multi-table query is executed, the executor performs the join processing of the received data and other tables.

The details of each processing are described in the following sections.

4.1.1. Creating a Query Tree

The analyzer/analyser creates the query tree of the input SQL using the definitions of the foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs using the CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA command.

4.1.2. Connecting to the Remote Server

To connect to the remote server, the planner (or executor) uses the appropriate library to connect to the remote database server. For example, to connect to the remote PostgreSQL server, the postgres_fdw uses the libpq library. To connect to the MySQL server, the mysql_fdw extension, which is developed by EnterpriseDB, uses the libmysqlclient library.

The connection parameters, such as username, server’s IP address and port number, are stored in the pg_catalog.pg_user_mapping and pg_catalog.pg_foreign_server catalogs using the CREATE USER MAPPING and CREATE SERVER commands.

4.1.3. Creating a Plan Tree Using EXPLAIN Commands (Optional)

PostgreSQL’s FDW supports the ability to obtain statistics of foreign tables to estimate the plan tree of a query. Some FDW extensions, such as postgres_fdw, mysql_fdw, tds_fdw, and jdbc2_fdw, use these statistics.

If the use_remote_estimate option is set to ‘on’ using the ALTER SERVER command, the planner queries the cost of plans to the remote server by executing the EXPLAIN command. Otherwise, the embedded constant values are used by default.

localdb=# ALTER SERVER remote_server_name OPTIONS (use_remote_estimate 'on');

Although some extensions use the values of the EXPLAIN command, only postgres_fdw can reflect the results of the EXPLAIN commands because PostgreSQL’s EXPLAIN command returns both the start-up and total costs.

The results of the EXPLAIN command cannot be used by other DBMS FDW extensions for planning. For example, mysql’s EXPLAIN command only returns the estimated number of rows. However, PostgreSQL’s planner needs more information to estimate the cost as described in Chapter 3.

4.1.4. Deparesing

To generate the plan tree, the planner creates a plain text SQL statement from the plan tree’s scan paths of the foreign tables. For example, Figure 4.3 shows the plan tree of the following SELECT statement:

localdb=# SELECT * FROM tbl_a AS a WHERE a.id < 10;

Figure 4.3 shows that the ForeignScan node, which is linked from the plan tree of the PlannedStmt, stores a plain SELECT text. Here, postgres_fdw recreates a plain SELECT text from the query tree that has been created by parsing and analysing, which is called deparsing in PostgreSQL.

Figure 4.3. Example of the plan tree that scans a foreign table.

The use of mysql_fdw recreates a SELECT text for MySQL from the query tree. The use of redis_fdw or rw_redis_fdw creates a SELECT command.

4.1.5. Sending SQL Statements and Receiving Result

After deparsing, the executor sends the deparsed SQL statements to the remote server and receives the result.

The method for sending the SQL statements to the remote server depends on the developer of each extension. For example, mysql_fdw sends the SQL statements without using a transaction. The typical sequence of SQL statements to execute a SELECT query in mysql_fdw is shown below (Figure 4.4).

Figure 4.4. Typical sequence of SQL statements to execute a SELECT query in mysql_fdw.
  • (5-1) Set the SQL_MODE to ‘ANSI_QUOTES’.
  • (5-2) Send a SELECT statement to the remote server.
  • (5-3) Receive the result from the remote server.
    Here, mysql_fdw converts the result to readable data by PostgreSQL.
    All FDW extensions implement the feature that converts the result to PostgreSQL readable data.
mysql> SELECT command_type,argument FROM mysql.general_log;
+--------------+-----------------------------------------------------------------------+
| command_type | argument                                                              |
+--------------+-----------------------------------------------------------------------+
... snip ...

| Query        | SET sql_mode='ANSI_QUOTES'                                            |
| Prepare      | SELECT `id`, `data` FROM `localdb`.`tbl_a` WHERE ((`id` < 10))        |
| Close stmt   |                                                                       |
+--------------+-----------------------------------------------------------------------+

In postgres_fdw, the sequence of SQL commands is more complicated. The typical sequence of SQL statements to execute a SELECT query in postgres_fdw is shown below (Figure 4.5).

Figure 4.5. Typical sequence of SQL statements to execute a SELECT query in postgres_fdw.
  • (5-1) Start the remote transaction.
    The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE.
  • (5-2)-(5-4) Declare a cursor.
    The SQL statement is basically executed as a cursor.
  • (5-5) Execute FETCH commands to obtain the result.
    By default, 100 rows are fetched by the FETCH command.
  • (5-6) Receive the result from the remote server.
  • (5-7) Close the cursor.
  • (5-8) Commit the remote transaction.
LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
LOG:  parse <unnamed>: DECLARE c1 CURSOR FOR SELECT id, data FROM public.tbl_a WHERE ((id < 10))
LOG:  bind <unnamed>: DECLARE c1 CURSOR FOR SELECT id, data FROM public.tbl_a WHERE ((id < 10))
LOG:  execute <unnamed>: DECLARE c1 CURSOR FOR SELECT id, data FROM public.tbl_a WHERE ((id < 10))
LOG:  statement: FETCH 100 FROM c1
LOG:  statement: CLOSE c1
LOG:  statement: COMMIT TRANSACTION
The default remote transaction isolation level in postgres_fdw.

The explanation for why the default remote transaction isolation level is REPEATABLE READ is provided in the official document.

The remote transaction uses the SERIALIZABLE isolation level when the local transaction has the SERIALIZABLE isolation level; otherwise it uses the REPEATABLE READ isolation level. This choice ensures that if a query performs multiple table scans on the remote server, it will get snapshot-consistent results for all the scans. A consequence is that successive queries within a single transaction will see the same data from the remote server, even if concurrent updates are occurring on the remote server due to other activities.