4.1. Overview
To utilize the FDW feature, the appropriate extension must be installed, and setup commands such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING. (For details, refer to the official document.)
After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.
Figure 4.2 briefly describes how FDW performs in PostgreSQL.
- The analyzer/analyser creates the query tree of the input SQL.
- The planner (or executor) connects to the remote server.
- If the use_remote_estimate option is ‘on’ (the default is ‘off’), the planner executes EXPLAIN commands to estimate the cost of each plan path.
- The planner creates a plain text SQL statement from the plan tree which is internally called deparesing.
- The executor sends the plain text SQL statement to the remote server and receives the result.
The executor then processes the received data if necessary. For example, if a multi-table query is executed, the executor performs the join processing of the received data and other tables.
The details of each processing are described in the following sections.
4.1.1. Creating a Query Tree
The analyzer/analyser creates the query tree of the input SQL using the definitions of the foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs using the CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA command.
4.1.2. Connecting to the Remote Server
To connect to the remote server, the planner (or executor) uses the appropriate library to connect to the remote database server. For example, to connect to the remote PostgreSQL server, the postgres_fdw uses the libpq library. To connect to the MySQL server, the mysql_fdw extension, which is developed by EnterpriseDB, uses the libmysqlclient library.
The connection parameters, such as username, server’s IP address and port number, are stored in the pg_catalog.pg_user_mapping and pg_catalog.pg_foreign_server catalogs using the CREATE USER MAPPING and CREATE SERVER commands.
4.1.3. Creating a Plan Tree Using EXPLAIN Commands (Optional)
PostgreSQL’s FDW supports the ability to obtain statistics of foreign tables to estimate the plan tree of a query. Some FDW extensions, such as postgres_fdw, mysql_fdw, tds_fdw, and jdbc2_fdw, use these statistics.
If the use_remote_estimate option is set to ‘on’ using the ALTER SERVER command, the planner queries the cost of plans to the remote server by executing the EXPLAIN command. Otherwise, the embedded constant values are used by default.
localdb=# ALTER SERVER remote_server_name OPTIONS (use_remote_estimate 'on');
Although some extensions use the values of the EXPLAIN command, only postgres_fdw can reflect the results of the EXPLAIN commands because PostgreSQL’s EXPLAIN command returns both the start-up and total costs.
The results of the EXPLAIN command cannot be used by other DBMS FDW extensions for planning. For example, mysql’s EXPLAIN command only returns the estimated number of rows. However, PostgreSQL’s planner needs more information to estimate the cost as described in Chapter 3.
4.1.4. Deparesing
To generate the plan tree, the planner creates a plain text SQL statement from the plan tree’s scan paths of the foreign tables. For example, Figure 4.3 shows the plan tree of the following SELECT statement:
localdb=# SELECT * FROM tbl_a AS a WHERE a.id < 10;
Figure 4.3 shows that the ForeignScan node, which is linked from the plan tree of the PlannedStmt, stores a plain SELECT text. Here, postgres_fdw recreates a plain SELECT text from the query tree that has been created by parsing and analysing, which is called deparsing in PostgreSQL.
The use of mysql_fdw recreates a SELECT text for MySQL from the query tree. The use of redis_fdw or rw_redis_fdw creates a SELECT command.
4.1.5. Sending SQL Statements and Receiving Result
After deparsing, the executor sends the deparsed SQL statements to the remote server and receives the result.
The method for sending the SQL statements to the remote server depends on the developer of each extension. For example, mysql_fdw sends the SQL statements without using a transaction. The typical sequence of SQL statements to execute a SELECT query in mysql_fdw is shown below (Figure 4.4).
- (5-1) Set the SQL_MODE to ‘ANSI_QUOTES’.
- (5-2) Send a SELECT statement to the remote server.
- (5-3) Receive the result from the remote server.
Here, mysql_fdw converts the result to readable data by PostgreSQL.
All FDW extensions implement the feature that converts the result to PostgreSQL readable data.
In postgres_fdw, the sequence of SQL commands is more complicated. The typical sequence of SQL statements to execute a SELECT query in postgres_fdw is shown below (Figure 4.5).
- (5-1) Start the remote transaction.
The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE. - (5-2)-(5-4) Declare a cursor.
The SQL statement is basically executed as a cursor. - (5-5) Execute FETCH commands to obtain the result.
By default, 100 rows are fetched by the FETCH command. - (5-6) Receive the result from the remote server.
- (5-7) Close the cursor.
- (5-8) Commit the remote transaction.
The explanation for why the default remote transaction isolation level is REPEATABLE READ is provided in the official document.
The remote transaction uses the SERIALIZABLE isolation level when the local transaction has the SERIALIZABLE isolation level; otherwise it uses the REPEATABLE READ isolation level. This choice ensures that if a query performs multiple table scans on the remote server, it will get snapshot-consistent results for all the scans. A consequence is that successive queries within a single transaction will see the same data from the remote server, even if concurrent updates are occurring on the remote server due to other activities.