This weekend, I tested PostgresPro’s pg_arman with ptrack. To start with the conclusion: it works well.
Installation
PostgresPro’s PostgreSQL
Download the source code from here, and install it on your PC.
$ cd /usr/local/src
$ unzip postgrespro-PGPRO9_5_ptrack.zip
$ cd postgrespro-PGPRO9_5_ptrack
$ ./configure --prefix=/usr/local/pgsql-pro
$ make && make install
Create a WAL archive directory:
$ cd /usr/local/pgsql-pro
$ mkdir archive_logs
Set the following parameters in postgresql.conf:
wal_level = archive
archive_mode = on
archive_command = 'cp %p /usr/local/pgsql-pro/archive_logs/%f'
wal_log_hints = on
max_wal_senders = 1
ptrack_enable = on
PostgresPro’s documentation does not mention archive_mode and max_wal_senders, but they are necessary: archive_mode activates the archive_command, and max_wal_senders lets the --stream backup mode open a replication connection.
Add the two lines below to pg_hba.conf so that replication connections are allowed:
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
PostgresPro’s pg_arman
Download the source code from here, and install it.
$ cd /usr/local/src/postgrespro-PGPRO9_5_ptrack/contrib
$ unzip pg_arman-ptrack.zip
$ cd pg_arman-ptrack
Edit the Makefile to remove the docs target:
$ diff pg_arman-ptrack.org/Makefile pg_arman-ptrack/Makefile
32c32
< all: checksrcdir docs datapagemap.h receivelog.h streamutil.h pg_arman
---
> all: checksrcdir datapagemap.h receivelog.h streamutil.h pg_arman
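The same edit can be applied with a one-liner (assuming GNU sed):
$ sed -i 's/checksrcdir docs /checksrcdir /' Makefile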
$ make top_srcdir=/usr/local/src/postgrespro-PGPRO9_5_ptrack
$ make install
Create a backup directory and a pg_arman.ini file:
$ cd /usr/local/pgsql-pro
$ ./bin/pg_arman init -B arman_backup
$ echo "ARCLOG_PATH='/usr/local/pgsql-pro/archive_logs'" > arman_backup/pg_arman.ini2
Test
Let’s start the postgres server and run some tests.
$ cd /usr/local/pgsql-pro
$ ./bin/initdb -D data
$ ./bin/pg_ctl -D data start
$ ./bin/createdb testdb
$ ./bin/psql testdb
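Before running the tests, it is worth checking that the ptrack GUC is active; SHOW works for any configuration parameter, including the one added by this patch:
testdb=# SHOW ptrack_enable;
 ptrack_enable
---------------
 on
(1 row)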
Checking the influence of basic operations using small tables
First, I tested the influence of basic operations: modification of both normal and TOAST tables, VACUUM and VACUUM FULL, and TRUNCATE.
Created tables and inserted tuples into them:
psql (9.5.3)
Type "help" for help.
testdb=# -- simple table
testdb=# CREATE TABLE simple_tbl (id int);
CREATE TABLE
testdb=# INSERT INTO simple_tbl (id) SELECT generate_series(1,1000);
INSERT 0 1000
testdb=# -- TOAST table
testdb=# CREATE TABLE toast_tbl (id int, data text);
CREATE TABLE
testdb=# INSERT INTO toast_tbl (id, data)
testdb-# SELECT i, string_agg(str,'') FROM
testdb-# (SELECT chr(65) as str, i
testdb(# FROM generate_series(1,3000) length, generate_series(1,100) num(i) )t
testdb-# GROUP BY i ORDER BY i;
INSERT 0 100
testdb=# -- VACUUM and VACUUM FULL test tables
testdb=# CREATE TABLE vacuum_tbl (id int);
CREATE TABLE
testdb=# CREATE TABLE vacuum_full_tbl (id int);
CREATE TABLE
testdb=# INSERT INTO vacuum_tbl (id) SELECT generate_series(1,100000);
INSERT 0 100000
testdb=# INSERT INTO vacuum_full_tbl (id) SELECT generate_series(1,100000);
INSERT 0 100000
testdb=# -- TRUNCATE test table
testdb=# CREATE TABLE truncate_tbl (id int);
CREATE TABLE
testdb=# INSERT INTO truncate_tbl (id) SELECT generate_series(1,100);
INSERT 0 100
Took a full backup and validated it:
$ ./bin/pg_arman backup -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -b full -v -j 2 -d postgres --stream
$ ./bin/pg_arman validate -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -d postgres
Ran basic operations on the respective tables:
testdb=# -- simple_tbl
testdb=# SELECT count(*), sum(id) FROM simple_tbl;
count | sum
-------+--------
1000 | 500500
(1 row)
testdb=# DELETE FROM simple_tbl WHERE id = 1;
DELETE 1
testdb=# SELECT count(*), sum(id) FROM simple_tbl;
count | sum
-------+--------
999 | 500499
(1 row)
testdb=# -- toast_tbl
testdb=# SELECT count(*), sum(id) FROM toast_tbl;
count | sum
-------+------
100 | 5050
(1 row)
testdb=# DELETE FROM toast_tbl WHERE id = 1;
DELETE 1
testdb=# SELECT count(*), sum(id) FROM toast_tbl;
count | sum
-------+------
99 | 5049
(1 row)
testdb=# -- vacuum_tbl
testdb=# SELECT count(*), sum(id) FROM vacuum_tbl;
count | sum
--------+------------
100000 | 5000050000
(1 row)
testdb=# DELETE FROM vacuum_tbl WHERE id % 50 != 1;
DELETE 98000
testdb=# VACUUM vacuum_tbl;
VACUUM
testdb=# SELECT count(*), sum(id) FROM vacuum_tbl;
count | sum
-------+----------
2000 | 99952000
(1 row)
testdb=# -- vacuum_full_tbl
testdb=# SELECT count(*), sum(id) FROM vacuum_full_tbl;
count | sum
--------+------------
100000 | 5000050000
(1 row)
testdb=# DELETE FROM vacuum_full_tbl WHERE id % 50 != 1;
DELETE 98000
testdb=# VACUUM FULL vacuum_full_tbl;
VACUUM
testdb=# SELECT count(*), sum(id) FROM vacuum_full_tbl;
count | sum
-------+----------
2000 | 99952000
(1 row)
testdb=# -- truncate_tbl
testdb=# TRUNCATE truncate_tbl;
TRUNCATE TABLE
testdb=# INSERT INTO truncate_tbl VALUES(1);
INSERT 0 1
testdb=# SELECT * FROM truncate_tbl;
id
----
1
(1 row)
Took an incremental backup with the ptrack option and validated it:
$ ./bin/pg_arman backup -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -b ptrack -v -j 2 -d postgres --stream
$ ./bin/pg_arman validate -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -d postgres
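pg_arman also provides a show command for listing the backups in the catalog; at this point it should list both the full backup and the ptrack backup (I omit the output, whose format varies by version):
$ ./bin/pg_arman show -B /usr/local/pgsql-pro/arman_backup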
Stopped the postgres server, deleted the database cluster, restored the database using pg_arman, and restarted the server:
$ ./bin/pg_ctl -D data stop
$ rm -rf data
$ ./bin/pg_arman restore -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -j 4 -d postgres --verbose --stream
$ ./bin/pg_ctl -D data start
Checked the tables:
testdb=# SELECT count(*), sum(id) FROM simple_tbl;
count | sum
-------+--------
999 | 500499
(1 row)
testdb=# SELECT count(*), sum(id) FROM toast_tbl;
count | sum
-------+------
99 | 5049
(1 row)
testdb=# SELECT count(*), sum(id) FROM vacuum_tbl;
count | sum
-------+----------
2000 | 99952000
(1 row)
testdb=# SELECT count(*), sum(id) FROM vacuum_full_tbl;
count | sum
-------+----------
2000 | 99952000
(1 row)
testdb=# SELECT count(*), sum(id) FROM truncate_tbl;
count | sum
-------+-----
1 | 1
(1 row)
OK! All tables have been restored perfectly!
Big table
Created a table and inserted a huge number of tuples, so that it grew over 1 GB:
testdb=# CREATE TABLE big_tbl (id int, data text);
CREATE TABLE
testdb=# SELECT pg_relation_filepath('big_tbl');
pg_relation_filepath
----------------------
base/16384/16427
(1 row)
... insert a huge number of tuples ...
testdb=# SELECT pg_size_pretty(pg_table_size('big_tbl'));
pg_size_pretty
----------------
1072 MB
(1 row)
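The actual INSERT statement is elided above; as a sketch, one statement that would generate over 1 GB of data (not necessarily what I ran) is:
testdb=# INSERT INTO big_tbl (id, data)
testdb-# SELECT i, repeat('A', 1000) FROM generate_series(1,1200000) num(i);
INSERT 0 1200000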
The files related to big_tbl are shown below. PostgreSQL splits a relation into 1 GB segment files, which is why both 16427 and 16427.1 exist; the _fsm file is the free space map fork, and _ptrack is the fork added by the patch:
$ ls -la -h data/base/16384/16427*
-rw------- 1 vagrant vagrant 1.0G Jun 3 10:16 data/base/16384/16427
-rw------- 1 vagrant vagrant 45M Jun 3 10:20 data/base/16384/16427.1
-rw------- 1 vagrant vagrant 288K Jun 3 10:17 data/base/16384/16427_fsm
-rw------- 1 vagrant vagrant 24K Jun 3 10:18 data/base/16384/16427_ptrack
Took a full backup, modified the table, took an incremental backup, and then restored the database from the backups.
This confirmed that PostgresPro’s pg_arman is also applicable to a big table.
Fault tolerance
Halted the postgres server while a transaction was running, in order to check fault tolerance.
After taking an incremental backup, I started a transaction and inserted a tuple:
testdb=# BEGIN;
testdb=# INSERT INTO test VALUES(0);
At that moment, I killed the postgres processes:
postgres$ pkill -9 postgres
Then I restarted postgres, modified some tables, and took another incremental backup; the sequence is sketched below.
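A rough sketch of those steps (the exact modifications are not shown; pg_ctl start performs crash recovery automatically after the kill):
$ ./bin/pg_ctl -D data start
$ ./bin/psql testdb -c "INSERT INTO simple_tbl VALUES (0);"
$ ./bin/pg_arman backup -B /usr/local/pgsql-pro/arman_backup -D /usr/local/pgsql-pro/data \
> -b ptrack -v -j 2 -d postgres --stream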
After restoring the database in the same way as before, I confirmed that it was perfectly restored.
Conclusion
According to my simple study, PostgresPro’s ptrack patch and pg_arman work well. Of course, this is only a first impression. (I could not run corner-case tests, since I haven’t read pg_arman’s source code.)
Before this test, I had suspected that pg_arman had some obvious bugs, but that was my misunderstanding. If I have time, I’ll read pg_arman’s source and test it further.
Now, I’m interested in the ptrack patch. It defines a new relation fork, referred to as _ptrack, and records in that fork whether each block of every table has been modified.
2016.10.23: Fixed a misunderstanding.
In the current version of this patch (version 2), whenever a WAL record is written to the WAL segment file, the associated fork file block is updated in shared buffers.
More precisely, whenever Postgres writes a WAL record, one function of this patch extracts the relfilenode and the block number from the record, and another function updates the corresponding block in that relfilenode’s ptrack fork.
Clearly, this can be improved. For example, here is one idea: each ptrack fork block could store the LSN of the latest WAL record that touched it, and all ptrack fork files could be written to storage only at checkpoint time. This method would avoid the performance decrease, and if a server crash occurs, the ptrack fork files could be recovered by updating them during server recovery.