PostgreSQL Edge Cache

4 minute read

pgec is a real-time in memory database replication cache, with a memcached and REST API. It supports column lists and row filters with the latest features of replication in PostgreSQL 15.

High Level Architecture

Using a local postgres via docker to demonstrate:

docker run \
    --rm \
    --pull always \
    --name postgres \
    --detach \
    --publish 5432:5432 \
    --env POSTGRES_PASSWORD=postgres \
    postgres:15 \
    -c wal_level=logical

It is important that PostgreSQL is running using logical replication enabled (the wal_level=logical) which is used by pgec to remain synchronised with your data as it changes.

An interactive SQL shell so that we can create some data:

docker exec \
    --interactive \
    --tty \
    postgres \
    psql \
    postgres \
    postgres

PostgreSQL 15 New Logical Replication Features

The new logical replication features in the recent PostgreSQL 15 release, are: row filters and column lists.

Column Lists

From PostgreSQL 15, each publication can specify which columns of each table are replicated to subscribers.

Create a table t1 to be used in the following example:

create table t1 (id int,
                 a text,
                 b text,
                 c text,
                 d text,
                 e text,
                 primary key(id));

Create a publication p1, with a column list to reduce the number of columns that will be replicated:

create publication p1
  for table t1 (id, a, b, d);

Insert some test data into the newly created t1 table:

insert into t1
   values
     (1, 'a-1', 'b-1', 'c-1', 'd-1', 'e-1'),
     (2, 'a-2', 'b-2', 'c-2', 'd-2', 'e-2'),
     (3, 'a-3', 'b-3', 'c-3', 'd-3', 'e-3');

Row Filters

By default, all data from all published tables will be replicated to the appropriate subscribers. The replicated data can be reduced by using a row filter. A user might choose to use row filters for behavioural, security or performance reasons. If a published table sets a row filter, a row is replicated only if its data satisfies the row filter expression. This allows a set of tables to be partially replicated. The row filter is defined per table.

Create a table t2 to be used in the following example:

create table t2 (a int,
                 b int,
                 c text,
                 primary key(a));

Creating a publication p2 with a row filter:

create publication p2
  for table t2
  where (a > 5);

With some initial data:

insert into t2
  values
    (2, 102, 'NSW'),
    (3, 103, 'QLD'),
    (4, 104, 'VIC'),
    (5, 105, 'ACT'),
    (6, 106, 'NSW'),
    (7, 107, 'NT'),
    (8, 108, 'QLD'),
    (9, 109, 'NSW');

In Memory Database Replication Cache

pgec is a real-time in memory database replication cache, with a memcached and REST API.

Start pgec connecting to our database, replicating the two publications that we have just created:

docker run \
    --detach \
    --name pgec \
    --pull always \
    --publish 8080:80 \
    --publish 11211:11211 \
    -e PGMP_REPLICATION_LOGICAL_PUBLICATION_NAMES=p1,p2 \
    -e PGMP_DATABASE_USER=postgres \
    -e PGMP_DATABASE_PASSWORD=postgres \
    -e PGMP_DATABASE_HOSTNAME=host.docker.internal \
    ghcr.io/shortishly/pgec:develop

pgec will act as a real-time in memory database replication cache, with a memcached and REST API, for the publications we have just created.

The replication process creates a transaction checkpoint ensuring data integrity. Once the initial data has been collected, streaming replication starts, receiving changes that have been applied subsequent to the checkpoint, ensuring no loss of data. Real-time streaming replication continues keeping pgec up to date.

memcached

We can make memcached requests to get data from pgec on port 11211.

The keys used have the format: publication.table.key. To get the key “1” from table “t1” and publication: “p1”. We would use get p1.t1.1 as follows:

telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get p1.t1.1
VALUE p1.t1.1 0 38
{"a": "a-1", "b": "b-1", "d": "d-1", "id": 1}
END
get p2.t2.6
VALUE p2.t2.6 0 25
{"a": 6, "b": 106, "c": "NSW"}
END

REST API

Taking a look at the t1 table via the JSON API:

curl http://localhost:8080/p1/t1
{"rows": [{"a": "a-1", "b": "b-1", "d": "d-1", "id": 1},
          {"a": "a-2", "b": "b-2", "d": "d-2", "id": 2},
          {"a": "a-3", "b": "b-3", "d": "d-3", "id": 3}]}

Note that columns “c” and “e” are not included in either the memcached or REST response because they are not part of publication p1.

Retrieve an individual row by simply supplying the key:

curl http://localhost:8080/p1/t1/3
{"a": "a-3", "b": "b-3", "d": "d-3", "id": 3}

Real-time Streaming Replication

Data changes are reflected in pgec through real-time streaming replication.

Updated data is streamed into pgec in real-time. Row filters are applied, adding or removing rows where necessary.

The existing row (3, 103, 'QLD') in t2 does not match the row filter, because the primary key is not greater than 5. Lets update it so that it is included in replication.

Before the change:

telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get p2.t2.11
END

Apply the change:

update t2 set a = 11 where a = 3;

Checking the result:

telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get p2.t2.11
VALUE p2.t2.11 0 26
{"a":11,"b":103,"c":"QLD"}
END