PostgreSQL version 10 introduced several exciting features and improvements to the existing landscape of PostgreSQL. One such feature is logical replication using a publish and subscribe framework. AWS provides two managed PostgreSQL options: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. This post discusses how to use the existing framework to create a self-managed read replica from Amazon RDS for PostgreSQL or Aurora. You can apply a similar process to create a read replica on Amazon Elastic Compute Cloud (Amazon EC2)-based PostgreSQL, Amazon RDS for PostgreSQL, and Aurora PostgreSQL.
Amazon RDS for PostgreSQL supports the publication and subscription framework from engine version 10.4 and above. Aurora PostgreSQL supports the publication and subscription framework from engine version 2.2 (compatible with 10.6) and above.
Solution overview
A typical deployment of Amazon RDS for PostgreSQL and Aurora PostgreSQL databases involves application writes being served by a master instance and reads being offloaded to AWS managed replicas. However, certain business use cases also require you to set up an additional self-managed replica instance to serve other independent downstream applications that only need to operate on a subset of data (some databases or some tables only). Such partial replication allows individual replicas to handle independent parts of the workload in parallel, thereby increasing the system’s overall scalability.
With the introduction of logical replication using a publish and subscribe framework, the rds_superuser role can now set up a custom self-managed replication for PostgreSQL versions 10 and above. Because logical replication is a database-level solution and has possibilities of replicating some or all tables of a database, it's still recommended to consider AWS-generated read replicas for Amazon RDS for PostgreSQL and Aurora PostgreSQL to serve the read traffic of the primary production application by elastically scaling out beyond the capacity constraints of a single master DB instance. For more information, see Working with PostgreSQL Read Replicas in Amazon RDS and Replication with Amazon Aurora PostgreSQL. We cover further considerations around the features of logical replication later in this post.
Logical replication uses a publish and subscribe model wherein subscribers pull data from the publications they subscribe to. Logical replication starts by copying a snapshot of initial existing data on the publisher database. When complete, the changes on the publisher (INSERT, UPDATE, and DELETE) are relayed to the subscriber in near-real time. The subscriber applies data in the same order in which commits were made on the publisher to ensure transactional consistency.
This is in contrast with physical replication, where exact block addresses are used for a byte-by-byte replication.
The following diagram represents the data flow for initial data copy and synchronization.
The process includes the following steps:
The publisher instance uses the CREATE PUBLICATION command to specify a group of tables whose data changes are intended to be replicated.
The subscriber instance uses the CREATE SUBSCRIPTION command to specify the name and connection details of the publication.
A successful execution of CREATE SUBSCRIPTION triggers a TCP connection to the publisher instance.
An incoming connection from the subscriber triggers the creation of a temporary logical replication slot at the publisher (under a logical decoding plugin called pgoutput).
For context, a PostgreSQL instance keeps track of its transactions in a series of ordered 16 MB binary files called write-ahead logs (WAL). Starting with PostgreSQL v11, you can modify the WAL size during instance initialization. Amazon RDS for PostgreSQL v11 instances and Aurora engine version 3.0 (compatible with PostgreSQL 11.4) instances have a WAL size of 64 MB.
A replication slot allows a master instance to keep track of how far behind the standby is and prevent deletion of WAL files that the standby may still need. Such slots, when used in the context of streaming replication, are called physical replication slots. However, in the case of logical replication, these slots are used under a decoding plugin (pgoutput, in this use case) that transforms the changes read from WAL to the logical replication protocol and filters the data as specified (according to the publication specification, in this use case). This decoding allows extraction of all persistent changes into a coherent, easy-to-understand format that can be interpreted without detailed knowledge of the database's internal state. Such slots, when used in the context of logical replication, are called logical replication slots.
By default, the temporary logical slot is created with the nomenclature {sub name}_{sub oid}_sync_{reloid}, where {sub name} is the subscription name specified with CREATE SUBSCRIPTION. This behavior can be modified using the slot_name option of the command.
The initial data in existing subscribed tables is used to create a snapshot and is transferred to the subscriber via the COPY command.
The initial sync worker at the subscriber receives the snapshot, maps the payload within snapshot to local tables, and applies the required operations.
The following diagram illustrates the flow for transactional data (after the initial snapshot is copied).
The process includes the following steps:
After the initial synchronization is complete, a permanent slot is created (by default, with the same name as the subscription) via a logical decoding plugin called pgoutput. This slot persists for the lifespan of the related subscription.
The walsender process starts extraction of all persistent changes from the received WALs (called logical decoding).
The plugin transforms the changes retrieved from the WAL into the logical replication protocol and filters the data according to the publication specification.
The data is then transferred to the apply worker, which further maps the payload to local tables and applies the individual changes.
Some common considerations when using logical replication include the following:
Each publication exists in only one database.
As of this writing, publications contain only tables. You can’t replicate the following:
Views, materialized views, partition root tables, or foreign tables.
Large objects. The bytea data type is supported and can be used as a workaround.
Sequences. Serial or identity columns backed by sequences are replicated as part of the table.
You can add tables to multiple publications, if needed.
Publications can have multiple subscribers.
Each subscription receives changes via one replication slot and additional temporary replication slots created for the initial data synchronization of pre-existing tables.
The schema definitions at the publisher aren’t replicated to the subscriber. You have to manually create the object schema at the subscriber to initiate the replication for that object. For example, a table created at the publisher (after replication has started) requires the following actions at the subscriber:
Create a similar schema
Modify the subscription (using ALTER SUBSCRIPTION). We illustrate this process later in the post.
Tables are matched between the publisher and subscriber via their designated names.
The order of columns in the subscriber table doesn’t need to match that of publisher.
The data types of the columns between the publisher's and subscriber's objects don't need to match as long as you can convert the text representation of the data at the publisher to the target data type at the subscriber. For example, publishing an int column works with a subscriber using a bigint column, but an error occurs if an int is attempted to be published into a timestamp column. We illustrate this behavior later in the post.
The subscriber table can also have additional columns not provided by the published table. Such columns are populated with default values as specified in the DDL of subscriber table.
Setting up
For this post, we use Amazon RDS for PostgreSQL (version 10.4) as the publisher instance and a self-managed PostgreSQL instance (version 10.6) as the subscriber. The publisher is intended to replicate all tables of the database postgres to the self-managed subscriber.
You can use the subscriber database as a publisher for other downstream databases by defining its own publications (cascading replication configuration). However, for the sake of simplicity, this post assumes that the application treats the subscriber (self-managed PostgreSQL database) as read-only.
For visual clarity, SQL statements involve the following psql prompts (customized via the psqlrc file) for the publisher and subscriber:
rds-master=> – Points to the publisher instance
self-managed-replica=# – Points to the subscriber instance
The following code lists the connection and permission metadata:
rds-master=> \conninfo
You are connected to database "postgres" as user "nikhil" on host "pg-10-ident.cxxxxxxbz.us-east-1.rds.amazonaws.com" at port "5432".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
rds-master=> \du nikhil
List of roles
Role name | Attributes | Member of
-----------+-------------------------------+------------------
nikhil | Create role, Create DB +| {rds_superuser}
| Password valid until infinity |
self-managed-replica=# \conninfo
You are connected to database "postgres" as user "khokhar" via socket in "/tmp" at port "5432".
self-managed-replica=# \du khokhar
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+------------
khokhar | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Creating objects at the publisher before setting up replication
To create your objects at the publisher before setting up replication, complete the following steps:
Create a table named table_before_publication in the publisher instance and populate some values. See the following code:
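The original listing isn't reproduced here; a minimal sketch of the publisher-side setup, with the initial values inferred from the query output later in this post, might look like the following:

```sql
-- At the publisher (rds-master):
-- create and populate the table, then publish all tables of the database
CREATE TABLE table_before_publication (id1 int, id2 int);
INSERT INTO table_before_publication VALUES (1, 2), (2, 3);
CREATE PUBLICATION alltables FOR ALL TABLES;
```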
Successfully creating the subscription at the self-managed subscriber instance triggers a slot creation at the publisher. If the rds.force_ssl parameter is set to 1 (on) at the publisher instance, you need to update the CONNECTION string of the CREATE SUBSCRIPTION command to include SSL parameters such as sslmode and sslrootcert. For information about downloading certificates for RDS PostgreSQL, see Using SSL/TLS to Encrypt a Connection to a DB Instance.
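A sketch of the subscription command with SSL parameters follows; the password and certificate path are placeholders, while the host, user, and names mysub and alltables are taken from the connection metadata and listings elsewhere in this post:

```sql
-- At the subscriber (self-managed-replica):
-- the password and the sslrootcert path below are placeholders
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=pg-10-ident.cxxxxxxbz.us-east-1.rds.amazonaws.com port=5432
                dbname=postgres user=nikhil password=<password>
                sslmode=verify-full sslrootcert=/path/to/rds-ca-bundle.pem'
    PUBLICATION alltables;
```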
For this use case, because a pre-existing table was in the publisher, an additional temporary slot is also created. See the following code:
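The original output isn't reproduced here; a query along the following lines at the publisher should show the subscription's main slot together with the temporary synchronization slot:

```sql
-- At the publisher: the temporary sync slot follows the naming pattern
-- {sub name}_{sub oid}_sync_{reloid} described earlier
SELECT slot_name, plugin, slot_type, temporary, active
FROM pg_replication_slots;
```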
In the preceding code, placeholder_col1 and placeholder_col2 are populated with empty strings, and id2 receives an integer. You can then list the replication status at the publisher and at the self-managed subscriber; the subscriber-side output uses the following values:
srsubstate represents the state of replication:
i – Initialize
d – Data is being copied
s – Synchronized
r – Ready (normal replication)
srsublsn represents the end LSN for the s and r states
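The subscriber-side status query used later in this post can be issued as follows to inspect these values:

```sql
-- At the subscriber: per-table synchronization state
SELECT srsubid,
       pg_filenode_relation(0, srrelid) AS relation,
       srsublsn,
       srsubstate
FROM pg_subscription_rel;
```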
Creating objects at the publisher after setting up the replication
To create your objects at the publisher after setting up replication, see the following code:
rds-master=> create table table_after_publication (id1 int, id2 int);
CREATE TABLE
rds-master=> insert into table_after_publication values (1,2);
INSERT 0 1
rds-master=> select * from pg_publication_tables;
pubname | schemaname | tablename
-----------+------------+-----------
alltables | public | table_before_publication
alltables | public | table_after_publication
(2 rows)
self-managed-replica=# select * from table_after_publication;
ERROR: relation "table_after_publication" does not exist
LINE 1: select * from table_after_publication;
^
self-managed-replica=# select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
srsubid | pg_filenode_relation | srsublsn | srsubstate
---------+--------------------------+------------+------------
16391 | table_before_publication | 57/80197D0 | r
self-managed-replica=# alter subscription mysub REFRESH PUBLICATION;
ERROR: relation "public.table_after_publication" does not exist
The schema must exist at the self-managed subscriber for replication to work. Once created, refresh the metadata using ALTER SUBSCRIPTION. See the following code:
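The original listing isn't shown; based on the publisher-side DDL above, the subscriber-side steps would look like the following:

```sql
-- At the subscriber: create a matching schema, then refresh the subscription
CREATE TABLE table_after_publication (id1 int, id2 int);
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```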
Data type conversion issues between object schemas
We successfully achieved replication of the publisher's object (with an int column) to the subscriber's object (with a bigint column). To further illustrate the behavior with incompatible conversions, create a table named schema_conversion_test with an int data type in the publisher instance. See the following code:
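The original listing isn't reproduced here; a sketch consistent with the error log shown later (publisher column col1 of type int, subscriber column of an incompatible type such as path) might be:

```sql
-- At the publisher: an int column
CREATE TABLE schema_conversion_test (col1 int);
INSERT INTO schema_conversion_test VALUES (1);

-- At the subscriber: an incompatible column type (path, per the error log)
CREATE TABLE schema_conversion_test (col1 path);
```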
Refresh the subscription and validate the data. See the following code:
self-managed-replica=# alter subscription mysub REFRESH PUBLICATION;
ALTER SUBSCRIPTION
self-managed-replica=# select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
srsubid | pg_filenode_relation | srsublsn | srsubstate
---------+--------------------------+-------------+------------
16391 | table_after_publication | 57/80197D0 | r
16391 | table_before_publication | 57/8019818 | r
16391 | schema_conversion_test | | d
(3 rows)
self-managed-replica=# select * from schema_conversion_test;
id
----
(0 rows)
In the preceding code, the subscription state (srsubstate) remains at d (data is being copied); however, no new rows were found in the subscriber's object. A quick review of the error logs generated by the subscriber outlines the actual state of replication. The logical replication worker attempts the initial synchronization via COPY but fails on the data type conversion. As the worker continues to retry, the following excerpt is continuously reproduced in the error logs:
2020-xx-xx 08:xx:06 UTC::@:[2067]:LOG: logical replication table synchronization worker for subscription "mysub", table "schema_conversion_test" has started
2020-xx-xx 08:xx:06 UTC::@:[2067]:ERROR: invalid input syntax for type path: "1"
2020-xx-xx 08:xx:06 UTC::@:[2067]:CONTEXT: COPY schema_conversion_test, line 1, column col1: "1"
2020-xx-xx 08:xx:06 UTC::@:[5334]:LOG: background worker "logical replication worker" (PID 2067) exited with exit code 1
Updates and deletes on the publisher
The publication metadata indicates that UPDATE and DELETE operations are replicated. See the following code:
rds-master=> select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete
-----------+----------+--------------+-----------+-----------+-----------
alltables | 16389 | t | t | t | t
(1 row)
However, an UPDATE operation fails if the table schema doesn’t involve a replica identity. See the following code:
rds-master=> update table_before_publication set id1=1111 where id1=1;
ERROR: cannot update table "table_before_publication" because it does not have a replica identity and publishes updates
HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
A published table must have a replica identity to replicate UPDATE and DELETE operations. This configuration modifies the information written to the transaction logs (WAL) and helps in identifying the rows involved. Possible options are the following:
DEFAULT– Records the old values of the columns of the primary key (if any)
USING INDEX– Records the old values of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL
FULL– Records the old values of all columns in the row
NOTHING– Records no information about the old row
If any option other than FULL is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side.
For the sake of simplicity, the replica identity is set to FULL in this post. A FULL configuration implies that the entire row of the table becomes the key and the entire old row is stored. This approach can become inefficient from a storage point of view (compared to other options) and should only be used as a last resort for objects that can't have a primary key or index. For example, you can use a replica identity of FULL with audit and history tables, which typically don't have an associated primary key or index. See the following code:
rds-master=> alter table table_before_publication REPLICA IDENTITY FULL;
ALTER TABLE
rds-master=> update table_before_publication set id1=1111 where id1=1;
UPDATE 1
rds-master=> select * from table_before_publication;
id1 | id2
------+-----
2 | 3
1111 | 2
(2 rows)
self-managed-replica=# select * from table_before_publication;
placeholder_col1 | id1 | placeholder_col2 | id2
------------------+------+------------------+-----
| 2 | | 3
| 1111 | | 2
(2 rows)
Cleaning up
After completing this walkthrough, you should clean up your resources. See the following code:
self-managed-replica=# drop subscription mysub;
NOTICE: dropped replication slot "mysub" on publisher
DROP SUBSCRIPTION
rds-master=> select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)
rds-master=> select * from pg_publication_tables;
pubname | schemaname | tablename
-----------+------------+-----------
alltables | public | table_before_publication
alltables | public | table_after_publication
alltables | public | schema_conversion_test
(3 rows)
rds-master=> select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete
-----------+----------+--------------+-----------+-----------+-----------
alltables | 16389 | t | t | t | t
(1 row)
rds-master=> drop publication alltables;
DROP PUBLICATION
rds-master=> select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete
---------+----------+--------------+-----------+-----------+-----------
(0 rows)
rds-master=> select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+-----------
(0 rows)
Summary
This post illustrated the need to parallelize independent parts of a workload and ways to achieve it using partial data replication. You can use this approach to achieve low-downtime upgrades, break databases into shards, make online schema changes, and more. We encourage you to use the logical replication framework of community PostgreSQL in your Amazon RDS for PostgreSQL and Aurora PostgreSQL environments to set up partial replication of data without managing intermediary resources. As always, AWS welcomes feedback, so please leave your thoughts or questions in the comments.