PostgreSQL version 10 introduced several exciting features and improvements to the existing landscape of PostgreSQL. One such feature is logical replication using a publish and subscribe framework. AWS provides two managed PostgreSQL options: Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL. This post discusses how to use the existing framework to create a self-managed read replica from Amazon RDS for PostgreSQL or Aurora. You can apply a similar process to create a read replica on Amazon Elastic Compute Cloud (Amazon EC2)-based PostgreSQL, Amazon RDS for PostgreSQL, and Aurora PostgreSQL.

Amazon RDS for PostgreSQL supports the publication and subscription framework from engine version 10.4 and above. Aurora PostgreSQL supports the publication and subscription framework from engine version 2.2 (compatible with 10.6) and above.

Solution overview

A typical deployment of Amazon RDS for PostgreSQL and Aurora PostgreSQL databases involves application writes being served by a master instance and reads being offloaded to AWS managed replicas. However, certain business use cases also require you to set up an additional self-managed replica instance to serve other independent downstream applications that only need to operate on a subset of data (some databases or some tables only). Such partial replication allows individual replicas to handle independent parts of the workload in parallel, thereby increasing the system’s overall scalability.

With the introduction of logical replication using a publish and subscribe framework, the rds_superuser role can now set up custom self-managed replication for PostgreSQL versions 10 and above. Because logical replication is a database-level solution that can replicate some or all tables of a database, we still recommend AWS-managed read replicas for Amazon RDS for PostgreSQL and Aurora PostgreSQL to serve the read traffic of the primary production application by elastically scaling out beyond the capacity constraints of a single master DB instance. For more information, see Working with PostgreSQL Read Replicas in Amazon RDS and Replication with Amazon Aurora PostgreSQL. We cover further considerations around the features of logical replication later in this post.

Logical replication uses a publish and subscribe model wherein subscribers pull data from the publications they subscribe to. Logical replication starts by copying a snapshot of initial existing data on the publisher database. When complete, the changes on the publisher (INSERT, UPDATE, and DELETE) are relayed to the subscriber in near-real time. The subscriber applies data in the same order in which commits were made on the publisher to ensure transactional consistency.

This is in contrast with physical replication, where exact block addresses are used for a byte-by-byte replication.

The following diagram represents the data flow for initial data copy and synchronization.

The process includes the following steps:

  1. The publisher instance uses the CREATE PUBLICATION command to specify a group of tables whose data changes are intended to be replicated.
  2. The subscriber instance uses the CREATE SUBSCRIPTION command to specify the name and connection details of the publication.
  3. A successful execution of CREATE SUBSCRIPTION triggers a TCP connection to the publisher instance.
  4. An incoming connection from the subscriber triggers the creation of a temporary logical replication slot at the publisher (under a logical decoding plugin called pgoutput).
    • For context, a PostgreSQL instance keeps track of its transactions in a series of ordered 16 MB binary files called write-ahead logs (WAL). Starting with PostgreSQL v11, you can modify the WAL size during instance initialization. Amazon RDS for PostgreSQL v11 instances and Aurora engine version 3.0 (compatible with PostgreSQL 11.4) instances have a WAL size of 64 MB.
    • A replication slot allows a master instance to keep track of how far behind the standby is and prevent deletion of WAL files that the standby may still need. Such slots when used under context of streaming replication are called physical replication slots. However, in case of logical replication, these slots are used under a decoding plugin (pgoutput, in this use case) that transforms the changes read from WAL to logical replication protocol and filters the data as specified (according to the publication specification, in this use case). This decoding allows extraction of all persistent changes into a coherent, easy-to-understand format that can be interpreted without detailed knowledge of the database’s internal state. Such slots, when used in context of a logical replication, are called logical replication slots.
    • By default, the temporary logical slot is created with nomenclature {sub name}_{sub oid}_sync_{reloid}, where {sub name} is the subscription name specified with CREATE SUBSCRIPTION. This behavior can be modified using the slot_name option of the command.
  5. The initial data in existing subscribed tables is used to create a snapshot and is transferred to the subscriber via the COPY command.
  6. The initial sync worker at the subscriber receives the snapshot, maps the payload within snapshot to local tables, and applies the required operations.
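As noted in the slot-naming discussion above, the default slot name can be overridden when creating the subscription. The following is a sketch with hypothetical connection details (host, user, and password are placeholders):

```sql
self-managed-replica=# CREATE SUBSCRIPTION mysub
    CONNECTION 'dbname=postgres host=publisher.example.com user=repluser password=...'
    PUBLICATION alltables
    WITH (slot_name = 'my_custom_slot');
```

The `WITH (slot_name = ...)` option names the permanent slot on the publisher; the temporary synchronization slots derive their names from it.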

The following diagram illustrates the flow for transactional data (after the initial snapshot is copied).

The process includes the following steps:

  1. After the initial synchronization is complete, a permanent slot is created (by default, with the same name as the subscription) via the pgoutput logical decoding plugin. This slot persists for the lifespan of the related subscription.
  2. The walsender process starts extraction of all persistent changes from the received WALs (called logical decoding).
  3. The plugin transforms the changes retrieved from the WAL into the logical replication protocol and filters the data according to the publication specification.
  4. The data is then transferred to the apply worker, which further maps the payload to local tables and applies the individual changes.

Prerequisites

Before implementing this solution, you must set up logical replication. For instructions for Amazon RDS for PostgreSQL, see Logical Replication for PostgreSQL on Amazon RDS. For instructions for Aurora PostgreSQL, see Configuring Logical Replication.
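After following those instructions (on Amazon RDS and Aurora, this means setting the rds.logical_replication parameter to 1 in the instance's parameter group), you can verify the configuration on the publisher with a quick sanity check:

```sql
SHOW wal_level;  -- should return "logical" once logical replication is enabled

SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
```

Each subscription consumes a replication slot and a WAL sender process, so make sure max_replication_slots and max_wal_senders are sized for the number of subscribers you plan to add.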

Considerations with logical replication

Some common considerations when using logical replication include the following:

  • Each publication exists in only one database.
  • As of this writing, publications contain only tables. You can’t replicate the following:
    • Views, materialized views, partition root tables, or foreign tables.
    • Large objects. The bytea data type is supported and can be used as a workaround.
    • Sequences. Serial or identity columns backed by sequences are replicated as part of the table.
  • You can add tables to multiple publications, if needed.
  • Publications can have multiple subscribers.
  • Each subscription receives changes via one replication slot and additional temporary replication slots created for the initial data synchronization of pre-existing tables.
  • The schema definitions at the publisher aren’t replicated to the subscriber. You have to manually create the object schema at the subscriber to initiate the replication for that object. For example, a table created at the publisher (after replication has started) requires the following actions at the subscriber:
    • Create a similar schema
    • Modify the subscription (using ALTER SUBSCRIPTION). We illustrate this process later in the post.
  • Tables are matched between the publisher and subscriber via their designated names.
    • The order of columns in the subscriber table doesn’t need to match that of publisher.
    • The data types of the columns between the publisher’s and subscriber’s objects don’t need to match as long as you can convert the text representation of the data at the publisher to the target data type at the subscriber. For example, publishing an int column works with a subscriber using a bigint column, but an error occurs if an int is attempted to be published into a timestamp column. We illustrate this behavior later in the post.
    • The subscriber table can also have additional columns not provided by the published table. Such columns are populated with default values as specified in the DDL of subscriber table.
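As a sketch of the last point (table and column names here are hypothetical), an extra subscriber-side column is filled from its DEFAULT each time a change is applied:

```sql
-- publisher:
CREATE TABLE t1 (id int, val text);

-- subscriber: received_at isn't published; it takes its DEFAULT on apply
CREATE TABLE t1 (id int, val text, received_at timestamptz DEFAULT now());
```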

Setting up

For this post, we use Amazon RDS for PostgreSQL (version 10.4) as the publisher instance and a self-managed PostgreSQL instance (version 10.6) as the subscriber. The publisher is intended to replicate all tables of the database postgres to the self-managed subscriber.

You can use the subscriber database as a publisher for other downstream databases by defining its own publications (cascading replication configuration). However, for the sake of simplicity, this post assumes that the application treats the subscriber (self-managed PostgreSQL database) as read-only.

For visual clarity, SQL statements involve the following psql prompts (customized via psqlrc file) for the publisher and subscriber:

  • rds-master=> – Points to publisher instance
  • self-managed-replica=# – Points to subscriber instance

The following code lists the connection and permission metadata:

rds-master=> \conninfo
You are connected to database "postgres" as user "nikhil" on host "pg-10-ident.cxxxxxxbz.us-east-1.rds.amazonaws.com" at port "5432".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
 
rds-master=> \du nikhil
                             List of roles
 Role name |          Attributes           |    Member of    
-----------+-------------------------------+------------------
 nikhil    | Create role, Create DB       +| {rds_superuser}  
           | Password valid until infinity |                  
  
self-managed-replica=# \conninfo
You are connected to database "postgres" as user "khokhar" via socket in "/tmp" at port "5432".
  
self-managed-replica=# \du khokhar

                              List of roles
 Role name |                         Attributes                         | Member of  
-----------+------------------------------------------------------------+------------
 khokhar   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}         

Creating objects at the publisher before setting up replication

To create your objects at the publisher before setting up replication, complete the following steps:

  1. Create a table named table_before_publication in the publisher instance and populate some values. See the following code:
    rds-master=> create table table_before_publication (id1 int, id2 int);
    CREATE TABLE
     
    rds-master=> insert into table_before_publication values (1,2);
    INSERT 0 1
     
    rds-master=> select * from pg_replication_slots;
    slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
    -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
    (0 rows)
    There are no existing replication slots in the publisher instance at this point.
  2. Create a publication for all tables in the database named postgres. See the following code:
    rds-master=> CREATE PUBLICATION alltables FOR ALL TABLES;
    CREATE PUBLICATION
     
    rds-master=> select * from pg_replication_slots;
     slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
    -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
    (0 rows)
    As evident from the preceding code, creating a publication instance doesn’t necessarily imply creation of replication slots.
  3. Create a subscription from the self-managed subscriber instance. See the following code:
    self-managed-replica=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres host=pg-10-ident.cxxxxxbz.us-east-1.rds.amazonaws.com user=nikhil password=pxxxxxd' PUBLICATION alltables;
    ERROR:  relation "public.table_before_publication" does not exist
    You need a matching schema at the subscriber for the initial snapshot process to complete.
  4. Create a table with a matching name; the column definitions don't need to match completely. See the following code:
    self-managed-replica=# create table table_before_publication(placeholder_col1 char(2), id1 int, placeholder_col2 char(2), id2 bigint);
    CREATE TABLE
     
    self-managed-replica=# CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres host=pg-10-ident.cxxxxxz.us-east-1.rds.amazonaws.com user=nikhil password=pxxxxxd' PUBLICATION alltables;
    NOTICE:  created replication slot "mysub" on publisher
    CREATE SUBSCRIPTION
    Successfully creating the subscription at the self-managed subscriber instance triggers a slot creation at the publisher. If the rds.force_ssl parameter is set to 1 (on) at the publisher instance, you need to update the CONNECTION string of the CREATE SUBSCRIPTION command to include SSL parameters such as sslmode and sslrootcert. For information about downloading certificates for Amazon RDS for PostgreSQL, see Using SSL/TLS to Encrypt a Connection to a DB Instance.
  5. For this use case, because a pre-existing table was in the publisher, an additional temporary slot is also created; it is dropped automatically when the initial synchronization completes, as the second query shows. See the following code:
    rds-master=> select * from pg_replication_slots;
           slot_name        |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
    ------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
     mysub_16391_sync_16422 | pgoutput | logical   |  13934 | postgres | t         | t      |      31808 |      |          661 | 0/12016490  | 0/120164C8
     mysub                  | pgoutput | logical   |  13934 | postgres | f         | t      |      31772 |      |          661 | 0/12016490  | 0/120164C8
    (2 rows)
     
    rds-master=> select * from pg_replication_slots;
     slot_name |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
    -----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
     mysub     | pgoutput | logical   |  13934 | postgres | f         | t      |      31772 |      |          661 | 0/12016490  | 0/120164C8
    (1 row)
  6. Verify data consistency between the publisher and the self-managed subscriber. Both the initial snapshot and new changes should be replicated. See the following code:
    rds-master=> select * from table_before_publication;
     id1 | id2 
    -----+-----
       1 |   2
    (1 row)
     
    self-managed-replica=# select * from table_before_publication;
     placeholder_col1 | id1 | placeholder_col2 | id2 
    ------------------+-----+------------------+-----
                      |   1 |                  |   2
    (1 row)
      
    rds-master=> insert into table_before_publication values (2,3);
    INSERT 0 1
     
    self-managed-replica=# select * from table_before_publication;
     placeholder_col1 | id1 | placeholder_col2 | id2 
    ------------------+-----+------------------+-----
                      |   1 |                  |   2
                      |   2 |                  |   3
    (2 rows)
    In the preceding code, placeholder_col1 and placeholder_col2 are populated with NULL (no default was specified for them), and id2 receives the integer values. To list the replication status at the self-managed subscriber, see the following code:
    self-managed-replica=# select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
     srsubid |   pg_filenode_relation   |  srsublsn  | srsubstate 
    ---------+--------------------------+------------+------------
       16391 | table_before_publication | 57/80197D0 | r
    (1 row)
    The preceding code uses the following values:
    • srsubstate represents the state of replication:
      • i – Initializing
      • d – Data is being copied
      • s – Synchronized
      • r – Ready (normal replication)
    • srsublsn represents the end LSN for the s and r states
    To list the status at the publisher, see the following code:
    rds-master=> select * from pg_stat_replication;
    -[ RECORD 1 ]----+-----------------------------
    pid              | 4218
    usesysid         | 16389
    usename          | nikhil
    application_name | mysub
    client_addr      | xxx.31.89.29
    client_hostname  | 
    client_port      | 57078
    backend_start    | 2020-xx-xx 07:xx:49.72425+00
    backend_xmin     | 
    state            | streaming
    sent_lsn         | 57/14000060
    write_lsn        | 57/14000060
    flush_lsn        | 57/14000060
    replay_lsn       | 57/14000060
    write_lag        | 
    flush_lag        | 
    replay_lag       | 
    sync_priority    | 0
    sync_state       | async
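Beyond pg_stat_replication, subscriber lag can also be estimated from the slot itself by comparing its confirmed position with the current WAL position (pg_current_wal_lsn and pg_wal_lsn_diff are available from PostgreSQL 10 onward):

```sql
rds-master=> SELECT slot_name,
                    confirmed_flush_lsn,
                    pg_current_wal_lsn() AS current_lsn,
                    pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
             FROM pg_replication_slots
             WHERE slot_name = 'mysub';
```

A lag_bytes value near zero indicates the subscriber is keeping up with the publisher.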

Creating objects at the publisher after setting up the replication

To create your objects at the publisher after setting up replication, see the following code:

rds-master=> create table table_after_publication (id1 int, id2 int);
CREATE TABLE
 
rds-master=> insert into table_after_publication values (1,2);
INSERT 0 1
 
rds-master=> select * from pg_publication_tables;
  pubname  | schemaname | tablename 
-----------+------------+-----------
 alltables | public     | table_before_publication
 alltables | public     | table_after_publication
(2 rows)
  
self-managed-replica=# select * from table_after_publication;
ERROR:  relation "table_after_publication" does not exist
LINE 1: select * from table_after_publication;
                                                      ^
self-managed-replica=# select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
 srsubid |   pg_filenode_relation   |  srsublsn  | srsubstate 
---------+--------------------------+------------+------------
   16391 | table_before_publication | 57/80197D0 | r
 
self-managed-replica=# alter subscription mysub REFRESH PUBLICATION;
ERROR:  relation "public.table_after_publication" does not exist

Schema must exist at the self-managed subscriber for the replication to work. Once created, refresh the metadata using alter subscription. See the following code:

self-managed-replica=# create table table_after_publication (placeholder_col1 char(2), id1 int, placeholder_col2 char(2), id2 bigint);
CREATE TABLE
 
self-managed-replica=# alter subscription mysub REFRESH PUBLICATION;
ALTER SUBSCRIPTION
 
self-managed-replica=# select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
 srsubid |   pg_filenode_relation   |  srsublsn  | srsubstate 
---------+--------------------------+------------+------------
   16391 | table_before_publication | 57/80197D0 | r
   16391 | table_after_publication  |            | d
(2 rows)
 
self-managed-replica=# select * from table_after_publication;
 placeholder_col1 | id1 | placeholder_col2 | id2 
------------------+-----+------------------+-----
                 |   1 |                   |   2
(1 row)

Data type conversion issues between object schemas

We successfully achieved replication of the publisher’s object (with int column) to the subscriber’s object (with bigint column). To further illustrate the behavior with incompatible conversions, create a table named schema_conversion_test with int data type in the publisher instance. See the following code:

rds-master=> create table schema_conversion_test (col1 int);
CREATE TABLE

rds-master=> insert into schema_conversion_test values (1),(2);
INSERT 0 2

Create a table named schema_conversion_test with timestamp data type in the subscriber instance. See the following code:

self-managed-replica=# create table schema_conversion_test (col1 timestamp);
CREATE TABLE

Refresh the subscription and validate the data. See the following code:

self-managed-replica=# alter subscription mysub REFRESH PUBLICATION;
ALTER SUBSCRIPTION

self-managed-replica=#  select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;

 srsubid |   pg_filenode_relation   |  srsublsn   | srsubstate 
---------+--------------------------+-------------+------------
   16391 | table_after_publication  | 57/80197D0  | r
   16391 | table_before_publication | 57/8019818  | r
   16391 | schema_conversion_test   |             | d
(3 rows)

self-managed-replica=# select * from schema_conversion_test;
 col1 
------
(0 rows)

In the preceding code, the subscription state (srsubstate) remains at d (data is being copied); however, no new rows appear in the subscriber's object. A quick review of the error logs generated by the subscriber reveals the actual state of replication: the logical replication worker attempts the initial synchronization via COPY but fails on the data type conversion. As the worker continues to retry, the following excerpt appears repeatedly in the error logs:

2020-xx-xx 08:xx:06 UTC::@:[2067]:LOG: logical replication table synchronization worker for subscription "mysub", table "schema_conversion_test" has started
2020-xx-xx 08:xx:06 UTC::@:[2067]:ERROR: invalid input syntax for type timestamp: "1"
2020-xx-xx 08:xx:06 UTC::@:[2067]:CONTEXT: COPY schema_conversion_test, line 1, column col1: "1"
2020-xx-xx 08:xx:06 UTC::@:[5334]:LOG: background worker "logical replication worker" (PID 2067) exited with exit code 1
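One way to recover (a sketch, not the only option): because the initial COPY never succeeded, you can drop and recreate the subscriber table with a compatible type and refresh the subscription, so the synchronization worker can complete on its next attempt:

```sql
self-managed-replica=# DROP TABLE schema_conversion_test;
self-managed-replica=# CREATE TABLE schema_conversion_test (col1 int);
self-managed-replica=# ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;
```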

Updates and deletes on the publisher

The publication metadata shows that UPDATE and DELETE operations are replicated. See the following code:

rds-master=> select * from pg_publication;
  pubname  | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
-----------+----------+--------------+-----------+-----------+-----------
 alltables |    16389 | t            | t         | t         | t
(1 row)

However, an UPDATE operation fails if the table schema doesn’t involve a replica identity. See the following code:

rds-master=> update table_before_publication set id1=1111 where id1=1;
ERROR:  cannot update table "table_before_publication" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

A published table must have a replica identity to replicate UPDATE and DELETE operations. This configuration modifies the information written to the transaction logs (WAL) and helps in identifying the rows involved. Possible options are the following:

  • DEFAULT – Records the old values of the columns of the primary key (if any)
  • USING INDEX – Records the old values of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL
  • FULL – Records the old values of all columns in the row
  • NOTHING – Records no information about the old row

If any option other than FULL is set on the publisher side, a replica identity comprising the same or fewer columns must also be set on the subscriber side.
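For instance (hypothetical table and index names), either of the following keeps a storage-efficient identity instead of FULL:

```sql
-- DEFAULT identity: give the table a primary key
ALTER TABLE my_table ADD PRIMARY KEY (id1);

-- or point the identity at a unique, NOT NULL, non-partial index explicitly
ALTER TABLE my_table REPLICA IDENTITY USING INDEX my_table_id1_idx;
```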

For the sake of simplicity, the replica identity is set to FULL in this post. A FULL configuration implies that the entire row becomes the key and the entire old row is logged. This approach can be inefficient from a storage point of view (compared to the other options) and should only be used as a last resort for objects that can't have a primary key or index. For example, you can use a replica identity of FULL with audit and history tables, which typically don't have an associated primary key or index. See the following code:

rds-master=> alter table table_before_publication REPLICA IDENTITY FULL;
ALTER TABLE
 
rds-master=> update table_before_publication set id1=1111 where id1=1;
UPDATE 1
 
rds-master=> select * from table_before_publication;
 id1  | id2 
------+-----
    2 |   3
 1111 |   2
(2 rows)

self-managed-replica=# select * from table_before_publication;
 placeholder_col1 | id1  | placeholder_col2 | id2 
------------------+------+------------------+-----
                  |    2 |                  |   3
                  | 1111 |                  |   2
(2 rows)

Cleaning up

After completing this walkthrough, you should clean up your resources. See the following code:

self-managed-replica=# drop subscription mysub;
NOTICE:  dropped replication slot "mysub" on publisher
DROP SUBSCRIPTION
 
rds-master=> select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)
 
rds-master=> select * from pg_publication_tables;
  pubname  | schemaname | tablename 
-----------+------------+-----------
 alltables | public     | table_before_publication
 alltables | public     | table_after_publication
 alltables | public     | schema_conversion_test   
(3 rows)
  
rds-master=> select * from pg_publication;
  pubname  | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
-----------+----------+--------------+-----------+-----------+-----------
 alltables |    16389 | t            | t         | t         | t
(1 row)
  
rds-master=> drop publication alltables;
DROP PUBLICATION
 
rds-master=> select * from pg_publication;
 pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete 
---------+----------+--------------+-----------+-----------+-----------
(0 rows)
 
rds-master=> select * from pg_publication_tables;
 pubname | schemaname | tablename 
---------+------------+-----------
(0 rows)

Summary

This post illustrated the need of parallelizing independent parts of a workload and the ways to achieve it using partial data replication. You can use this approach to achieve low-downtime upgrades, break databases into shards, make online schema changes, and more. We encourage you to use the logical replication framework of community PostgreSQL in your Amazon RDS for PostgreSQL and Aurora PostgreSQL environments to set up partial replication of data without managing intermediary resources. As always, AWS welcomes feedback, so please leave your thoughts or questions in the comments.


While writing a query to pull some data, I naturally reached for the ROW_NUMBER function, only to be met with a syntax error and a moment of baffled confusion.

 

SELECT t.*
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY week_num, ranking_type ORDER BY score DESC) AS rn
  FROM (
    SELECT *,
           YEAR(created_at) AS year_num,
           WEEK(created_at) AS week_num
    FROM your_table_name
    WHERE YEAR(created_at) = 2022
  ) AS subquery
) AS t
WHERE t.rn <= 3
ORDER BY t.week_num, t.ranking_type, t.score DESC;

 

So the ROW_NUMBER function has to be rewritten without window functions, and in cases like this a JOIN works very well.

I therefore rewrote the query with a JOIN, which solved the problem.
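Without window functions, the usual trick is to count, per group, how many rows outrank each row. The following is a sketch against the same hypothetical table and columns as above, assuming an `id` primary key; note that ties rank equally here (like RANK rather than ROW_NUMBER):

```sql
-- Keep rows with fewer than 3 higher-scoring rows in the same
-- (week_num, ranking_type) group: a top-3-per-group query for MySQL 5.x.
SELECT a.*
FROM (
  SELECT t.*, WEEK(t.created_at) AS week_num
  FROM your_table_name t
  WHERE YEAR(t.created_at) = 2022
) a
LEFT JOIN (
  SELECT t.*, WEEK(t.created_at) AS week_num
  FROM your_table_name t
  WHERE YEAR(t.created_at) = 2022
) b
  ON  a.week_num     = b.week_num
  AND a.ranking_type = b.ranking_type
  AND b.score        > a.score
GROUP BY a.id
HAVING COUNT(b.score) < 3
ORDER BY a.week_num, a.ranking_type, a.score DESC;
```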

 


As a side note, MySQL 8.0 does support window functions, so I'd love to just push through a version upgrade... but that's never quite as simple as it sounds, is it?


How To Optimize WordPress Performance With MySQL Replication On Ubuntu 14.04

Introduction

In this tutorial, we will teach you how to scale up your WordPress MySQL database server setup using master-slave database replication and the HyperDB plugin for WordPress. Adding more database servers to your environment in this manner allows your WordPress application to read from multiple database servers, increasing read performance.

MySQL replication reaps the most performance benefits for a system that processes frequent reads and infrequent writes, like most WordPress installations. By using a single-master with multiple-slave setup, you can add more slaves to scale your system until you run out of network bandwidth or your master cannot handle the update load. If you wish, you can add more than one slave by repeating the "slave" portions of the replication sections of this tutorial.

We are assuming that your setup includes two load balanced WordPress application servers that connect to a separate MySQL database server (see the prerequisites for a tutorial on how to set that up). It is not strictly necessary to have load balanced application servers to follow this tutorial, but your MySQL database server should be separate from your application servers.

Prerequisites

Before continuing with this tutorial, you should have completed two tutorials or have a similar environment:

After following those tutorials, to set up WordPress with two load balanced web application servers and a separate database server, you should have four VPSs. Because we will be dealing with several VPSs, for reference purposes, we will call your four existing VPSs the following:

  • haproxy-www: Your HAProxy server for layer 4 load balancing your WordPress web application servers. This is the entry point into your website
  • wordpress-1: Your first WordPress web application server
  • wordpress-2: Your second WordPress web application server
  • mysql-1: Your MySQL server for WordPress

That is, your environment should look something like this:

 

 

In addition to your current environment, we will require one additional VPS during this tutorial. We will call it:

  • mysql-2: Your slave MySQL database server

Our Goal

When we are finished with this tutorial, you will have two database servers replicating in a master-slave configuration. Your WordPress servers will selectively write to your master and read from both your master and slave databases, by use of the HyperDB WordPress plugin. Your final environment should look something like this:

 

 

Keep in mind that you do not need to have load balanced application servers (wordpress-1/wordpress-2) to follow this tutorial, and that you can add more slave databases if you want.

Set Up MySQL Master-Slave Replication

Before we can configure our WordPress application to read from multiple database servers, we need to set up our MySQL replication.

Create MySQL Slave VPS, mysql-2

You will want to create a new VPS that will act as the MySQL slave server–for reference purposes, we will call this server mysql-2. The slave will be configured to replicate all of the databases of your master MySQL server, including your WordPress database.

On mysql-2, install the MySQL software:

sudo apt-get update
sudo apt-get install mysql-server

Enter a root password for your MySQL installation. Next, we will configure our existing MySQL server as the replication master.

Configure Existing MySQL Server as a Master

The first step is to set up your existing MySQL database server, mysql-1, with a replication master configuration.

On mysql-1, edit the MySQL configuration file:

sudo vi /etc/mysql/my.cnf

Search for the following three lines:

<pre>
bind-address = <span class="highlight">mysql_1_private_IP</span>
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
</pre>

  • bind-address: the IP address that MySQL will listen on. This should already be set to mysql-1’s private IP address from your original setup
  • server-id: the unique server ID. Since this is the master server, we will want to leave the value as “1” and uncomment this line
  • log_bin: the location of the binary log file. The binary log is used to send data changes from the master to its slave for replication. Uncomment this line

The three lines should look like this (be sure to substitute the highlighted value with your database server's private IP address):

<pre>
bind-address = <span class=“highlight”>mysql1privateIP</span>
server-id = 1
logbin = /var/log/mysql/mysql-bin.log
</pre>

Optionally, if you want to restrict replication to the wordpress database specifically, add the following line to your configuration (substituting the highlighted value with your desired database name):

<pre>
binlog_do_db = <span class="highlight">wordpress</span>
</pre>

Save and quit. To put these changes into effect, restart MySQL with the following command:

sudo service mysql restart

Connect to the MySQL console with the following command, then enter the password when prompted:

mysql -u root -p

Create a user that will be used by our slave MySQL servers for replication purposes. We will call this user repl. Be sure to replace the repl_password with your own, strong password. The % specifies that the source IP for this user can be anything, but you may substitute the % with the private IP address of your slave MySQL server, mysql-2, to restrict connections with this user to that particular server:

<pre>
CREATE USER 'repl'@'<span class="highlight">%</span>' IDENTIFIED BY '<span class="highlight">repl_password</span>';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
</pre>

Do not exit the MySQL console yet!

Export a Backup Of MySQL Master

Next, we will want to export a backup of the MySQL master database, to import into our slave database so it will be identical before we start replication. We need to lock the database so we can do a data dump. In your MySQL console on mysql-1, run this:

FLUSH TABLES WITH READ LOCK;
SET GLOBAL read_only = ON;
EXIT;

Now, from your command shell, run the following command to export a backup of the databases on your master MySQL server to a file called masterdump.sql:

mysqldump --lock-all-tables -u root -p --all-databases > masterdump.sql

Copy your masterdump.sql file to your slave server, mysql-2, using scp:

<pre>
scp masterdump.sql <span class="highlight">user</span>@<span class="highlight">mysql_2_private_IP</span>:/tmp
</pre>

Enter mysql-1’s MySQL console once again:

mysql -u root -p

At the MySQL prompt, unlock your database:

SET GLOBAL read_only = OFF;
UNLOCK TABLES;

Now run this statement to print out information that you will need to configure your MySQL slave:

<pre>
SHOW MASTER STATUS;
</pre>

<pre>
Output:
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| <span class="highlight">mysql-bin.000001</span> | <span class="highlight">408</span>      |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
</pre>

Take a note of the values of File and Position, as you will need them when configuring your slave server. Let’s switch over to mysql-2 now.
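If you are scripting this handover, the File and Position values can be extracted automatically. A minimal sketch, parsing a captured copy of the output (with -e, the mysql client prints a tab-separated header row and one data row; the invocation in the comment is illustrative):

```shell
# Sample of what `mysql -u root -p -e 'SHOW MASTER STATUS;'` prints
# (tab-separated); in a real run you would capture the command's output:
status_output="$(printf 'File\tPosition\tBinlog_Do_DB\tBinlog_Ignore_DB\nmysql-bin.000001\t408\t\t\n')"

# The second row holds the data; fields 1 and 2 are File and Position.
master_log_file="$(printf '%s\n' "$status_output" | awk 'NR==2 {print $1}')"
master_log_pos="$(printf '%s\n' "$status_output" | awk 'NR==2 {print $2}')"

echo "MASTER_LOG_FILE=$master_log_file"   # mysql-bin.000001
echo "MASTER_LOG_POS=$master_log_pos"     # 408
```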

Configure MySQL Slave

Now we will want to import the master database into our slave to synchronize them in preparation for replication.

On mysql-2, run this command to import the masterdump.sql file:

mysql -u root -p < /tmp/masterdump.sql

Next, we will set up mysql-2 as a replication slave. On mysql-2, edit the MySQL configuration file:

sudo vi /etc/mysql/my.cnf

Search for the following two lines:

<pre>
bind-address = 127.0.0.1
server-id = 1
</pre>

  • bind-address: the IP address that MySQL will listen on. Set to mysql-2’s private IP address
  • server-id: the unique server ID. Since this is the slave server, change this value to 2 and uncomment this line

The two lines should look like this (be sure to substitute the highlighted value with your database server’s private IP address):

<pre>
bind-address = <span class="highlight">mysql_2_private_IP</span>
server-id = 2
</pre>

Save and quit. Restart MySQL to put the changes into effect:

sudo service mysql restart

Enter the MySQL console:

mysql -u root -p

Next, we will connect the slave to the master. The five following values are required:

  • MASTER_HOST: set to mysql-1’s private IP
  • MASTER_USER: set to the replication user that we created on the master, repl
  • MASTER_PASSWORD: set to repl’s password, which should be substituted with your own password
  • MASTER_LOG_FILE: set to the "File" listed when you ran SHOW MASTER STATUS; on your master MySQL server
  • MASTER_LOG_POS: set to the "Position" listed when you ran SHOW MASTER STATUS; on your master MySQL server

The following statement connects your slave to your master server, and it requires that you substitute all of the highlighted fields with the appropriate values:

<pre>
CHANGE MASTER TO
MASTER_HOST='<span class="highlight">mysql_1_private_IP</span>',
MASTER_USER='repl',
MASTER_PASSWORD='<span class="highlight">repl_password</span>',
MASTER_LOG_FILE='<span class="highlight">mysql-bin.000001</span>',
MASTER_LOG_POS=<span class="highlight">408</span>;
</pre>
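To avoid copy-paste mistakes, the statement can also be assembled from shell variables; a sketch with hypothetical values (File and Position come from SHOW MASTER STATUS on the master):

```shell
# Hypothetical values -- substitute your own.
MASTER_HOST="10.0.0.1"              # mysql-1's private IP
REPL_PASSWORD="repl_password"
MASTER_LOG_FILE="mysql-bin.000001"  # "File" from SHOW MASTER STATUS
MASTER_LOG_POS=408                  # "Position" from SHOW MASTER STATUS

stmt="CHANGE MASTER TO
  MASTER_HOST='$MASTER_HOST',
  MASTER_USER='repl',
  MASTER_PASSWORD='$REPL_PASSWORD',
  MASTER_LOG_FILE='$MASTER_LOG_FILE',
  MASTER_LOG_POS=$MASTER_LOG_POS;"

printf '%s\n' "$stmt"
# On the slave you could then run: printf '%s\n' "$stmt" | mysql -u root -p
```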

If that statement ran properly, run this command to initiate the slave connection:

START SLAVE;

Your mysql-2 server should be connected as a slave now! Run the following command to check that the replication is working:

SHOW SLAVE STATUS\G
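In that output, the two fields that matter most are Slave_IO_Running and Slave_SQL_Running; both should read Yes. A sketch of how a monitoring script might check them, parsing a trimmed sample of the output (in practice you would capture the real output, e.g. from mysql -u root -p -e 'SHOW SLAVE STATUS\G'):

```shell
# Trimmed sample of `SHOW SLAVE STATUS\G` output, for illustration:
status='Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0'

io_running="$(printf '%s\n' "$status" | awk -F': ' '/Slave_IO_Running:/ {print $2}')"
sql_running="$(printf '%s\n' "$status" | awk -F': ' '/Slave_SQL_Running:/ {print $2}')"

if [ "$io_running" = "Yes" ] && [ "$sql_running" = "Yes" ]; then
  echo "replication OK"
else
  echo "replication NOT running" >&2
fi
```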

Revoke Write Privileges From Slave Users

This is optional, because the HyperDB plugin can be configured to only read from your slave database server, but you may want to revoke the write privileges from your wordpressuser database users on your slave database (updates made directly to your slave will not be replicated to your master, so an accidental write to the slave would leave the two servers out of sync).

On mysql-2, from your MySQL console run the following statement to list your database users:

<pre>
SELECT user,host FROM mysql.user;
</pre>

<pre>
Output:
+---------------+----------------+
| user          | host           |
+---------------+----------------+
| repl          | %              |
| wordpressuser | <span class="highlight">wordpress_1_IP</span> |
| wordpressuser | <span class="highlight">wordpress_2_IP</span> |
+---------------+----------------+
</pre>

You should see output similar to the above code block. You may view privileges for each user with the following command:

<pre>
SHOW GRANTS FOR <span class="highlight">wordpressuser</span>@<span class="highlight">wordpress_1_IP</span>;
</pre>

In this example, we have one wordpressuser for each WordPress server, so we will revoke the insert, update, and delete privileges from each of them (“wordpress” is the name of our database in this example):

<pre>
REVOKE INSERT, UPDATE, DELETE ON <span class="highlight">wordpress</span>.* FROM '<span class="highlight">wordpressuser</span>'@'<span class="highlight">wordpress_1_private_IP</span>';
REVOKE INSERT, UPDATE, DELETE ON <span class="highlight">wordpress</span>.* FROM '<span class="highlight">wordpressuser</span>'@'<span class="highlight">wordpress_2_private_IP</span>';
FLUSH PRIVILEGES;
</pre>
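If you have many application hosts, the REVOKE statements can be generated in a loop; a sketch with hypothetical IPs (pipe the result into the slave's mysql console):

```shell
# Hypothetical host list -- use your WordPress servers' private IPs.
DB="wordpress"
DB_USER="wordpressuser"
HOSTS="10.0.0.11 10.0.0.12"

# Build one REVOKE per host, then a final FLUSH PRIVILEGES.
sql=""
for h in $HOSTS; do
  sql="${sql}REVOKE INSERT, UPDATE, DELETE ON ${DB}.* FROM '${DB_USER}'@'${h}';
"
done
sql="${sql}FLUSH PRIVILEGES;"

printf '%s\n' "$sql"
# e.g.: printf '%s\n' "$sql" | mysql -u root -p
```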

Now your MySQL replication setup is complete. Let’s move on to setting up WordPress to use both database servers properly.

Install and Configure HyperDB

We will use HyperDB to determine where to send updates (your master database) and read requests (your master and slave). Let’s download it to your home directory from the WordPress Plugin Directory (also install zip/unzip to unarchive it):

cd ~
wget http://downloads.wordpress.org/plugin/hyperdb.zip
sudo apt-get install zip unzip
unzip hyperdb.zip

It should be unarchived to a directory called “hyperdb”, in your home directory. Copy the sample configuration file to your WordPress installation (substitute the highlighted with your WordPress installation path), and open it for editing:

<pre>
cp ~/hyperdb/db-config.php <span class="highlight">/var/www/example.com</span>/
vi <span class="highlight">/var/www/example.com</span>/db-config.php
</pre>

Look for the second occurrence of DB_HOST, which should be directly after some comments that describe setting up a slave and it should look exactly like the following:

<pre>
$wpdb->add_database(array(
    'host'     => <span class="highlight">DB_HOST</span>,     // If port is other than 3306, use host:port.
    'user'     => DB_USER,
    'password' => DB_PASSWORD,
    'name'     => DB_NAME,
    'write'    => 0,
    'read'     => 1,
    'dataset'  => 'global',
    'timeout'  => 0.2,
));
</pre>

The first occurrence of DB_HOST defines the master database server, and the second occurrence defines the slave database server (denoted by the 'write' => 0, line). Replace the second occurrence of DB_HOST with DB_SLAVE_1:

<pre>
'host' => <span class="highlight">DB_SLAVE_1</span>, // If port is other than 3306, use host:port.
</pre>

Save and exit. Next you will want to define DB_SLAVE_1 in your wp-config.php, which HyperDB will use as a slave database host. Open wp-config.php for editing:

<pre>
vi <span class=“highlight”>/var/www/example.com</span>/wp-config.php
</pre>

Find the line that defines DB_HOST and add the following line under it, substituting your slave’s private IP address (mysql-2):

<pre>
define('DB_SLAVE_1', '<span class="highlight">mysql_2_private_IP</span>');
</pre>

Then save and exit.
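This one-line edit can also be done non-interactively; the sketch below inserts the define right after the DB_HOST line using GNU sed, against a stub wp-config.php (the path and IP are placeholders):

```shell
# Placeholders -- point WP_CONFIG at your real wp-config.php.
WP_CONFIG="./wp-config.php"   # normally /var/www/example.com/wp-config.php
SLAVE_IP="10.0.0.2"           # mysql-2's private IP (hypothetical)

# Stub for illustration only:
cat > "$WP_CONFIG" <<'EOF'
<?php
define('DB_HOST', '10.0.0.1');
EOF

# GNU sed: append a line after the one matching define('DB_HOST'...
sed -i "/define('DB_HOST'/a define('DB_SLAVE_1', '$SLAVE_IP');" "$WP_CONFIG"
cat "$WP_CONFIG"
```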

Finish the HyperDB installation by copying the db.php file to the wp-content directory in your WordPress installation, then disabling write access to it:

<pre>
cp ~/hyperdb/db.php <span class="highlight">/var/www/example.com</span>/wp-content/
sudo chmod a-w <span class="highlight">/var/www/example.com</span>/wp-content/db.php
</pre>

Then update the ownership of your wordpress files to their appropriate values (in this tutorial, we have been using www-data for the user/group ownership):

<pre>
sudo chown -R www-data:www-data <span class=“highlight”>/var/www/example.com</span>/
</pre>

Now your WordPress read requests will be served by both your master and slave databases, while updates will be sent to your master (which will then be replicated to your slave).

Conclusion

Now that you have completed your MySQL replication and HyperDB setup, your database environment will be able to handle increased read traffic (i.e., more concurrent users)! Remember that you can add more MySQL slaves if you want to scale your read capacity even further.

<div class=“author”>By Mitchell Anicas</div>

https://www.digitalocean.com/community/tutorials/how-to-optimize-wordpress-performance-with-mysql-replication-on-ubuntu-14-04


I have some bad data in my database table. I want to replace every &amp;, &amp;amp;, &amp;amp;amp;, or &amp;amp;amp;amp; with a single &.

In Java it works fine; how can I do the same in SQL?

 

Java:

String[] names = new String[] { "Ravi Suthar", "Ravi &amp; Suthar", "Ravi &amp;amp; Suthar", "Ravi &amp;amp;amp; Suthar", "Ravi &amp;amp;amp;amp; Suthar" };
for (String name : names) {
    System.out.println(name.replaceAll("&amp;[amp;]*", "&"));
}

 

SQL:

UPDATE tablename SET columnname=REPLACE(columnname,'&amp;[amp;]*','&');
-- (this does not work: REPLACE matches literal strings, not regular expressions)


The following SQL will replace &amp;, &amp;amp;, &amp;amp;amp;, &amp;amp;amp;amp;, or any such sequence with &:

UPDATE tablename SET columnname = REPLACE(REPLACE(columnname, '&amp;', '&'), 'amp;', '');

or

UPDATE tablename SET columnname = REPLACE(columnname , 'amp;', '')
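To sanity-check the collapsing logic outside the database, the same transformation can be expressed as a regular expression (note one difference: this pattern only strips amp; runs that immediately follow an &amp;, while the nested REPLACE above also removes any stray amp; elsewhere in the string):

```shell
# Collapse "&amp;" followed by zero or more "amp;" into a single "&".
collapse() { printf '%s\n' "$1" | sed -E 's/&amp;(amp;)*/\&/g'; }

collapse 'Ravi &amp; Suthar'              # -> Ravi & Suthar
collapse 'Ravi &amp;amp;amp;amp; Suthar'  # -> Ravi & Suthar
```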
