Deploy ReadySet with Kubernetes
This page shows you how to run ReadySet with Kubernetes on Amazon EKS in front of an Amazon RDS Postgres or MySQL database.
Tip
If you don't want to run and manage ReadySet yourself, get a fully-managed deployment on ReadySet Cloud.
Before you begin
-
Note that this tutorial covers the scale-out deployment pattern, with the ReadySet Server and Adapter running as separate processes on separate machines.
-
Make sure you have an Amazon RDS for Postgres database running Postgres 13 or 14.
If you want to integrate with another version of Postgres, please contact ReadySet.
-
Make sure there are no DDL statements in progress.
ReadySet will take an initial snapshot of your data. Until the entire snapshot is finished, which can take from a few minutes to several hours depending on the size of your dataset, DDL statements (e.g., ALTER and DROP) against tables in your snapshot will be blocked.
-
Make sure tables without primary keys have REPLICA IDENTITY FULL.
If the database you want ReadySet to replicate includes tables without primary keys, make sure you alter those tables with REPLICA IDENTITY FULL before connecting ReadySet. Otherwise, Postgres will block updates and deletes on those tables.
-
Make sure row-level security is disabled. ReadySet does not currently support row-level security.
-
Complete the steps described in the EKS Getting Started documentation.
This includes installing and configuring eksctl, the command-line tool for creating and deleting Kubernetes clusters on EKS, and kubectl, the command-line tool for managing Kubernetes from your workstation.
-
Make sure you meet the EKS requirements for using an existing VPC.
For efficient networking and security, you'll deploy your Kubernetes cluster into the same VPC as your database.
-
Note that this tutorial covers the scale-out deployment pattern, with the ReadySet Server and Adapter running as separate processes on separate machines.
-
Make sure you have an Amazon RDS for MySQL database.
ReadySet can be run in front of other versions of Postgres and MySQL. However, this tutorial focuses on RDS.
-
Make sure there are no DDL statements in progress.
ReadySet will take an initial snapshot of your data. Until the entire snapshot is finished, which can take from a few minutes to several hours depending on the size of your dataset, DDL statements (e.g., ALTER and DROP) against tables in your snapshot will be blocked. INSERT and UPDATE statements will also be blocked, but only while a given table is being snapshotted.
-
Complete the steps described in the EKS Getting Started documentation.
This includes installing and configuring eksctl, the command-line tool for creating and deleting Kubernetes clusters on EKS, and kubectl, the command-line tool for managing Kubernetes from your workstation.
-
Make sure you meet the EKS requirements for using an existing VPC.
For efficient networking and security, you'll deploy your Kubernetes cluster into the same VPC as your database.
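Before moving on, it can help to confirm that the command-line tools from the prerequisites are on your PATH. The following is a minimal sketch, not part of the official instructions; the `missing_tools` helper is a hypothetical name introduced here for illustration.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# eksctl and kubectl come from the prerequisites above; add others as needed.
missing="$(missing_tools eksctl kubectl)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

If anything is reported as missing, revisit the EKS Getting Started documentation before continuing.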
Step 1. Start Kubernetes
In this step, you'll create a Kubernetes cluster on Amazon EKS in the same VPC as your database. Your cluster will contain 3 nodes to accommodate a simple ReadySet deployment of one ReadySet Server, one ReadySet Adapter, and one instance of Consul.
For more demanding workloads, ReadySet can be run with multiple Adapters. Please reach out to ReadySet for guidance.
-
Identify the subnets in your database's VPC:
- In the RDS Console, select your database.
- Under Connectivity & security, note the Subnets.
-
Identify the region where your database is running:
- In the RDS Console, select your database.
- Under Summary, note the region portion of Region & AZ. For example, us-east-1 is the region portion of us-east-1f.
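If you prefer to script these two lookups, the sketch below derives the region from an availability zone name and joins subnet IDs into the comma-separated form that the eksctl --vpc-private-subnets flag takes. The subnet IDs are hypothetical placeholders, and the suffix rule assumes a standard zone name ending in a single letter, as in the example above.

```shell
# Example values; replace with the AZ and subnets noted in the RDS Console.
DB_AZ="us-east-1f"
DB_SUBNETS="subnet-aaa subnet-bbb subnet-ccc"   # hypothetical subnet IDs

# Strip the trailing zone letter to get the region (us-east-1f -> us-east-1).
DB_REGION="${DB_AZ%[a-z]}"

# Join the subnet IDs with commas.
SUBNET_CSV="$(echo $DB_SUBNETS | tr ' ' ',')"

echo "region:  $DB_REGION"
echo "subnets: $SUBNET_CSV"
```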
-
From your local workstation, create a Kubernetes cluster, replacing the <db-region> and <db-subnetN> placeholders with the details from the previous steps:
eksctl create cluster \
  --name=readyset \
  --region=<db-region> \
  --nodegroup-name=standard-workers \
  --nodes=3 \
  --node-type=c5.2xlarge \
  --node-private-networking \
  --vpc-private-subnets=<db-subnet1>,<db-subnet2>,<db-subnet3>,...
Flag | Description
--name | The name of the cluster.
--region | The region where your database is running and where you will run your EKS cluster. It is necessary to run your cluster in the same region as your database in order to have access to the same VPC.
--nodegroup-name | The name of the node group for the cluster. This node group should be created automatically along with the cluster. However, if cluster creation fails because a node group could not be created, you will need to create a managed node group manually and reference it in the --nodegroup-name flag.
--nodes | The number of nodes in the cluster. 3 is the minimum required for a simple ReadySet deployment of one ReadySet Server, one ReadySet Adapter, and one instance of Consul.
--node-type | The instance type to use for the nodes. The c5.2xlarge type is fine for testing ReadySet; however, ReadySet is a memory-intensive application, so you should use memory-optimized instances (r5.2xlarge or larger) for production deployments.
--vpc-private-subnets | The subnets of your database's VPC. If you do not want to create the cluster in the same VPC as your database (e.g., you plan to set up VPC peering between Kubernetes and the database), remove this flag and --node-private-networking.

Cluster provisioning usually takes between 10 and 15 minutes. Do not move on to the next step until you see a message like [✔] EKS cluster "readyset" in "us-east-1" region is ready and details about your cluster.
Tip
If cluster creation fails, you may need to create a managed node group manually and reference it in the --nodegroup-name flag.
-
Check that you can connect to the database from your EKS cluster.
-
In your EKS cluster, create a temporary pod containing the psql client:
-
Start psql, replacing placeholders with your database connection details:
PGPASSWORD=<password> psql \
  --host=<database_endpoint> \
  --port=<port> \
  --username=<username> \
  --dbname=<database_name>
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
You should now be in the SQL shell, where you can query your database.
Warning
If you can't connect, there are likely errors in the psql connection details or in your VPC configuration. Review these details, fix any errors, and try running psql again.
Do not move on to the next step until you successfully connect from your EKS cluster; if you can't do so now, ReadySet won't be able to connect later.
-
Stop psql and delete the temporary pod:
-
In your EKS cluster, create a temporary pod containing the mysql client:
-
Start mysql, replacing placeholders with your database connection details:
mysql \
  --host=<database_endpoint> \
  --port=<port> \
  --user=<username> \
  --password=<password> \
  --database=<database_name>
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
You should now be in the SQL shell, where you can query your database.
Warning
If you can't connect, there are likely errors in the mysql connection details or in your VPC configuration. Review these details, fix any errors, and try running mysql again.
Do not move on to the next step until you successfully connect from your EKS cluster; if you can't do so now, ReadySet won't be able to connect later.
-
Stop mysql and delete the temporary pod:
-
-
Create a Kubernetes secret with your database connection details. ReadySet will use this secret to connect to the database.
-
Set environment variables with your database connection details:
Replication scope
By default, ReadySet will replicate all tables in all schemas of the database specified in DB_NAME. If the queries you want to cache with ReadySet touch only a specific schema or specific tables in a schema, you can restrict the scope of replication accordingly. See Step 4 for more details.
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
-
Create the secret:
-
Set environment variables with your database connection details:
Replication scope
By default, ReadySet will replicate all tables in the database specified in DB_NAME. If the queries you want to cache with ReadySet touch only specific tables in the database, you can restrict the scope of replication accordingly. See Step 4 for more details.
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
-
Create the secret:
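As a rough illustration of how the connection details fit together, the sketch below assembles a standard connection URL from environment variables and previews a secret built from it. Apart from DB_NAME, every variable name, the secret name, and the key name are assumptions made for this example; use the names that your values.yaml actually references. Passwords containing special characters would also need URL-encoding, which this sketch does not do.

```shell
# Placeholder connection details; replace with your own values.
DB_USERNAME="app_user"
DB_PASSWORD="example-password"
DB_ENDPOINT="mydb.abc123.us-east-1.rds.amazonaws.com"
DB_PORT="5432"            # 3306 for MySQL
DB_NAME="mydb"
DB_SCHEME="postgresql"    # "mysql" for MySQL

# Assemble a standard connection URL from the parts.
DB_URL="${DB_SCHEME}://${DB_USERNAME}:${DB_PASSWORD}@${DB_ENDPOINT}:${DB_PORT}/${DB_NAME}"

# Hypothetical secret and key names; check values.yaml for the real ones.
SECRET_NAME="readyset-db-url"
if command -v kubectl >/dev/null 2>&1; then
  # --dry-run=client prints the manifest without creating anything.
  kubectl create secret generic "$SECRET_NAME" \
    --from-literal=url="$DB_URL" \
    --dry-run=client -o yaml
fi
```

Remove the --dry-run=client and -o yaml flags once the previewed manifest looks right.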
-
Step 2. Set up load balancing
In this step, you'll install an AWS Network Load Balancer Controller into your Kubernetes cluster. When you deploy ReadySet with the Helm chart, Kubernetes will use this Controller to provision a load balancer for your deployment. The load balancer will be able to handle queries sent to ReadySet from outside of the Kubernetes cluster.
-
Complete the installation steps described in the AWS Network Load Balancer Controller documentation.
-
Verify that the network load balancer controller is installed:
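One way to run this check is sketched below. It assumes the controller was installed as a deployment named aws-load-balancer-controller in the kube-system namespace, which is what the AWS installation guide uses by default; adjust both if your install differs. The `deployment_ready` helper is a hypothetical name that reads `kubectl get deployment` output and succeeds only when every listed deployment has all replicas ready.

```shell
# Succeeds when the READY column (e.g. "2/2") shows all replicas ready
# for every deployment row on standard input.
deployment_ready() {
  awk 'NR > 1 {
         seen = 1
         split($2, r, "/")
         if (r[1] != r[2] || r[2] == 0) bad = 1
       }
       END { exit !(seen && !bad) }'
}

# Assumed namespace and deployment name (defaults from the AWS guide).
if command -v kubectl >/dev/null 2>&1; then
  if kubectl get deployment -n kube-system aws-load-balancer-controller \
       | deployment_ready; then
    echo "controller ready"
  else
    echo "controller not ready"
  fi
fi
```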
Step 3. Configure your database
In this step, you'll configure your database so that ReadySet can consume the database's replication stream, which ReadySet uses to keep its cache up-to-date as the database changes.
-
In your EKS cluster, create a temporary pod containing the psql client:
-
Start psql, replacing placeholders with your database connection details:
PGPASSWORD=<password> psql \
  --host=<database_endpoint> \
  --port=<port> \
  --username=<username> \
  --dbname=<database_name>
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
-
In the psql shell, check if replication is enabled:
If replication is already on, skip to Step 4. Configure ReadySet:
If replication is off, continue to the next step:
-
Create a custom parameter group.
- For Parameter group family, select the Postgres version of your database.
- For Type, select DB Parameter Group.
- Give the group a name and description.
-
Edit the new parameter group and set the rds.logical_replication parameter to 1.
-
Associate the parameter group to your database.
-
Be sure to use the Apply Immediately option. The database must be rebooted in order for the parameter group association to take effect.
-
Do not move on to the next step until the database Status is Available in the RDS Console.
-
-
Back in the SQL shell, verify that replication is now enabled:
Note
If replication is still not enabled, reboot the database.
Once the database Status is Available in the RDS Console, check replication again.
-
Stop psql and delete the temporary pod:
-
In RDS MySQL, replication is enabled only when automated backups are also enabled. If you didn't enable automated backups when creating your database instance, enable automated backups now.
-
Be sure to use the Apply Immediately option. The database must be rebooted in order for the change to take effect.
-
Do not move on to the next step until the database Status is Available in the RDS Console.
-
-
In your EKS cluster, create a temporary pod containing the mysql client:
-
Start mysql, replacing placeholders with your database connection details:
mysql \
  --host=<database_endpoint> \
  --port=<port> \
  --user=<username> \
  --password=<password> \
  --database=<database_name>
Tip
To find the database endpoint, select your database in the RDS Console, and look under Connectivity & security.
-
In the mysql shell, verify that replication is enabled:
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.00 sec)
Note
If replication is still not enabled, reboot the database.
Once the database Status is Available in the RDS Console, check replication again.
-
Check the binary logging format:
If the binary logging format is ROW, skip to Step 4. Configure ReadySet:
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)
If the binary logging format is not ROW, continue to the next step:
-
Create a custom parameter group.
- For Parameter group family, select the MySQL version of your database.
- For Type, select DB Parameter Group.
- Give the group a name and description.
-
Edit the new parameter group and set the binlog_format parameter to ROW.
-
Associate the parameter group to your database.
-
Be sure to use the Apply Immediately option. The database must be rebooted in order for the parameter group association to take effect.
-
Do not move on to the next step until the database Status is Available in the RDS Console.
-
-
Back in the SQL shell, verify that the binary logging format is ROW:
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)
Note
If the binary logging format is still not ROW, reboot the database.
Once the database Status is Available in the RDS Console, check the binary logging format again.
-
Stop mysql and delete the temporary pod:
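The parameter group steps in this section use the RDS Console, but the same changes can be scripted with the AWS CLI. The sketch below is one possible approach, not the documented procedure: the instance identifier and group name are hypothetical, and the `run` helper only prints each command unless DRY_RUN=0 is set, so you can review the commands first.

```shell
# Echo each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical names; the family must match your engine version.
DB_INSTANCE="my-rds-instance"
PARAM_GROUP="readyset-replication"
FAMILY="postgres14"                   # e.g. "mysql8.0" for MySQL
PARAM_NAME="rds.logical_replication"  # "binlog_format" for MySQL
PARAM_VALUE="1"                       # "ROW" for MySQL

configure_replication() {
  run aws rds create-db-parameter-group \
    --db-parameter-group-name "$PARAM_GROUP" \
    --db-parameter-group-family "$FAMILY" \
    --description "Replication settings for ReadySet"
  run aws rds modify-db-parameter-group \
    --db-parameter-group-name "$PARAM_GROUP" \
    --parameters "ParameterName=$PARAM_NAME,ParameterValue=$PARAM_VALUE,ApplyMethod=pending-reboot"
  run aws rds modify-db-instance \
    --db-instance-identifier "$DB_INSTANCE" \
    --db-parameter-group-name "$PARAM_GROUP" \
    --apply-immediately
  # Wait for the modification to settle before rebooting.
  run aws rds wait db-instance-available --db-instance-identifier "$DB_INSTANCE"
  run aws rds reboot-db-instance --db-instance-identifier "$DB_INSTANCE"
}

configure_replication
```

As with the console steps, wait until the database Status is Available again before checking replication.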
Step 4. Configure ReadySet
In this step, you'll download and edit the configuration files for deploying ReadySet.
-
Clone the readyset GitHub repository:
-
Move into the readysettech/readyset/helm/readyset directory. This directory contains the Chart.yaml and values.yaml files that Helm needs to deploy ReadySet.
-
Edit the values.yaml file as follows:
-
Choose a unique identifier for your ReadySet deployment:
-
Change the image tags for the ReadySet Server and Adapter from latest to the latest release of the ReadySet Server and Adapter (e.g., beta-2023-01-18):
Note
The latest Docker image tag is updated nightly and so represents different versions of ReadySet over time. To ensure that you deploy a fixed version of ReadySet, it's important to use the tag for a specific version. It's also important to use the same tag for the ReadySet Server and Adapter.
-
Configure your deployment to use the ReadySet Adapter for your database:
-
Change the storage size to be 2x the size of your database:
volumeClaimTemplates:
  - metadata:
      name: state
    spec:
      storageClassName: gp2
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 250Gi
Note
The values.yaml file contains the CPU, memory, and storage specifications for the components of your deployment. The default values are suitable for testing purposes only. For production deployments, you'll need to substitute values that are appropriate for your database and workload. Please reach out to ReadySet for guidance.
-
Set environment variables to disable verification of SSL certificates on the ReadySet Server and ReadySet Adapter. This is necessary because ReadySet cannot currently verify Amazon's self-signed certificates.
-
By default, ReadySet will replicate all data in the database specified in the ReadySet secret that you created earlier. However, if the queries you want to cache with ReadySet access only a subset of tables in the database, you can set the REPLICATION_TABLES environment variable to narrow the scope accordingly. Filtering out tables that will not be used in caches will speed up the snapshotting process.
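For the storage size setting earlier in this step, the 2x rule can be worked out from the database size in bytes. The sketch below is only an illustration with an example size; in Postgres, a byte count like this could come from a query such as SELECT pg_database_size(current_database()), and the variable names are hypothetical.

```shell
# Example database size in bytes (exactly 125 GiB); replace with your own.
DB_SIZE_BYTES=134217728000

# Round up to whole GiB, then double it per the 2x guidance above.
GIB=1073741824
DB_SIZE_GIB=$(( (DB_SIZE_BYTES + GIB - 1) / GIB ))
STORAGE_GIB=$(( DB_SIZE_GIB * 2 ))

echo "storage: ${STORAGE_GIB}Gi"
```

With the example size, this yields the 250Gi value shown in the volumeClaimTemplates snippet.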
-
Step 5. Start ReadySet
In this step, you'll use the Helm package manager to deploy ReadySet into your EKS cluster.
-
Use the ReadySet Helm chart to deploy ReadySet to your EKS cluster:
-
Confirm that the ReadySet deployment completed successfully, with the pods for the ReadySet Adapter, ReadySet Server, and Consul showing Running under STATUS:
NAME                                        READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
readyset-consul-server-0                    1/1     Running   0          5m    192.168.39.169   ip-192-168-43-246.ec2.internal   <none>           <none>
readyset-readyset-adapter-9dbfb77d9-ml92h   2/2     Running   0          5m    192.168.48.46    ip-192-168-43-246.ec2.internal   <none>           <none>
readyset-readyset-server-0                  2/2     Running   0          5m    192.168.18.133   ip-192-168-18-84.ec2.internal    <none>           <none>
-
Confirm that the persistent volumes for storing ReadySet's snapshot of your database and for ReadySet state details were created successfully:
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS   REASON   AGE
pvc-d792b6a8-35ae-456d-8c92-415e473931dc   10Gi       RWO            Delete           Bound    default/data-default-readyset-consul-server-0   gp2                     5m
pvc-ddf75696-9eb7-4e28-a846-2110e889c8de   250Gi      RWO            Delete           Bound    default/state-readyset-readyset-server-0        gp2                     5m
-
Confirm that a load balancer service was created successfully:
NAME                        TYPE           CLUSTER-IP      EXTERNAL-IP                                                                    PORT(S)                         AGE
readyset-readyset-adapter   LoadBalancer   10.100.46.222   k8s-default-readyset-3cab417124-2b191c9917ce4d43.elb.us-east-1.amazonaws.com   3306:30336/TCP,5432:30185/TCP   5m
Do not move on to the next step until an EXTERNAL-IP has been assigned to the load balancer. This may take a few minutes.
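To check the EXTERNAL-IP from a script rather than by eye, something like the following could be used. It is a sketch under the assumption that the service is named readyset-readyset-adapter, as in the sample output above; the `external_ip` helper is a hypothetical name that picks the fourth column out of `kubectl get svc` output.

```shell
# Print the EXTERNAL-IP column for the named service from
# `kubectl get svc` output on standard input.
external_ip() {
  awk -v svc="$1" '$1 == svc { print $4 }'
}

# Service name taken from the sample output above.
SERVICE="readyset-readyset-adapter"

if command -v kubectl >/dev/null 2>&1; then
  host="$(kubectl get svc "$SERVICE" 2>/dev/null | external_ip "$SERVICE")"
  case "$host" in
    ""|"<pending>") echo "load balancer not ready yet" ;;
    *)              echo "load balancer address: $host" ;;
  esac
fi
```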
Step 6. Check snapshotting
As soon as ReadySet is connected to the database, it starts storing a snapshot of your database tables on disk. This snapshot will be the basis for ReadySet to cache query results, and ReadySet will keep its snapshot and cache up-to-date automatically by listening to the database's replication stream. Queries can be cached in ReadySet only once all tables have finished the initial snapshotting process.
In this step, you'll check the status of the snapshotting process. Snapshotting can take from a few minutes to several hours, depending on the size of your dataset.
-
In your EKS cluster, create a temporary pod containing the psql client:
-
Start psql, replacing the --host placeholder with the external IP of your load balancer, and replacing the other placeholders with your database connection details:
PGPASSWORD=<password> psql \
  --host=<external IP of load balancer> \
  --port=<port> \
  --username=<username> \
  --dbname=<database_name>
You should now be in the SQL shell.
-
Use ReadySet's custom SHOW READYSET TABLES command to check the snapshotting status of tables in the database ReadySet is connected to:
table                      | status
------------------------------------------
`public`.`title_basics`    | Snapshotting
`public`.`title_ratings`   | Snapshotted
`public`.`title_episodes`  | Not Replicated
(3 rows)
There are 3 possible statuses:
- Snapshotting: The initial snapshot of the table is in progress.
- Snapshotted: The initial snapshot of the table is complete. ReadySet is replicating changes to the table via the database's replication stream.
- Not Replicated: The table has not been snapshotted by ReadySet. This can be because ReadySet encountered an error (e.g., due to unsupported data types) or the table has been intentionally excluded from snapshotting (via the --replication-tables option).
Info
You can start caching queries in ReadySet only once all tables with the Snapshotting status have finished snapshotting and show the Snapshotted status.
-
If you'd like to track snapshotting progress in greater detail, exit the temporary pod, and then check the ReadySet logs:
Note
For each table, you'll see the progress and the estimated time remaining in the log messages (e.g., progress=84.13% estimate=00:00:23).
2022-12-13T16:02:48.142605Z INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting table context=LogContext({"deployment": "readyset-helm-test"})
2022-12-13T16:02:48.202895Z INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting table context=LogContext({"deployment": "readyset-helm-test"})
2022-12-13T16:02:48.357445Z INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting started context=LogContext({"deployment": "readyset-helm-test"}) rows=1246402
2022-12-13T16:02:48.921839Z INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting started context=LogContext({"deployment": "readyset-helm-test"}) rows=5159701
2022-12-13T16:03:11.155418Z INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting finished context=LogContext({"deployment": "readyset-helm-test"}) rows_replicated=1246402
2022-12-13T16:03:19.927790Z INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting progress context=LogContext({"deployment": "readyset-helm-test"}) rows_replicated=1126400 progress=21.83% estimate=00:01:51
...
Tip
To follow the full ReadySet Server logs, use:
To follow the ReadySet Adapter logs, use:
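When the logs are busy, it can be easier to pull out just the progress fields. The filter below is a minimal sketch that assumes log lines shaped like the Postgres sample above; `snapshot_progress` is a hypothetical helper name, and the sample line is copied from that output.

```shell
# Print "table progress estimate" for each "Snapshotting progress" log line
# on standard input; other lines are dropped.
snapshot_progress() {
  sed -n 's/.*Snapshotting table{table=\([^}]*\)}.*progress=\([0-9.]*%\) estimate=\([0-9:]*\).*/\1 \2 \3/p'
}

# Sample line copied from the log output above.
sample_line='2022-12-13T16:03:19.927790Z INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting progress context=LogContext({"deployment": "readyset-helm-test"}) rows_replicated=1126400 progress=21.83% estimate=00:01:51'

printf '%s\n' "$sample_line" | snapshot_progress
```

To apply the filter to live output, you could pipe the ReadySet Server logs through it, for example with kubectl logs readyset-readyset-server-0 --all-containers | snapshot_progress, where the pod name is the one from the sample output in Step 5.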
-
In your EKS cluster, create a temporary pod containing the mysql client:
-
Start mysql, replacing the --host placeholder with the external IP of your load balancer, and replacing the other placeholders with your database connection details:
mysql \
  --host=<external IP of load balancer> \
  --port=<port> \
  --user=<username> \
  --password=<password> \
  --database=<database_name>
You should now be in the SQL shell.
-
Use ReadySet's custom SHOW READYSET TABLES command to check the snapshotting status of tables in the database ReadySet is connected to:
table                      | status
------------------------------------------
`public`.`title_basics`    | Snapshotting
`public`.`title_ratings`   | Snapshotted
`public`.`title_episodes`  | Not Replicated
(3 rows)
There are 3 possible statuses:
- Snapshotting: The initial snapshot of the table is in progress.
- Snapshotted: The initial snapshot of the table is complete. ReadySet is replicating changes to the table via the database's replication stream.
- Not Replicated: The table has not been snapshotted by ReadySet. This can be because ReadySet encountered an error (e.g., due to unsupported data types) or the table has been intentionally excluded from snapshotting (via the --replication-tables option).
Info
You can start caching queries in ReadySet only once all tables with the Snapshotting status have finished snapshotting and show the Snapshotted status.
-
If you'd like to track snapshotting progress in greater detail, exit the temporary pod, and then check the ReadySet logs:
Note
For each table, you'll see the progress and the estimated time remaining in the log messages (e.g., progress=84.13% estimate=00:00:23).
2022-10-18T17:18:01.685613Z INFO taking database snapshot: replicators::noria_adapter: Starting snapshot
2022-10-18T17:18:01.803163Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Acquiring read lock table=`readyset`.`users`
2022-10-18T17:18:01.807475Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replicating table table=`readyset`.`users`
2022-10-18T17:18:01.809739Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Read lock released table=`readyset`.`users`
2022-10-18T17:18:01.810049Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Acquiring read lock table=`readyset`.`posts`
2022-10-18T17:18:01.816496Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replicating table table=`readyset`.`posts`
2022-10-18T17:18:01.818721Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Read lock released table=`readyset`.`posts`
2022-10-18T17:18:01.822144Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replication started rows=4990 table=`readyset`.`users`
2022-10-18T17:18:01.822376Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replication started rows=5000 table=`readyset`.`posts`
2022-10-18T17:18:01.863220Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replication finished rows_replicated=4990 table=`readyset`.`users`
2022-10-18T17:18:01.864316Z INFO taking database snapshot:replicating table: replicators::mysql_connector::snapshot: Replication finished rows_replicated=5000 table=`readyset`.`posts`
2022-10-18T17:18:01.966256Z INFO taking database snapshot: replicators::noria_adapter: Snapshot finished
Tip
To follow the full ReadySet Server logs, use:
To follow the ReadySet Adapter logs, use:
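Since caching can begin only after no table is left in the Snapshotting status, a small filter over the SHOW READYSET TABLES output can tell you when that point is reached. This is a sketch that assumes pipe-separated output like the sample in this step; `tables_in_progress` is a hypothetical helper name.

```shell
# Count the rows whose status field is exactly "Snapshotting" in
# pipe-separated SHOW READYSET TABLES output on standard input.
tables_in_progress() {
  awk -F'|' '{
         for (i = 1; i <= NF; i++) {
           f = $i
           gsub(/^[ \t]+|[ \t]+$/, "", f)
           if (f == "Snapshotting") { n++; break }
         }
       }
       END { print n + 0 }'
}

# Demo on the sample output from this step.
tables_in_progress <<'EOF'
table                      | status
------------------------------------------
`public`.`title_basics`    | Snapshotting
`public`.`title_ratings`   | Snapshotted
`public`.`title_episodes`  | Not Replicated
(3 rows)
EOF
```

A count of 0 means no table is still being snapshotted. Tables marked Not Replicated are not counted and may still need attention.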
Next steps
-
Set up monitoring
The ReadySet Adapter and ReadySet Server export granular time series metrics at <adapter IP or host>:6033/prometheus and <server IP or host>:6034/prometheus, respectively. The metrics are formatted for easy integration with Prometheus, an open source tool for storing, aggregating, and querying time series data. You can use this data to, for example, profile SQL query latencies and identify queries to cache with ReadySet.
Viewing Prometheus metrics
To view the Prometheus metrics exported by the ReadySet Adapter:
-
Get the IP of the ReadySet Adapter pod:
-
Create a temporary pod containing the curl command:
-
Make a GET request to the Prometheus endpoint:
To view the Prometheus metrics exported by the ReadySet Server:
-
Get the IP of the ReadySet Server pod:
-
Create a temporary pod containing the curl command:
-
Make a GET request to the Prometheus endpoint:
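As an illustration, the endpoint URLs can be built from a pod IP as sketched below. The ports and the /prometheus path are the ones stated on this page; verify them against your deployment. The `metrics_url` helper is a hypothetical name, and the IP is the adapter pod IP from the sample output in Step 5.

```shell
# Build the metrics URL for a component from its IP or host name.
# Ports follow the text above: 6033 for the Adapter, 6034 for the Server.
metrics_url() {
  case "$1" in
    adapter) port=6033 ;;
    server)  port=6034 ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
  echo "http://$2:${port}/prometheus"
}

# Example pod IP; look yours up with: kubectl get pods -o wide
ADAPTER_IP="192.168.48.46"

# Print the curl command to run from inside the temporary pod.
echo "curl -s $(metrics_url adapter "$ADAPTER_IP")"
```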
-
-
Cache queries
Once you've identified queries to cache, use ReadySet's custom SQL commands to check if ReadySet supports them and then cache them in ReadySet.
Note
To successfully cache the results of a query, ReadySet must support the SQL features and syntax in the query. For more details, see SQL Support. If an unsupported feature is important to your use case, submit a feature request.