Skip to content

Quickstart with ReadySet

This tutorial shows you the quickest way to get started with ReadySet.

  • First, you'll use Docker Compose to start Postgres, load some sample data, and connect ReadySet to the database.
  • Next, you'll check on ReadySet's snapshotting process, cache some queries, and then run a simple Python app to compare how fast ReadySet returns results compared to Postgres.
  • Finally, you'll write to the database and see how ReadySet keeps your cache up-to-date automatically, with no changes to your application code.

Tip

To run through this quickstart in your browser, go the ReadySet Playground.

Before you begin

Step 1. Start ReadySet

In this step, you'll use Docker Compose to start Postgres, load some sample data, and connect ReadySet to the database.

  1. Download the Docker Compose and sample data files:

    curl -O "https://raw.githubusercontent.com/readysettech/docs/main/docs/assets/{docker-compose-postgres.yml,imdb-postgres.sql}"
    
  2. Run Docker Compose:

    docker-compose -f docker-compose-postgres.yml up -d
    

    Compose then does the following:

    • Starts Postgres in a container called db and imports two tables from the IMDb dataset.
    • Starts ReadySet in a container called cache. For details about the CLI options used to start ReadySet, see the CLI reference docs.
    • Creates a container called app for running a sample Python app against ReadySet.

Step 2. Check snapshotting

As soon as ReadySet is connected to the database, it starts storing a snapshot of your database tables on disk. This snapshot will be the basis for ReadySet to cache query results, and ReadySet will keep its snapshot and cache up-to-date automatically by listening to the database's replication stream. Queries can be cached in ReadySet only once all tables have finished the initial snapshotting process.

Snapshotting can take between a few minutes to several hours, depending on the size of your dataset. In this tutorial, snapshotting should take no more than a few minutes. Check the status of snapshotting, and make sure it's complete, before continuing to the next step.

  1. Connect the psql shell to ReadySet, using the port that ReadySet is listening on, 5433:

    PGPASSWORD=readyset psql \
    --host=127.0.0.1 \
    --port=5433 \
    --username=postgres \
    --dbname=imdb
    
  2. Run ReadySet's custom SHOW READYSET TABLES command to check the snapshotting status of tables in the database ReadySet is connected to:

    SHOW READYSET TABLES;
    
            table            |   status
    -------------------------+------------
    `public`.`title_basics`  | Snapshotted
    `public`.`title_ratings` | Snapshotted
    (2 rows)
    

    There are 3 possible statuses:

    • Snapshotting: The initial snapshot of the table is in progress.
    • Snapshotted: The initial snapshot of the table is complete. ReadySet is replicating changes to the table via the database's replication stream.
    • Not Replicated: The table has not been snapshotted by ReadySet. This can be because ReadySet encountered an error (e.g., due to unsupported data types) or the table has been intentionally excluded from snapshotting (via the --replication-tables option).
  3. If you'd like to track snapshotting progress in greater detail, open a new terminal, and check the ReadySet logs:

    docker logs readyset | grep 'Snapshotting table'
    
    2023-02-13T17:02:59.189523Z  INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting table context=LogContext({"deployment": "docs_quickstart_postgres"})
    2023-02-13T17:02:59.240229Z  INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting table context=LogContext({"deployment": "docs_quickstart_postgres"})
    2023-02-13T17:02:59.331648Z  INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting started context=LogContext({"deployment": "docs_quickstart_postgres"}) rows=396793
    2023-02-13T17:02:59.341948Z  INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting started context=LogContext({"deployment": "docs_quickstart_postgres"}) rows=201258
    2023-02-13T17:03:02.115769Z  INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting finished context=LogContext({"deployment": "docs_quickstart_postgres"}) rows_replicated=201258
    2023-02-13T17:03:09.837387Z  INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting finished context=LogContext({"deployment": "docs_quickstart_postgres"}) rows_replicated=396793
    

    Again, don't continue to the next step until you see Snapshotting finished for both title_ratings and title_basics:

    docker logs readyset | grep 'Snapshotting finished'
    
    2023-02-13T17:03:02.115769Z  INFO Snapshotting table{table=`public`.`title_ratings`}: replicators::postgres_connector::snapshot: Snapshotting finished context=LogContext({"deployment": "docs_quickstart_postgres"}) rows_replicated=201258
    2023-02-13T17:03:09.837387Z  INFO Snapshotting table{table=`public`.`title_basics`}: replicators::postgres_connector::snapshot: Snapshotting finished context=LogContext({"deployment": "docs_quickstart_postgres"}) rows_replicated=396793
    

Step 3. Cache queries

With snapshotting finished, ReadySet is ready for caching, so in this step, you'll get to know the dataset, run some queries, check if ReadySet supports them, and then cache them.

  1. If necessary, reconnect the psql shell to ReadySet:

    PGPASSWORD=readyset psql \
    --host=127.0.0.1 \
    --port=5433 \
    --username=postgres \
    --dbname=imdb
    
  2. The imdb database contains two modified tables from the IMDb dataset, title_basics and title_ratings. Get a sense of the data in each table:

    SELECT count(*) FROM title_basics;
    SELECT * FROM title_basics WHERE tconst = 'tt0093779';
    
      count
     --------
      396793
     (1 row)
    
       tconst   | titletype |    primarytitle    |   originaltitle    | isadult | startyear | endyear | runtimeminutes |          genres
     -----------+-----------+--------------------+--------------------+---------+-----------+---------+----------------+--------------------------
      tt0093779 | movie     | The Princess Bride | The Princess Bride | f       |      1987 |         |             98 | Adventure,Family,Fantasy
     (1 row)
    
    SELECT count(*) FROM title_ratings;
    SELECT * FROM title_ratings WHERE tconst = 'tt0093779';
    
      count
     --------
      201258
     (1 row)
    
       tconst   | averagerating | numvotes
     -----------+---------------+----------
      tt0093779 |           8.0 |   427192
     (1 row)    
    
  3. Run a query that joins results from title_ratings and title_basics to count how many titles released in 2000 have an average rating higher than 5:

    SELECT count(*) FROM title_ratings
    JOIN title_basics ON title_ratings.tconst = title_basics.tconst
    WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;
    
      count
     -------
       2418
     (1 row)
    
  4. Because the query is not yet cached, ReadySet proxied it to the upstream database. Use ReadySet's custom SHOW PROXIED QUERIES command to check if ReadySet can cache the query:

    SHOW PROXIED QUERIES;
    
         query id      |                                                                                            proxied query                                                                                             | readyset supported
    -------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------
    q_2f1af226d10ee188 | SELECT * FROM `title_ratings` WHERE (`tconst` = $1)                                                                                                                                                  | yes
    q_19b5998c761fd61d | SELECT count(*) FROM `title_basics`                                                                                                                                                                  | yes
    q_5348e5187dbf1722 | SELECT count(*) FROM `title_ratings` JOIN `title_basics` ON (`title_ratings`.`tconst` = `title_basics`.`tconst`) WHERE ((`title_basics`.`startyear` = $1) AND (`title_ratings`.`averagerating` > 5)) | yes
    q_e31fbaf56a443b93 | SELECT count(*) FROM `title_ratings`                                                                                                                                                                 | yes
    q_e948c4c2f747b1a7 | SELECT * FROM `title_basics` WHERE (`tconst` = $1)                                                                                                                                                   | yes
    (5 rows)
    

    You should see yes under readyset supported. If the value is pending, check again until you see yes or no.

    Tip

    To successfully cache the results of a query, ReadySet must support the SQL features and syntax in the query. For more details, see SQL Support.

  5. Cache the query in ReadySet:

    CREATE CACHE FROM
    SELECT count(*) FROM title_ratings
    JOIN title_basics ON title_ratings.tconst = title_basics.tconst
    WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;
    

    Tip

    To cache a query, you can provide either the full SELECT (as shown here) or the query ID listed in the SHOW PROXIED QUERIES output.

    Note

    Caching will take a few moments, as it constructs the initial dataflow graph for the query and adds indexes to the relevant ReadySet table snapshots, as necessary. The CREATE CACHE command will return once this is complete.

  6. Run a second query, this time joining results from your two tables to get the title and average rating of the 10 top-rated movies from 1950:

    SELECT title_basics.originaltitle, title_ratings.averagerating
    FROM title_basics
    JOIN title_ratings ON title_basics.tconst = title_ratings.tconst
    WHERE title_basics.startyear = 1950 AND title_basics.titletype = 'movie'
    ORDER BY title_ratings.averagerating DESC
    LIMIT 10;
    
                 originaltitle             | averagerating
      -------------------------------------+---------------
      Le mariage de Mademoiselle Beulemans |           9.0
      Es kommt ein Tag                     |           8.7
      Nili                                 |           8.7
      Sudhar Prem                          |           8.7
      Pyar                                 |           8.6
      Jiruba Tetsu                         |           8.5
      Meena Bazaar                         |           8.5
      Vidyasagar                           |           8.4
      Siete muertes a plazo fijo           |           8.4
      Tathapi                              |           8.4
      (10 rows)
    
  7. Use the SHOW PROXIED QUERIES command to check if ReadySet can cache the query:

    SHOW PROXIED QUERIES;
    
         query id      |                                                                                                                                             proxied query                                                                                                                                             | readyset supported
    -------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------
    q_8dba9d9bee766310 | SELECT `title_basics`.`originaltitle`, `title_ratings`.`averagerating` FROM `title_basics` JOIN `title_ratings` ON (`title_basics`.`tconst` = `title_ratings`.`tconst`) WHERE ((`title_basics`.`startyear` = $1) AND (`title_basics`.`titletype` = $2)) ORDER BY `title_ratings`.`averagerating` DESC | yes
    q_2f1af226d10ee188 | SELECT * FROM `title_ratings` WHERE (`tconst` = $1)                                                                                                                                                                                                                                                   | yes
    q_19b5998c761fd61d | SELECT count(*) FROM `title_basics`                                                                                                                                                                                                                                                                   | yes
    q_e31fbaf56a443b93 | SELECT count(*) FROM `title_ratings`                                                                                                                                                                                                                                                                  | yes
    q_e948c4c2f747b1a7 | SELECT * FROM `title_basics` WHERE (`tconst` = $1)                                                                                                                                                                                                                                                    | yes
    (5 rows)
    

    You should see yes under readyset supported. If the value is pending, check again until you see yes or no.

  8. Cache the query in ReadySet:

    CREATE CACHE FROM
    SELECT title_basics.originaltitle, title_ratings.averagerating
    FROM title_basics
    JOIN title_ratings ON title_basics.tconst = title_ratings.tconst
    WHERE title_basics.startyear = 1950 AND title_basics.titletype = 'movie'
    ORDER BY title_ratings.averagerating DESC
    LIMIT 10;
    

    Again, caching will take a few moments.

  9. Use ReadySet's custom SHOW CACHES command to verify that caches have been created for your two queries:

    SHOW CACHES;
    
        name         |                                                                                                                                                                                         query                                                                                                                                                                                          | fallback behavior
    ----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------
    `q_5348e5187dbf1722` | SELECT count(coalesce(`public`.`title_ratings`.`tconst`, '<anonymized>')) FROM `public`.`title_ratings` JOIN `public`.`title_basics` ON (`public`.`title_ratings`.`tconst` = `public`.`title_basics`.`tconst`) WHERE ((`public`.`title_basics`.`startyear` = $1) AND (`public`.`title_ratings`.`averagerating` > '<anonymized>'))                                                      | fallback allowed
    `q_8dba9d9bee766310` | SELECT `public`.`title_basics`.`originaltitle`, `public`.`title_ratings`.`averagerating` FROM `public`.`title_basics` JOIN `public`.`title_ratings` ON (`public`.`title_basics`.`tconst` = `public`.`title_ratings`.`tconst`) WHERE ((`public`.`title_basics`.`startyear` = $1) AND (`public`.`title_basics`.`titletype` = $2)) ORDER BY `public`.`title_ratings`.`averagerating` DESC | fallback allowed
    (2 rows)
    
  10. Exit the psql shell:

    \q
    

Step 4. Run an app

Now you'll use a simple Python application to run your queries against both the database and ReadySet and compare how fast results are returned.

  1. Get into the Python container created by the Docker Compose config and install some dependencies for running the app:

    docker exec -it app bash
    
    apt-get update \
    && apt-get -y install libpq-dev gcc curl \
    && pip3 install psycopg2 numpy urllib3 tabulate
    
  2. Download the Python app:

    curl -O https://raw.githubusercontent.com/readysettech/docs/main/docs/assets/quickstart-app.py
    

    The Python app runs a specified query 20 times and prints the latency of each iteration as well as the query latency distributions (50th, 90th, 95th, 99th, and 100th percentiles). Feel free to review the code:

    cat quickstart-app.py
    
  3. Run the first JOIN query against Postgres:

    python3 quickstart-app.py \
    --url="postgresql://postgres:readyset@postgres:5432/imdb?sslmode=disable" \
    --query="SELECT count(*) FROM title_ratings JOIN title_basics ON title_ratings.tconst = title_basics.tconst WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;"
    
    Result:
    ['count']
    ['2418']
    
    Query latencies (in milliseconds):
    ['63.18', '34.35', '32.47', '31.56', '30.43', '30.23', '30.07', '30.31', '30.23', '30.17', '30.28', '30.56', '30.45', '30.42', '30.25', '30.15', '30.20', '30.08', '30.04', '31.69']
    
    Latency percentiles (in milliseconds):
     p50: 30.29
     p90: 32.66
     p95: 35.79
     p99: 57.70
    p100: 63.18
    

    Note the latencies when results are returned from the database. Also note that these latencies would be higher with a larger dataset.

  4. Run the same JOIN again, but this time against ReadySet:

    Tip

    Changing your connection string is the only change you make to your application. In this case, you're just changing the host and port from postgres:5432 to readyset:5433.

    python3 quickstart-app.py \
    --url="postgresql://postgres:readyset@readyset:5433/imdb?sslmode=disable" \
    --query="SELECT count(*) FROM title_ratings JOIN title_basics ON title_ratings.tconst = title_basics.tconst WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;"
    
    Result:
    ['count(coalesce(`public`.`title_ratings`.`tconst`, 0))']
    ['2418']
    
    Query latencies (in milliseconds):
    ['7.93', '1.09', '0.46', '0.37', '0.41', '0.45', '0.61', '0.38', '0.26', '0.24', '0.24', '0.31', '0.25', '0.25', '0.29', '0.23', '0.23', '0.23', '0.24', '0.22']
    
    Latency percentiles (in milliseconds):
     p50: 0.28
     p90: 0.66
     p95: 1.44
     p99: 6.63
    p100: 7.93
    

    As you can see, ReadySet returns results much faster. In the example here, the p50 latency went from 30.29ms to 0.28ms.

  5. Now run the second JOIN query against Postgres:

    python3 quickstart-app.py \
    --url="postgresql://postgres:readyset@postgres:5432/imdb?sslmode=disable" \
    --query="SELECT title_basics.originaltitle, title_ratings.averagerating FROM title_basics JOIN title_ratings ON title_basics.tconst = title_ratings.tconst WHERE title_basics.startyear = 1950 AND title_basics.titletype = 'movie' ORDER BY title_ratings.averagerating DESC LIMIT 10;"
    
    Result:
    ['originaltitle', 'averagerating']
    ['Le mariage de Mademoiselle Beulemans', '9.0']
    ['Sudhar Prem', '8.7']
    ['Es kommt ein Tag', '8.7']
    ['Nili', '8.7']
    ['Pyar', '8.6']
    ['Meena Bazaar', '8.5']
    ['Jiruba Tetsu', '8.5']
    ['Siete muertes a plazo fijo', '8.4']
    ['Showkar', '8.4']
    ['Tathapi', '8.4']
    
    Query latencies (in milliseconds):
    ['57.72', '30.14', '27.82', '27.71', '27.77', '27.49', '27.62', '27.56', '27.74', '27.69', '28.86', '29.06', '27.50', '27.69', '27.44', '27.54', '27.50', '27.42', '28.43', '27.52']
    
    Latency percentiles (in milliseconds):
     p50: 27.69
     p90: 29.17
     p95: 31.52
     p99: 52.48
    p100: 57.72
    

    Note the latencies when results are returned from the database.

  6. Run the same JOIN again, but this time against ReadySet:

    python3 quickstart-app.py \
    --url="postgresql://postgres:readyset@readyset:5433/imdb?sslmode=disable" \
    --query="SELECT title_basics.originaltitle, title_ratings.averagerating FROM title_basics JOIN title_ratings ON title_basics.tconst = title_ratings.tconst WHERE title_basics.startyear = 1950 AND title_basics.titletype = 'movie' ORDER BY title_ratings.averagerating DESC LIMIT 10;"
    
    Result:
    ['originaltitle', 'averagerating']
    ['Le mariage de Mademoiselle Beulemans', '9.0']
    ['Es kommt ein Tag', '8.7']
    ['Nili', '8.7']
    ['Sudhar Prem', '8.7']
    ['Pyar', '8.6']
    ['Jiruba Tetsu', '8.5']
    ['Meena Bazaar', '8.5']
    ['Pardes', '8.4']
    ['Showkar', '8.4']
    ['Siete muertes a plazo fijo', '8.4']
    
    Query latencies (in milliseconds):
    ['9.24', '0.57', '0.41', '0.41', '0.36', '0.39', '0.36', '0.37', '0.36', '0.38', '0.35', '0.38', '0.33', '0.37', '0.36', '0.33', '0.35', '0.34', '0.32', '0.48']
    
    Latency percentiles (in milliseconds):
     p50: 0.37
     p90: 0.49
     p95: 1.00
     p99: 7.59
    p100: 9.24
    

    ReadySet returns results much faster. In this case, p50 latency went from 27.69 ms to 0.37 ms.

    Note

    It's important to note that latencies from the database would increase with a larger dataset, whereas latencies from ReadySet would stay mostly constant, as results are returned from memory.

  7. Exit the app container:

    exit
    

Step 5. Cause a cache refresh

One of ReadySet's most important features is its ability to keep your cache up-to-date as writes are applied to the upstream database. In this step, you'll see this in action.

  1. Use psql to insert new rows that will change the count returned by your first JOIN query:

    PGPASSWORD=readyset psql \
    --host=127.0.0.1 \
    --port=5432 \
    --username=postgres \
    --dbname=imdb \
    -c "INSERT INTO title_basics (tconst, titletype, primarytitle, originaltitle, isadult, startyear, runtimeminutes, genres)
          VALUES ('tt9999998', 'movie', 'The ReadySet movie', 'The ReadySet movie', false, 2000, 0, 'Adventure');
        INSERT INTO title_ratings (tconst, averagerating, numvotes)
          VALUES ('tt9999998', 10, 1000000);"
    
  2. Get back into the app container:

    docker exec -it app bash
    
  3. Run the JOIN against ReadySet again:

    python3 quickstart-app.py \
    --url="postgresql://postgres:readyset@readyset:5433/imdb?sslmode=disable" \
    --query="SELECT count(*) FROM title_ratings JOIN title_basics ON title_ratings.tconst = title_basics.tconst WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;"
    
    Result:
    ['count(coalesce(`public`.`title_ratings`.`tconst`, 0))']
    ['2419']
    
    Query latencies (in milliseconds):
    ['8.12', '0.52', '0.35', '0.35', '0.32', '0.29', '0.34', '0.30', '0.35', '0.60', '0.46', '0.42', '0.39', '0.41', '0.32', '0.33', '0.37', '0.33', '0.38', '0.32']
    
    Latency percentiles (in milliseconds):
     p50: 0.35
     p90: 0.53
     p95: 0.98
     p99: 6.69
    p100: 8.12
    

    Success

    Previously, the count was 2418. Now, the count is 2419, and the query latencies are virtually unchanged. This shows how ReadySet automatically updates your cache, using the database's replication stream, with no action needed on your part to keep the database and cache in sync.

  4. Exit the app container:

    exit
    

Step 6. Clean up

When you are done testing, stop and remove the resources used in this tutorial:

docker-compose -f docker-compose-postgres.yml down -v

Next steps