Change Data Capture with Neon and Materialize

Now that Neon supports logical replication, your head might be filled with questions about how to turn this new feature into interactive data apps, better customer experiences, and other use cases ripe for fresh data. Do you need to rethink your architecture? Does your team need to catch up with all the streaming jargon? In this blog post, we’ll walk through how you can use Materialize to replicate data from Neon in real-time without needing to rethink much.

Why Materialize?

Logical replication allows capturing INSERTs, UPDATEs, and DELETEs as these operations happen in your Neon database, making this data available to an external system for processing — a pattern known as Change Data Capture (CDC). If your use case requires more than simple mirroring, Materialize can help you transform the replicated data into actionable results that are always fresh and consistent.

Materialize shines at operational use cases, where an analytical data warehouse would be too slow or too costly, and a stream processor would introduce too much complexity. It can handle arbitrarily complex transformations on changing data using — you’ve guessed it — just SQL. For CDC from a PostgreSQL database like Neon, it has the added benefit of requiring no additional infrastructure (e.g. Debezium), while guaranteeing transactional consistency.

Enough talk!

Let’s see what it takes to build a CDC pipeline to replicate data from Neon to Materialize in real time. If you want to follow along, sign up for a Materialize free trial!

Get started with Neon + Materialize

Configure logical replication in Neon

1. First, enable logical replication for your Neon project:

Select your project in the Neon Console.
On the Neon Dashboard, select Settings.
Select Beta.
Click Enable to enable logical replication.

🖐️ It’s important to note that enabling logical replication in Neon cannot be reverted, and triggers a restart of all active compute endpoints in your Neon project. Any active connections will be dropped and have to reconnect.

2. Double check that logical replication is enabled by running the following query:

SHOW wal_level;

 wal_level 
-----------
 logical

Once logical replication is enabled, the next step is to create a publication with the tables that you want to replicate to Materialize. It’s also recommended to create a dedicated user for replication.

4. Grant the replication user the required permissions on the tables you want to replicate:

GRANT CONNECT ON DATABASE neon_demo TO neon_repl;
GRANT USAGE ON SCHEMA public TO neon_repl;
GRANT SELECT ON playing_with_lr TO neon_repl;

For each table that you want to replicate to Materialize, set the replica identity to FULL:

ALTER TABLE playing_with_lr REPLICA IDENTITY FULL;

REPLICA IDENTITY FULL ensures that the replication stream includes the previous data of changed rows, in the case of UPDATE and DELETE operations, and enables Materialize to ingest Neon data with minimal in-memory state.

5. Create a publication with the tables you want to replicate:

CREATE PUBLICATION mz_demo FOR TABLE playing_with_lr;

The mz_demo publication will contain the set of changes generated for the specified table, and will be used to ingest the replication stream into Materialize next!

Create a source in Materialize

Logical replication in Neon follows PostgreSQL’s native replication protocol, so you can jump straight into action using Materialize’s direct PostgreSQL source.

1. Log in to Materialize. You’ll be dropped into the SQL shell, which you can use to run the commands in the next steps.

2. Use the connection details of your Neon endpoint to create a connection:

-- Your Neon password goes here
CREATE SECRET neonpass AS 'safe12345';
CREATE CONNECTION neon TO POSTGRES (
  -- Your Neon endpoint goes here
  HOST 'ep-dry-bread-********.eu-central-1.aws.neon.tech',
  PORT 5432,
  -- Your Neon replication user goes here
  USER 'neon_repl',
  PASSWORD SECRET neonpass,
  SSL MODE 'require',
  DATABASE 'neondb_cdc'
  );

2. Create a PostgreSQL source using the connection you just created, as well as the publication you created upstream:

CREATE SOURCE neon_demo
  IN CLUSTER default
  FROM POSTGRES CONNECTION neon (PUBLICATION 'mz_demo')
  FOR ALL TABLES;

Materialize will create a subsource for each table in the publication, which will first get hydrated with a snapshot of the existing data, and then carry on receiving new data as it arrives:

SHOW SUBSOURCES ON neon_demo;

        name        |   type
--------------------+-----------
 neon_demo_progress | progress
 playing_with_lr    | subsource

SELECT * FROM playing_with_lr;

     col
-------------
 hello
 logical
 replication

Transform data in Materialize

To get a quick taste of what Materialize can do, let’s keep track of the row count in playing_with_lr, and see how adding and deleting data in Neon immediately affects the results in Materialize.

1. In Materialize, create a view with a count aggregate function:

CREATE VIEW lr_cnt AS
SELECT COUNT(*) 
FROM playing_with_lr;

SELECT * FROM lr_cnt;
 
count
-------
     3

2. Switch back to Neon, and insert some new data into the playing_with_lr table. Then, query the lr_cnt view in Materialize one more time:

This isn’t a very complex task, but…it’s still somewhat magic to see the count instantly update, no? Hopefully, your use case will be strictly more complex than this, and Materialize will make you Ooh! and Aah! at how efficiently it can handle CDC pipelines (which can be…surprisingly costly).

What’s next?

If you’re as excited as we are about logical replication support in Neon, we encourage you to learn more about Materialize and how it can help you build better products, faster. Check out the Materialize customer stories to see operational use cases in the wild, and reach out if you’d like to get a hands-on walkthrough with our team.

What is Neon?