Learn how Neon's autoscaling works - it estimates Postgres' working set size and keeps it in memory. Engineering post here
Postgres

Stream data from Neon to external data sources via logical replication

Send data to Airbyte, Fivetran, Kafka, and others for real-time analytics and Change Data Capture

Post image

Outbound logical replication is available in beta for all Neon projects (outbound = from Neon to other sources). Inbound logical replication (from other sources to Neon) will be available soon. If you want to try it early, tell us here or join our Early Access list.

Logical replication allows you to replicate data from your Neon database to various external systems, enabling Change Data Capture (CDC) and real-time analytics. You can use logical replication to stream data to data warehouses, analytical databases, messaging platforms, event-streaming platforms, and other Postgres databases. 

In this blog post, we introduce you to the main concepts and use cases for logical replication. We provide how-to guides for starting to replicate data from Neon to platforms like Airbyte, Confluent, Materialize, Fivetran, and Decodable.

Postgres logical replication: crash course 

What is logical replication?

Logical replication allows you to mirror data between a Postgres database and another data source. When you enable logical replication in a Postgres database (a “publisher”), every insert, update, and delete operation is captured and streamed in real-time to another database or data source (the “subscriber”). This ensures that the data in both sources remains synchronized.

What logical replication is not

When starting with logical replication, it’s important to understand that it focuses exclusively on replicating data changes (inserts, updates, deletes), not DDL operations like schema changes or other configuration changes.

In practice, this means if you make changes to the schema in the publisher database (e.g., adding a new column, modifying a table structure, or creating a new index), you will need to manually apply these changes to the subscriber database. The same applies to configuration settings.

Benefits of logical replication 

Even with this limitation, logical replication is useful for many reasons:

  • Selective data sync. Logical replication allows you to choose specific tables or even individual rows for replication: you can replicate only the data you need, avoiding unnecessary duplication and saving resources. For example, an e-commerce platform can replicate only sales and inventory tables to a reporting database for analytics.
  • Real-time data replication. Any changes made in the source database are propagated to the target database almost immediately, ensuring your data is always current in both the publisher and subscriber. This is great for customer-facing applications that require real-time data updates, such as new orders or user activity.
  • Production migrations without downtime. Logical replication can be your best friend while migrating a large production database between different database vendors. Since you can’t shut things off, you can rely on logical replication to ensure that data is continuously synchronized between the old and new databases during the migration. Once everything is ready, you can switch connections.
  • Safer Postgres upgrades in large databases. Similarly, logical replication can also be useful when handling upgrades of large Postgres databases, ensuring a smooth transition without downtime.

How is logical replication different vs pg_dump/restore? 

Both logical replication and pg_dump/restore are used for “copying data” between databases, but they are significantly different. Pg_dump creates static snapshots of the database or selected tables, but it doesn’t keep the target database up-to-date with ongoing changes. All data is loaded into the source database at once, and depending on the dataset size, this might take time.

In contrast, logical replication continuously streams individual data changes from the publisher to the subscriber in real-time. When you enable logical replication, Postgres uses its own mechanisms to create an initial snapshot of the data; once this initial data synchronization is complete, logical replication then streams changes as they occur, keeping the subscriber always up-to-date with the latest changes from the publisher database.

When to use pg_dump/restore or logical replication?

This depends on your needs. Some scenarios require both data sources to be completely in sync, while in others, this might not be necessary or desirable.

How logical replication works in Postgres

📚 Check out our documentation on logical replication for a deeper dive. 

Logical replication in Neon works similarly to standard Postgres, using a publish-subscribe model where all changes are tracked via Postgres’ Write-Ahead Log (WAL). 

  • The publisher, a Neon database, “publishes” data changes to the subscriber, the data’s destination.
  • When a subscriber first connects, it takes a snapshot of the data from the publisher and copies it. After this initial transfer, any subsequent changes in the publisher database are sent to the subscriber in real-time.
  • WAL records every change made to the data in Postgres databases (WAL plays a core role in the Neon architecture). During logical replication, WAL records all data changes in the publisher, which are then decoded and sent to the subscriber.
  • Decoder plugins convert WAL entries into a format that can be understood and processed by the subscriber. These plugins transform WAL entries into logical change records, creating a logical replication stream that the subscriber processes and applies to its dataset.

To ensure data consistency and prevent potential data loss, Postgres uses replication slots. When setting up logical replication, a replication slot is typically created on the publisher database. This slot ensures that the WAL records, containing all changes made to the database, are not deleted until safely replicated to all subscribers. This guarantees that no changes are lost, even if there are network issues or the subscriber is temporarily unavailable.

Setting up logical replication in Neon

By now, Neon supports outbound logical replication in beta (Neon as the publisher). We’re working on supporting inbound logical replication soon (Neon as a subscriber). If you are interested in trying it out, tell us here or join our Early Access list. 

Enabling logical replication in Neon is done through the Console. Take into account that once enabled, all computes in your project are restarted, meaning active connections will be dropped and need to reconnect. 

To enable logical replication in a Neon project, navigate to your project settings, select Logical Replication, and click Enable:

You can verify that logical replication is enabled by running:

SHOW wal_level;

Replication roles

Once you enable logical replication, check that the right  roles have the  1REPLICATION1 privilege. The default Postgres role and roles created via the Neon Console, CLI, or API have this privilege; roles created via SQL do not, and this privilege cannot be granted manually. 

To verify that a role has the REPLICATION privilege, run:

​​SELECT rolname, rolreplication FROM pg_roles WHERE rolname = '<role_name>';

Publications

Publications define which tables and changes are replicated from the publisher to the subscribers. To create a publication, use the CREATE PUBLICATION command. For example, to replicate changes in the users table, you would use:

CREATE PUBLICATION my_publication FOR TABLE users;

You could also create a publication that only publishes specific operations, such as inserts and updates:

CREATE PUBLICATION my_publication FOR TABLE users WITH (publish = 'insert,update');

To add or remove tables from a publication, you can use the ALTER PUBLICATION command:

ALTER PUBLICATION my_publication ADD TABLE sales;
ALTER PUBLICATION my_publication DROP TABLE sales;

If you need to remove a publication entirely, you would use the DROP PUBLICATION command:

DROP PUBLICATION IF EXISTS my_publication;

Understanding replication slots

As we mentioned earlier, replication slots are crucial for ensuring data consistency and preventing data loss. You can create a replication slot manually using:

SELECT pg_create_logical_replication_slot('my_replication_slot', 'pgoutput');

To remove a replication slot, you can use:

SELECT pg_drop_replication_slot('my_replication_slot');

In Neon, inactive replication slots are automatically removed after 75 minutes if other active slots exist. This helps prevent storage bloat. If your setup requires a longer inactivity period, reach out to us so we can modify this for you.  

📚 Check out our docs for detailed instructions on setting up logical replication

Get started with logical replication guides

To help you get started with logical replication in Neon, we have a series of guides with step-by-step instructions to stream data from your Neon database to different external platforms and services, including:

Airbyte 

Airbyte is an open-source data integration platform that moves data from a source to a destination system. Check out this guide to replicate data from a Neon database to Airbyte.

Bemi

Bemi is an open-source solution that plugs into Postgres and ORMs such as Prisma, TypeORM, SQLAlchemy, and Ruby on Rails to track database changes automatically. Check out this guide to replicate data from a Neon database to Bemi.

DoubleCloud 

DoubleCloud is a managed data platform that helps engineering teams build data infrastructure with open-source technologies. Check out this guide to replicate data from a Neon database to ClickHouse via DoubleCloud.

 Fivetran

Fivetran is a data pipeline platform that helps you centralize data from disparate sources.  Check out this guide to replicate data from a Neon database to Fivetran.

Kafka via Confluent 

Confluent is a fully managed, cloud-native real-time data streaming service built on Apache Kafka. Check out this guide to replicate data from a Neon database to Confluent.

Materialize 

Materialize is a data warehouse for operational workloads, purpose-built for low-latency applications. Check out this guide to replicate data from a Neon database to Materialize.

Decodable 

Decodable is a fully managed platform for ETL, ELT, and stream processing, powered by Apache Flink® and Debezium. Check out this guide to replicate data from a Neon database to Decodable.

Sequin 

Sequin is a serverless messaging stream for API integrations. Check out this guide to replicate data from a Neon database to Sequin.

Try it: create a Neon account for free 

You can get started with logical replication and Neon right away. Create an account in our Free Plan, no credit card required. And keep an eye on our docs for more logical replication guides to come!