Major pricing update: Every paid plan gets 10x-20x more projects - See details
Community

Postgres Meets Analytics: CDC From Neon to ClickHouse Via PeerDB

Combining ClickHouse and Neon for real-time analytics on transactional data

Post image

If you’re building a data-driven application that handles large amounts of data, you may need to balance two different types of databases: a purpose-built operational DB for your transactions, and an analytical database for large-scale data analysis. A purpose-built analytical database can handle the fast analytical processing you may need (e.g., for real-time dashboards) without burdening the transactional database, which remains optimized for quick and precise row-level operations.

Combining Neon and ClickHouse: operational efficiency with large-scale analytics 

A nice pair to know are Neon and ClickHouse. Neon is a serverless Postgres service that excels at handling transactional workloads with cost-efficiency thanks to features like autoscaling, scale-to-zero, and branching. On the other hand, ClickHouse is a high-performance columnar database optimized for real-time analytics and handling large volumes of data with no compromise on speed and efficiency. Both Neon and ClickHouse evolve from the same ethos of open source, which makes this combination even more compelling.

By combining Neon and ClickHouse, you get the best of both worlds—Neon for building low-latency transactional web applications (OLTP) and ClickHouse for blazing-fast analytics (OLAP). Here are a couple important use cases that this combination enables:

  • Real-time customer facing analytics: You can continuously sync data from Neon to ClickHouse using Change Data Capture (CDC) and use ClickHouse to run real-time customer-facing analytics on your transactional data.
  • Application source of truth+ Data Warehousing: By sending CDC to a data warehouse like ClickHouse, Neon can serve as the system of record for your web application. This keeps your application code simpler and Postgres lightweight, ensuring super low latency for your transactional applications. Simpler code and you still benefit from the analytical power of ClickHouse.

PeerDB brings Neon and ClickHouse closer together

Now that we understand Neon and ClickHouse can be a great duo for developers, let’s see how we can bring them together. The answer is PeerDB – PeerDB is a leading provider of Change Data Capture (CDC) solutions for Postgres. Their technology enables real-time data replication and synchronization, helping businesses keep their data up-to-date and accessible for analytics and decision-making.

A few months ago, ClickHouse acquired PeerDB to make it seamless for customers to move data from their Postgres databases to ClickHouse and power real-time analytics and data warehousing use cases.

PeerDB natively supports Neon Postgres as a source. This enables customers to replicate their transactional data from Neon to ClickHouse for real-time analytics. 

Under the hood, PeerDB relies on Postgres logical replication for CDC. It implements multiple Postgres native optimizations to provide a 10x faster and more reliable replication experience to ClickHouse. The good news is that Neon supports Postgres logical replication out-of-the-box. Hence all the PeerDB optimizations can be extended to Neon out-of-the-box.

PeerDB is available in both open source and cloud offerings, and the ClickHouse team is actively working on integrating PeerDB into ClickPipes, the native ingestion service for ClickHouse. PeerDB will power the Postgres CDC connector in ClickPipes. Neon is planned to be supported as a source in ClickPipes.

Streaming data from Neon to ClickHouse using PeerDB 

Now, let’s quickly walk through how easy it is to set up replication from Neon to ClickHouse using PeerDB. Visit the PeerDB docs for a complete step-by-step walkthrough.

Configuring Neon (origin)

Detailed instructions here

​​Create a user and publication in Neon

In the Neon SQL console, create a user for PeerDB with the necessary permissions and set up a publication to start tracking data changes:

  CREATE USER peerdb_user PASSWORD 'peerdb_password';
  GRANT USAGE ON SCHEMA "public" TO peerdb_user;
  GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO peerdb_user;
  ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO peerdb_user;

-- Give replication permission to the USER
  ALTER USER peerdb_user REPLICATION;

-- Create a publication. We will use this when creating the mirror
  CREATE PUBLICATION peerdb_publication FOR ALL TABLES;

Enable logical replication in Neon 

Go to your Neon project settings in the console, and enable logical replication. Also, verify the settings for wal_level, max_wal_senders, and max_replication_slots:

SHOW wal_level; -- should be logical
SHOW max_wal_senders; -- should be 10
SHOW max_replication_slots; -- should be 10

Get connection details

Copy the connection string details from your Neon dashboard:

Creating a Neon peer 

In PeerDB’s UI, create a Neon Postgres peer using the connection details. Validate and create the peer:

Configuring ClickHouse (destination)

Detailed instructions here

Create a destination database

Via the ClickHouse SQL Console, create a database for syncing:

CREATE DATABASE peerdb;

Create a ClickHouse user with the necessary permissions

Set up a peerdb_user with permissions for INSERT, SELECT, DROP, and ALTER:

CREATE USER peerdb_user IDENTIFIED BY '<password>';
GRANT INSERT, SELECT, DROP, CREATE TABLE, ALTER ADD COLUMN ON peerdb.* TO peerdb_user;
GRANT CREATE TEMPORARY TABLE, s3 ON *.* TO peerdb_user;

Create a ClickHouse peer 

In PeerDB UI, configure the peer with your connection details, and click “Create” once validating the connection:

Setting up CDC from Neon to ClickHouse 

Detailed instructions here 

Once you’ve configured Neon and ClickHouse in PeerDB, the data synchronization follows these steps:

1- Go to the Mirrors tab in the PeerDB UI:

2- Create a new Mirror and select CDC as the mirror type for real-time replication

3- Name the Mirror (e.g., neon_to_clickhouse)

4- Select the source and destination peers you configured for Neon and ClickHouse

5- Configure sync settings according to the instructions here (e.g., initial snapshot, publication, and replication slot names) 

6- Select the Postgres tables you want to replicate

7- Validate and Create the mirror. This will start the continuous sync between Neon and ClickHouse, ensuring real-time data updates.

Getting started 

To get started:

If you have any questions, we’ll be happy to help in the ClickHouse Slack and the Neon Discord.