The pg_mooncake extension

Fast analytics in Postgres with columnstore tables and DuckDB execution

The pg_mooncake extension enables fast analytic workloads in Postgres by adding native columnstore tables and vectorized execution (DuckDB).

Columnstore tables improve analytical queries by storing data vertically, enabling compression and efficient column-specific retrieval with vectorized execution.

pg_mooncake columnstore tables are designed so that only metadata is stored in Postgres, while data is stored in an object store as Parquet files with Iceberg or Delta Lake metadata.

Queries on pg_mooncake columnstore tables are executed by DuckDB.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

You can create and use pg_mooncake columnstore tables like regular Postgres heap tables to run:

  • Transactional INSERT, SELECT, UPDATE, DELETE, and COPY operations
  • Joins with regular Postgres tables
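
For example, the operations above look exactly like regular Postgres SQL. The following sketch uses hypothetical `events` (columnstore) and `users` (heap) tables to illustrate:

```sql
-- Illustrative only: 'users' and 'events' are hypothetical tables
CREATE TABLE users (id BIGINT, name TEXT);
CREATE TABLE events (
    user_id BIGINT,
    action TEXT,
    created_at TIMESTAMPTZ
) USING columnstore;

-- Transactional DML on the columnstore table
INSERT INTO events VALUES (1, 'login', now());
UPDATE events SET action = 'logout' WHERE user_id = 1;

-- Join the columnstore table with a regular heap table
SELECT u.name, e.action
FROM events e
JOIN users u ON u.id = e.user_id;
```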

In addition, you can:

  • Load Parquet, CSV, and JSON files into columnstore tables
  • Load Hugging Face datasets
  • Run DuckDB specific aggregate functions like approx_count_distinct
  • Read existing Iceberg and Delta Lake tables
  • Write Delta Lake tables from Postgres tables
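
As a sketch of loading a Parquet file, you can select from `mooncake.read_parquet` (shown later in this guide) into a columnstore table. The bucket path, table name, and column list below are placeholders:

```sql
-- Placeholder names: replace the path, table, and columns with your own
INSERT INTO my_table
SELECT *
FROM mooncake.read_parquet('s3://<bucket>/data.parquet')
AS (col1 TEXT, col2 BIGINT);
```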

Version availability:

pg_mooncake is supported on the following versions of Postgres.

  • Postgres 14 - pg_mooncake 0.1.0
  • Postgres 15 - pg_mooncake 0.1.0
  • Postgres 16 - pg_mooncake 0.1.0
  • Postgres 17 - pg_mooncake 0.1.0

note

pg_mooncake is an open-source extension for Postgres that can be installed on any Neon Project using the instructions below.

Use cases for pg_mooncake

pg_mooncake supports several use cases, including:

  1. Analytics on Postgres data
  2. Time series and log analytics
  3. Exporting Postgres tables to your Lake or Lakehouse
  4. Querying and updating existing Lakehouse tables and Parquet files directly in Postgres

This guide provides a quickstart to the pg_mooncake extension.

Enable the extension

note

The pg_mooncake extension is currently in beta. We recommend using a separate, dedicated Neon project when working with an extension that is still in beta.

While the pg_mooncake extension is in beta, you need to explicitly allow it to be used on Neon before you can install it. To do so, connect to your Neon database via an SQL client like psql or the Neon SQL Editor and run the SET command shown below.

SET neon.allow_unstable_extensions='true';

Install the extension:

CREATE EXTENSION pg_mooncake;

Set up your object store

Run the commands outlined in the following steps on your Neon database to set up your object store.

tip

If you don't have an object storage bucket, you can get a free S3 express bucket here. When using the free S3 bucket, the SELECT and SET statements defined below are generated for you, which you can copy and run.

Add your object storage credentials. In this case, S3:

SELECT mooncake.create_secret('<name>', 'S3', '<key_id>', 
        '<secret>', '{"REGION": "<s3-region>"}');

Set your default bucket:

SET mooncake.default_bucket = 's3://<bucket>';

note

In the future, you will not have to bring your own bucket to use pg_mooncake with Neon.

Create a columnstore table with USING columnstore

Run the following SQL statement on your Neon database to create a columnstore table:

CREATE TABLE reddit_comments(
    author TEXT,
    body TEXT,
    controversiality BIGINT,
    created_utc BIGINT,
    link_id TEXT,
    score BIGINT,
    subreddit TEXT,
    subreddit_id TEXT,
    id TEXT
) USING columnstore;

Load data

You can find a list of data sources here. This dataset has 13 million rows and may take a few minutes to load.

INSERT INTO reddit_comments
(SELECT author, body, controversiality, created_utc, link_id, score, subreddit, subreddit_id, id 
FROM mooncake.read_parquet('hf://datasets/fddemarco/pushshift-reddit-comments/data/RC_2012-01.parquet') 
AS (author TEXT, body TEXT, controversiality BIGINT, created_utc BIGINT, link_id TEXT, score BIGINT, subreddit TEXT, subreddit_id TEXT, id TEXT));

Query the table

Queries on columnstore tables are executed by DuckDB. For example, this aggregate query runs in ~200 milliseconds on 13 million rows:

-- Top commenters (excluding [deleted] users)
SELECT 
    author,
    COUNT(*) as comment_count,
    AVG(score) as avg_score,
    SUM(score) as total_score
FROM reddit_comments
WHERE author != '[deleted]'
GROUP BY author
ORDER BY comment_count DESC
LIMIT 10;
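
Because execution is handled by DuckDB, DuckDB-specific aggregates such as approx_count_distinct are also available. A sketch against the same table:

```sql
-- Approximate distinct count of commenters per subreddit (DuckDB aggregate)
SELECT
    subreddit,
    approx_count_distinct(author) AS approx_authors
FROM reddit_comments
GROUP BY subreddit
ORDER BY approx_authors DESC
LIMIT 5;
```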
