Partial Twin
Create a partial Twin of your production database
This workflow will create a partial Neon Twin using `pg_dump`, `pg_restore`, and `psql`.
Note: To use this workflow, you'll need the Postgres connection string for your Neon database. Follow our Getting Started Guide to learn how.
Create the workflow
To create the Twin workflow in any GitHub-hosted repository:
- Create a new directory named `.github` at the root of your project.
- Inside this directory, create another directory named `workflows`.
- Within the `workflows` directory, create a new file named `create-neon-twin.yml`.
Add the following code to `create-neon-twin.yml`.
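Below is a minimal sketch of the workflow, assembled from the step-by-step explanation in the next section. The `users` table, its `created_at` column, the cron schedule, and the Postgres version are assumptions; adapt them to your own schema and setup.

```yaml
name: Create Neon Twin

on:
  schedule:
    - cron: '0 0 * * *' # daily at midnight UTC (an assumption; adjust to suit)
  workflow_dispatch: # allows manual triggering from the GitHub UI

env:
  PROD_DATABASE_URL: ${{ secrets.PROD_DATABASE_URL }} # production connection string
  DEV_DATABASE_URL: ${{ secrets.DEV_DATABASE_URL }} # Neon Twin connection string
  PG_VERSION: '17' # an assumption; match your production Postgres version

jobs:
  dump-and-restore:
    runs-on: ubuntu-latest
    steps:
      - name: Install PostgreSQL
        run: |
          # Add the PGDG Apt repository, then install the requested version
          sudo apt update
          yes '' | sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
          sudo apt install -y postgresql-${{ env.PG_VERSION }}

      - name: Set PostgreSQL binary path
        run: echo "POSTGRES=/usr/lib/postgresql/${{ env.PG_VERSION }}/bin" >> "$GITHUB_ENV"

      - name: Dump schema
        run: |
          "$POSTGRES/pg_dump" -Fc --schema-only -f "${{ github.workspace }}/prod-schema.bak" "$PROD_DATABASE_URL"

      - name: Dump data
        run: |
          # Assumption: a users table ordered by a created_at column
          "$POSTGRES/psql" -d "$PROD_DATABASE_URL" \
            -c "\copy (SELECT * FROM users ORDER BY created_at DESC LIMIT 50) TO '${{ github.workspace }}/users-subset.csv' WITH CSV"

      - name: Drop tables and schema
        run: |
          "$POSTGRES/psql" -d "$DEV_DATABASE_URL" \
            -c "DROP SCHEMA IF EXISTS public CASCADE; CREATE SCHEMA public;"

      - name: Restore schema
        run: |
          "$POSTGRES/pg_restore" --clean --no-owner --no-acl --if-exists --schema-only \
            -d "$DEV_DATABASE_URL" "${{ github.workspace }}/prod-schema.bak"

      - name: Restore data
        run: |
          "$POSTGRES/psql" -d "$DEV_DATABASE_URL" \
            -c "\copy users FROM '${{ github.workspace }}/users-subset.csv' WITH CSV"
```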
GitHub Action explained
Below is an explanation of each part of the GitHub Action.
on

- `name`: The name of the Action as it appears in the GitHub UI.
- `cron`: The POSIX cron syntax that defines when the Action will run.
- `workflow_dispatch`: Enables manual triggering through the GitHub UI.
env

- `PROD_DATABASE_URL`: The PostgreSQL connection string for your production database.
- `DEV_DATABASE_URL`: The PostgreSQL connection string for your Neon database.
- `PG_VERSION`: The version of PostgreSQL to install in the Action environment.
steps

- Install PostgreSQL: Installs the specified version of PostgreSQL into the Action environment from the Apt repository.
- Set PostgreSQL binary path: Creates a `$POSTGRES` variable for use in subsequent steps.
- Dump schema: Exports the database schema (table structures, indexes, constraints) from the production database into a backup file.
- Dump data: Extracts a subset of data (the 50 most recent users) from the production database into a CSV file.
- Drop tables and schema: Completely removes the existing schema in the development database and recreates it to ensure a clean state.
- Restore schema: Imports the previously dumped schema into the development database, re-creating table structures and constraints.
- Restore data: Loads the extracted data subset from the CSV file into the corresponding table in the development database.
pg_dump flags
The table below provides an explanation of each flag used by `pg_dump`.
| Flag | Meaning |
| --- | --- |
| `-Fc` | Dumps the database in a custom format. |
| `--schema-only` | Dumps only the schema (table structures, indexes, constraints) without any data. |
| `-f` | Specifies the output file where the schema dump will be stored. |
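Put together, these flags produce a command along the following lines (the file name and connection string are placeholders):

```shell
pg_dump -Fc --schema-only -f prod-schema.bak "postgresql://user:password@host/dbname"
```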
psql flags
The table below provides an explanation of each flag used by `psql`.
| Flag | Meaning |
| --- | --- |
| `-c` | Executes a single command and then exits. |
| `-d` | Specifies the database name to connect to. |
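For example (the query and connection string here are placeholders):

```shell
psql -d "postgresql://user:password@host/dbname" -c "SELECT count(*) FROM users;"
```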
pg_restore flags
The table below provides an explanation of each flag used by `pg_restore`.
| Flag | Meaning |
| --- | --- |
| `--clean` | Drops existing database objects before recreating them, ensuring a clean restore. |
| `--no-owner` | Ignores ownership information in the dump file, so restored objects are owned by the user running the restore. |
| `--no-acl` | Excludes access control (GRANT/REVOKE) statements from the restore, preventing permission changes. |
| `--if-exists` | Ensures that DROP commands (used with `--clean`) only execute if the object exists, preventing errors. |
| `--schema-only` | Restores only the schema (table structures, indexes, constraints) without inserting any data. |
| `-d` | Specifies the target database to restore into. |
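Combined, the flags produce a restore command like this sketch (file name and connection string are placeholders):

```shell
pg_restore --clean --no-owner --no-acl --if-exists --schema-only -d "postgresql://user:password@host/dbname" prod-schema.bak
```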
Working with multiple tables
The action above works well for dumping data from a single table. However, when working with multiple tables that have foreign key relationships, it's important to ensure that those relationships remain intact.
For example, if you're dumping a subset of data from a `transactions` table that references a `product_id` from the `products` table and a `user_id` from the `users` table, you must also query the corresponding `products` and `users` data.

This ensures that all referenced `product_id` and `user_id` values exist in the restored dataset, maintaining valid foreign key constraints.
To account for this, you may need to adjust the Dump data and Restore data steps accordingly.
For example, here is an amended example for the Dump data step.
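The following is a sketch, assuming hypothetical `transactions`, `products`, and `users` tables where `transactions.created_at` determines recency; adjust the queries to your own schema.

```yaml
      - name: Dump data
        run: |
          # Dump the 50 most recent transactions, then only the products and
          # users those transactions reference, so foreign keys stay valid.
          "$POSTGRES/psql" -d "$PROD_DATABASE_URL" \
            -c "\copy (SELECT * FROM transactions ORDER BY created_at DESC LIMIT 50) TO 'transactions-subset.csv' WITH CSV" \
            -c "\copy (SELECT * FROM products WHERE id IN (SELECT product_id FROM transactions ORDER BY created_at DESC LIMIT 50)) TO 'products-subset.csv' WITH CSV" \
            -c "\copy (SELECT * FROM users WHERE id IN (SELECT user_id FROM transactions ORDER BY created_at DESC LIMIT 50)) TO 'users-subset.csv' WITH CSV"
```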
And here is an example of the amended Restore data step.
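Again as a sketch under the same assumptions; the parent tables (`products`, `users`) are restored before `transactions` so the foreign key constraints are satisfied:

```yaml
      - name: Restore data
        run: |
          # Restore parent tables first, then the transactions that reference them.
          "$POSTGRES/psql" -d "$DEV_DATABASE_URL" \
            -c "\copy products FROM 'products-subset.csv' WITH CSV" \
            -c "\copy users FROM 'users-subset.csv' WITH CSV" \
            -c "\copy transactions FROM 'transactions-subset.csv' WITH CSV"
```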
Setting repository secrets
Before running the Action, ensure that both `PROD_DATABASE_URL` and `DEV_DATABASE_URL` are added to your GitHub repository secrets.
In your repository, go to Settings > Secrets and variables > Actions to add them.
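Alternatively, the GitHub CLI can set the same secrets from the command line (the connection strings below are placeholders):

```shell
gh secret set PROD_DATABASE_URL --body "postgresql://user:password@host/dbname"
gh secret set DEV_DATABASE_URL --body "postgresql://user:password@host/dbname"
```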
Testing the workflow
To manually trigger your workflow, go to Actions > Create Neon Twin, then click Run workflow. From the dropdown, click the Run workflow button.
Syncing with migration changes
The GitHub Action runs on a recurring schedule, but you may also want it to trigger when migration changes are applied and a Pull Request is merged. To enable this, update the Action with the following code:
Handling Pull Request Events
Add a `pull_request` event and configure it to listen for merges into the `main` branch.
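A sketch of the updated `on` block, assuming your default branch is named `main`. GitHub Actions has no "merged" event type, so the workflow listens for `closed` and filters on merge status in the job condition shown in the next section:

```yaml
on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
  pull_request:
    types: [closed]
    branches:
      - main
```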
Add Concurrency and Conditions
To prevent conflicts between scheduled runs and runs triggered by a Pull Request, set `cancel-in-progress` to `true` under `concurrency`. Additionally, add an `if` statement to ensure the job only executes when specific conditions are met.
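A sketch of the corresponding `concurrency` block and job-level `if` condition; the job only proceeds on a schedule, a manual trigger, or a Pull Request that was actually merged:

```yaml
concurrency:
  group: ${{ github.workflow }}
  cancel-in-progress: true

jobs:
  dump-and-restore:
    runs-on: ubuntu-latest
    if: >-
      github.event_name == 'schedule' ||
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' && github.event.pull_request.merged == true)
    steps:
      # ...same steps as the original workflow
```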
Limitations
Be aware of usage limits: each GitHub Actions job can run for up to 6 hours. If a job exceeds this limit, it will be terminated and fail to complete. If your dump/restore process takes longer, consider using self-hosted runners.