
Understanding Neon’s autoscaling algorithm

How Neon’s algorithm scales resources to match your workload

What you will learn:

  • Key metrics that drive autoscaling decisions

  • How often the algorithm checks these metrics

The key concept behind autoscaling is that compute resizing happens automatically — once you set up your minimum and maximum compute sizes, there’s no action required on your part other than monitoring your usage metrics to see if adjustments are needed.

That said, it can be helpful to understand exactly when and under what circumstances the algorithm optimizes your database on two key fronts — performance and efficiency. In a nutshell, the algorithm automatically scales up your compute to ensure optimal performance and scales down to maximize efficiency.

How the algorithm works

Neon's autoscaling algorithm uses two components, the vm-monitor and the autoscaler-agent, to continuously monitor three key metrics: your average CPU load, your memory usage, and the activity of your Local File Cache (LFC). These metrics determine how your compute resources — the virtual machine that powers your database — should be scaled to maintain performance and efficiency.

The Formula

In essence, the algorithm is built on goals. We set a goal (an ideal compute size) for each of the three key metrics:

  • cpuGoalCU — Keep the 1-minute average CPU load at or below 90% of the available CPU capacity.
  • memGoalCU — Keep memory usage at or below 75% of the total allocated RAM.
  • lfcGoalCU — Fit your frequently accessed working set within 75% of the compute's RAM allocated to the LFC.

The formula can be expressed as:

goalCU := max(cpuGoalCU, memGoalCU, lfcGoalCU)

The algorithm selects the highest value from these goals as the overall goalCU, ensuring your database has enough resources to handle the most demanding metric — while staying within the minimum and maximum limits you’ve set.
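
As a rough illustration, that selection can be sketched in Go. This is a minimal sketch under stated assumptions, not Neon's implementation: the function and parameter names are made up, sizes are expressed directly in compute units (CU), and it relies on Go 1.21+ for the built-in max and min.

// Minimal sketch: pick the most demanding per-metric goal, then clamp it
// to the compute size limits configured for the endpoint. Illustrative only.
package main

import "fmt"

func overallGoalCU(cpuGoalCU, memGoalCU, lfcGoalCU, minCU, maxCU float64) float64 {
	goalCU := max(cpuGoalCU, memGoalCU, lfcGoalCU) // most demanding metric wins
	return min(max(goalCU, minCU), maxCU)          // stay within your configured limits
}

func main() {
	// Example: CPU needs 1.5 CU, memory 2.2 CU, and the LFC 3 CU, with the
	// compute allowed to scale between 0.25 CU and 4 CU.
	fmt.Println(overallGoalCU(1.5, 2.2, 3, 0.25, 4)) // prints 3
}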

The Metrics

Let's go into a bit more detail about each metric.

CPU load average

The CPU load average is a measure of how much work your CPU is handling. Every 5 seconds, the autoscaler-agent checks the 1-minute load average from the virtual machine (VM) running your database. This load average reflects the average number of processes running on or waiting for the vCPU over the previous minute.

The goal is to keep the CPU load at or below 90% of the available vCPU capacity. If the load exceeds this threshold, the algorithm increases the compute allocated to your database to handle the additional demand.

In simpler terms, if your database is working too hard, the algorithm adds more CPU power to keep things running smoothly.
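
Continuing the sketch above, the CPU goal can be derived directly from that 90% target. The function name is illustrative, and it assumes one CU provides one vCPU, matching Neon's compute sizing.

// Sketch only: enough vCPUs that the 1-minute load average stays at or
// below 90% of capacity, assuming 1 CU provides 1 vCPU.
func cpuGoalCU(loadAvg1Min float64) float64 {
	return loadAvg1Min / 0.9
}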

Memory usage

Memory usage refers to the amount of RAM your database and its related processes are using. Every 5 seconds, the autoscaler-agent checks for the latest memory metrics from inside the VM, and every 100ms the vm-monitor checks memory usage from Postgres.

The algorithm aims to keep overall memory usage at or below 75% of the total allocated memory. If your database starts using more memory than this threshold, the algorithm increases compute size to allocate more memory, making sure your database has enough RAM to perform well without over-provisioning.
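
In the same sketch form, the memory goal follows the 75% target described above. The 4 GB of RAM per CU matches Neon's compute sizing; the name and units are illustrative.

// Sketch only: enough total RAM that current usage stays at or below 75%
// of the allocation, assuming 1 CU provides 4 GB of RAM.
func memGoalCU(memUsageGB float64) float64 {
	const ramPerCU = 4.0 // GB of RAM per compute unit
	return memUsageGB / 0.75 / ramPerCU
}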

Local File Cache (LFC) working set size

An important part of the scaling algorithm is estimating your current working set size — a subset of your most frequently accessed data — and scaling your compute to ensure it fits within the LFC.

Every 20 seconds, the autoscaler-agent checks the working set size across a variety of time windows, ranging from 1 to 60 minutes. The goal is to fit your working set within 75% of the compute’s RAM allocated to the LFC. If your working set exceeds this threshold, the algorithm increases compute size to expand the LFC, keeping frequently accessed data in memory for faster access. To learn more about how we do this, see Dynamically estimating and scaling Postgres’ working set size.
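
Sketched the same way, the LFC goal sizes the compute so the working set fits within 75% of the LFC's share of RAM. The 80% share is taken from the note below, and the 4 GB of RAM per CU from Neon's compute sizing; both are assumptions for illustration.

// Sketch only: enough total RAM that the estimated working set fits within
// 75% of the portion of RAM available to the LFC.
func lfcGoalCU(workingSetGB float64) float64 {
	const ramPerCU = 4.0      // GB of RAM per compute unit
	const lfcShareOfRAM = 0.8 // fraction of RAM the LFC may use (see note below)
	neededRAM := workingSetGB / 0.75 / lfcShareOfRAM
	return neededRAM / ramPerCU
}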

note

If your dataset is small enough, you can improve performance by keeping the entire dataset in memory. Check your database size on the Monitoring dashboard and adjust your minimum compute size accordingly. For example, a 6.4 GiB database can comfortably fit within a compute size of 2 vCPU with 8 GB of RAM (where the LFC can use up to 80% of the available RAM).

How often the metrics are polled

To give you a sense of the algorithm's responsiveness, here's a summary of how often the metrics are polled:

  • Every 5 seconds → the autoscaler-agent fetches load metrics from the VM, including CPU usage and overall memory usage.
  • Every 20 seconds → the autoscaler-agent checks the Local File Cache (LFC) metrics, including the working set size across various time windows: 1 minute, 2 minutes, up to 60 minutes.
  • Every 100 milliseconds → the vm-monitor checks memory usage specifically within Postgres.

This frequent polling allows the algorithm to respond swiftly to changes in workload, ensuring that your compute resources are always appropriately scaled to meet current demands.
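
To make those cadences concrete, here is a timing-only sketch that folds the three polling loops into one program. The real autoscaler-agent and vm-monitor run as separate components, and the work inside each case is left as a placeholder comment.

// Timing sketch only; the intervals mirror the list above.
package main

import "time"

func main() {
	vmMetrics := time.NewTicker(5 * time.Second)       // CPU load and memory from the VM
	lfcMetrics := time.NewTicker(20 * time.Second)     // LFC working set estimates
	pgMemory := time.NewTicker(100 * time.Millisecond) // Postgres memory (vm-monitor)

	for {
		select {
		case <-vmMetrics.C:
			// fetch load average and memory usage; recompute cpuGoalCU and memGoalCU
		case <-lfcMetrics.C:
			// fetch working set sizes over 1-60 minute windows; recompute lfcGoalCU
		case <-pgMemory.C:
			// check Postgres memory usage
		}
	}
}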
