RDS can create serious bottlenecks in engineering productivity — is this you? See how Neon can help

Postgres lag() window function

Use lag() to access values from previous rows in a result set

The lag() function in Postgres is a window function that allows you to access values from previous rows in a result set without the need for a self-join. It's useful for comparing values between the current row and a previous row, for example, when calculating running differences, plotting trends, or doing time series analysis.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

Function signature

The lag() function has the following forms:

lag(value any [, offset integer [, default any ]]) over (...)
  • value: The value to return from the previous row. This can be a column, expression, or subquery.
  • offset (optional): The number of rows back from the current row to retrieve the value from. If omitted, it defaults to 1. Must be a non-negative integer.
  • default (optional): The value to return when the offset goes beyond the scope of the window. If omitted, it defaults to null.
  • over (...): The OVER clause defines the window frame for the function. It can be an empty OVER (), or it can include a PARTITION BY and/or ORDER BY clause.

Example usage

Consider a table sales that contains daily sales data for a company. We can use lag() to compare each day's sales to the previous day's sales.

WITH sales AS (
  SELECT date '2023-01-01' AS sale_date, 1000 AS amount
  UNION ALL
  SELECT date '2023-01-02' AS sale_date, 1500 AS amount
  UNION ALL
  SELECT date '2023-01-03' AS sale_date, 1200 AS amount
  UNION ALL
  SELECT date '2023-01-04' AS sale_date, 1800 AS amount
)
SELECT
  sale_date,
  amount,
  lag(amount) OVER (ORDER BY sale_date) AS prev_amount,
  amount - lag(amount) OVER (ORDER BY sale_date) AS diff
FROM sales;

This query calculates the previous day's sales amount (prev_amount) and the difference between the current day's sales and the previous day's sales (diff). The OVER clause specifies that the window frame should be ordered by sale_date.

sale_date  | amount | prev_amount |  diff
------------+--------+-------------+-------
 2023-01-01 |   1000 |             |
 2023-01-02 |   1500 |        1000 |   500
 2023-01-03 |   1200 |        1500 |  -300
 2023-01-04 |   1800 |        1200 |   600
(4 rows)

You can also use lag() to access values from rows further back by specifying an offset. For example, to compare each day's sales to the sales from the same day of the previous week:

WITH sales AS (
  SELECT
    sale_date,
    floor(random() * 1000 + 1)::int AS amount
  FROM generate_series(date '2023-01-01', date '2023-01-31', interval '1 day') AS sale_date
)
SELECT
  sale_date,
  amount,
  lag(amount, 7) OVER (ORDER BY sale_date) AS prev_week_amount,
  amount - lag(amount, 7) OVER (ORDER BY sale_date) AS diff
FROM sales
ORDER BY sale_date DESC
LIMIT 5;

This query generates random sales data for each day in January 2023 and compares each day's sales to the sales from the same day of the previous week. The lag() function with an offset of 7 retrieves the sales amount from 7 days ago.

sale_date        | amount | prev_week_amount | diff
------------------------+--------+------------------+------
 2023-01-31 00:00:00+00 |    245 |               64 |  181
 2023-01-30 00:00:00+00 |    736 |              789 |  -53
 2023-01-29 00:00:00+00 |    208 |              763 | -555
 2023-01-28 00:00:00+00 |    710 |              899 | -189
 2023-01-27 00:00:00+00 |      1 |              229 | -228
 (5 rows)

Advanced examples

Using lag() with a default value

When the offset in lag() goes beyond the start of the window frame, it returns null by default. You can specify a default value to use instead, so the resulting column does not contain nulls.

WITH inventory AS (
  SELECT date '2023-01-01' AS snapshot_date, 100 AS quantity
  UNION ALL
  SELECT date '2023-01-02' AS snapshot_date, 80 AS quantity
  UNION ALL
  SELECT date '2023-01-03' AS snapshot_date, 120 AS quantity
  UNION ALL
  SELECT date '2023-01-04' AS snapshot_date, 90 AS quantity
)
SELECT
  snapshot_date,
  quantity,
  lag(quantity, 1, quantity) OVER (ORDER BY snapshot_date) AS prev_quantity,
  quantity - lag(quantity, 1, quantity) OVER (ORDER BY snapshot_date) AS change
FROM inventory;

This query calculates the change in inventory quantity compared to the previous day. For the first row, where there is no previous quantity, it uses the current quantity as the default value, resulting in a change of 0.

snapshot_date | quantity | prev_quantity | change
---------------+----------+---------------+--------
 2023-01-01    |      100 |           100 |      0
 2023-01-02    |       80 |           100 |    -20
 2023-01-03    |      120 |            80 |     40
 2023-01-04    |       90 |           120 |    -30
(4 rows)

Using lag() with partitioning

You can use lag() with partitioning to perform calculations within groups of rows.

WITH orders AS (
  SELECT 1 AS order_id, date '2023-01-01' AS order_date, 100 AS amount, 1 AS customer_id
  UNION ALL
  SELECT 2 AS order_id, date '2023-01-02' AS order_date, 150 AS amount, 1 AS customer_id
  UNION ALL
  SELECT 3 AS order_id, date '2023-01-03' AS order_date, 200 AS amount, 2 AS customer_id
  UNION ALL
  SELECT 4 AS order_id, date '2023-01-04' AS order_date, 120 AS amount, 1 AS customer_id
  UNION ALL
  SELECT 5 AS order_id, date '2023-01-05' AS order_date, 180 AS amount, 2 AS customer_id
)
SELECT
  order_id,
  order_date,
  amount,
  customer_id,
  lag(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date,
  order_date - lag(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS days_since_last_order
FROM orders;

This query calculates the number of days since each customer's previous order. The OVER clause partitions the data by customer_id and orders it by order_date within each partition.

order_id | order_date | amount | customer_id | prev_order_date | days_since_last_order
----------+------------+--------+-------------+-----------------+-----------------------
        1 | 2023-01-01 |    100 |           1 |                 |
        2 | 2023-01-02 |    150 |           1 | 2023-01-01      |                     1
        4 | 2023-01-04 |    120 |           1 | 2023-01-02      |                     2
        3 | 2023-01-03 |    200 |           2 |                 |
        5 | 2023-01-05 |    180 |           2 | 2023-01-03      |                     2
(5 rows)

Additional considerations

Correctness

The lag() function relates each row in the result set to a previous row in the same window frame. If the window frame is not explicitly defined, the default frame is the entire result set. Make sure to specify the correct ORDER BY and PARTITION BY clauses to ensure the desired behavior.

Performance implications

Window functions like lag() perform calculations across a set of rows defined by the OVER clause. This can be computationally expensive for large datasets or complex window definitions.

To optimize performance, make sure to:

  • Include an ORDER BY clause in the OVER clause to avoid sorting the entire dataset.
  • Use partitioning (PARTITION BY) to divide the data into smaller chunks when possible.
  • Create appropriate indexes on the columns used in the OVER clause.

Alternative functions

  • lead - Access values from subsequent rows in a result set. Similar to lag() but looks ahead in the partition instead of behind.
  • first_value() - Get the first value within a window frame.
  • last_value() - Get the last value within a window frame.

Resources

Last updated on

Was this page helpful?