
The pgvector extension

Enable Postgres as a vector store with the pgvector extension

The pgvector extension enables you to store vector embeddings and perform vector similarity search in Postgres. It is particularly useful for applications involving natural language processing, such as those built on top of OpenAI's GPT models.

pgvector supports:

  • Exact and approximate nearest neighbor search
  • Single-precision, half-precision, binary, and sparse vectors
  • L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
  • Any language with a Postgres client
  • ACID compliance, point-in-time recovery, JOINs, and all other Postgres features

This topic describes how to enable the pgvector extension in Neon and how to create, store, and query vectors.

Enable the pgvector extension

You can enable the pgvector extension by running the following CREATE EXTENSION statement in the Neon SQL Editor or from a client such as psql that is connected to Neon.

CREATE EXTENSION vector;

For information about using the Neon SQL Editor, see Query with Neon's SQL Editor. For information about using the psql client with Neon, see Connect with psql.
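
To verify that the extension is enabled and check which version is installed, you can query the pg_extension catalog:

SELECT extversion FROM pg_extension WHERE extname = 'vector';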

Create a table to store vectors

To create a table for storing vectors, use a SQL command similar to the following. Embeddings are stored in a column of the VECTOR type. You can adjust the number of dimensions as needed.

CREATE TABLE items (
  id BIGSERIAL PRIMARY KEY,
  embedding VECTOR(3)
);

note

The pgvector extension supports some specialized types other than VECTOR for storing embeddings. See HNSW vector types and IVFFlat vector types.

This command creates a table named items with an embedding column that holds 3-dimensional vectors. Real embedding models output many more dimensions; for example, OpenAI's text-embedding-3-small model produces 1536-dimensional embeddings by default, which capture more detail for natural language processing tasks. However, larger embeddings generally cost more and consume more compute, memory, and storage than smaller embeddings. To learn more about embeddings and the cost-performance tradeoff, see Embeddings in the OpenAI documentation.
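
For example, a table sized for text-embedding-3-small output might look like the following sketch, where documents and its content column are illustrative names:

-- 1536 matches text-embedding-3-small's default output size
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  embedding VECTOR(1536)
);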

Storing embeddings

After generating embeddings using a service like OpenAI’s Embeddings API, you can store them in your database. Using a Postgres client library in your preferred programming language, you can execute an INSERT statement similar to the following to store embeddings.

  • Insert two new rows into the items table with the provided embeddings.

    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
  • Load vectors in bulk using the COPY command:

    COPY items (embedding) FROM STDIN WITH (FORMAT BINARY);

    tip

    For a Python script that loads embeddings in bulk, refer to this Bulk loading with COPY example provided in the pgvector GitHub repository.

  • Upsert vectors:

    INSERT INTO items (id, embedding) VALUES (1, '[1,2,3]'), (2, '[4,5,6]')
       ON CONFLICT (id) DO UPDATE SET embedding = EXCLUDED.embedding;
  • Update vectors:

    UPDATE items SET embedding = '[1,2,3]' WHERE id = 1;
  • Delete vectors:

    DELETE FROM items WHERE id = 1;

Querying vectors

To retrieve vectors and calculate similarity, use SELECT statements and the distance function operators supported by pgvector.

  • Get the nearest neighbor to a vector by L2 distance:

    SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
  • Get the nearest neighbor to a row by L2 distance:

    SELECT * FROM items WHERE id != 1 ORDER BY embedding <-> (SELECT embedding FROM items WHERE id = 1) LIMIT 5;
  • Get rows within a certain distance by L2 distance:

    SELECT * FROM items WHERE embedding <-> '[3,1,2]' < 5;

    note

    To use an index with a query, include ORDER BY and LIMIT clauses, as shown in the second query example above.
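
To check whether a query uses the vector index, you can inspect its plan with EXPLAIN. A quick check, reusing the first example query above:

EXPLAIN ANALYZE
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

With an HNSW or IVFFlat index in place, the plan should show an index scan on the vector index rather than a sequential scan.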

Distance function operators

  • <-> - L2 distance
  • <#> - (negative) inner product
  • <=> - cosine distance
  • <+> - L1 distance

note

The inner product operator (<#>) returns the negative inner product since Postgres only supports ASC order index scans on operators.

Distance queries

  • Get the distances:

    SELECT embedding <-> '[3,1,2]' AS distance FROM items;
  • For inner product, multiply by -1 (since <#> returns the negative inner product):

    SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
  • For cosine similarity, use 1 - cosine distance:

    SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;

Aggregate queries

  • To average vectors:

    SELECT AVG(embedding) FROM items;
  • To average groups of vectors:

    SELECT category_id, AVG(embedding) FROM items GROUP BY category_id;

Indexing vectors

By default, pgvector performs exact nearest neighbor search, providing perfect recall. Adding an index on the vector column can improve query performance with a minor cost in recall. Unlike typical indexes, you will see different results for queries after adding an approximate index.

Supported index types include:

HNSW

An HNSW index creates a multilayer graph. It has better query performance than an IVFFlat index (in terms of speed-recall tradeoff), but has slower build times and uses more memory. Also, an HNSW index can be created without any data in the table since there isn’t a training step like there is for an IVFFlat index.

HNSW vector types

HNSW indexes are supported with the following vector types:

  • vector - up to 2,000 dimensions
  • halfvec - up to 4,000 dimensions
  • bit - up to 64,000 dimensions
  • sparsevec - up to 1,000 non-zero elements

note

Notice how indexes are defined differently depending on the distance function being used. For example, vector_l2_ops is specified for L2 distance, vector_ip_ops for inner product, and so on. Make sure you define your index according to the distance function you intend to use.

  • L2 distance:

    CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
  • Inner product:

    CREATE INDEX ON items USING hnsw (embedding vector_ip_ops);
  • Cosine distance:

    CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
  • L1 distance:

    CREATE INDEX ON items USING hnsw (embedding vector_l1_ops);
  • Hamming distance:

    CREATE INDEX ON items USING hnsw (embedding bit_hamming_ops);
  • Jaccard distance:

    CREATE INDEX ON items USING hnsw (embedding bit_jaccard_ops);

HNSW index build options

  • m - the max number of connections per layer (16 by default)
  • ef_construction - the size of the dynamic candidate list for constructing the graph (64 by default)

This example demonstrates how to set the parameters:

CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64);

A higher value of ef_construction provides better recall at the cost of index build time and insert speed.
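
For example, to trade a longer build time and slower inserts for better recall, you might raise both parameters. The values below are illustrative starting points, not recommendations:

-- Defaults are m = 16 and ef_construction = 64
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WITH (m = 24, ef_construction = 200);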

HNSW index query options

You can specify the size of the dynamic candidate list used during search (hnsw.ef_search). The size is 40 by default.

SET hnsw.ef_search = 100;

A higher value provides better recall at the cost of speed.

This example shows how to use SET LOCAL inside a transaction to set ef_search for a single query:

BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT ...
COMMIT;

HNSW index build time

To optimize index build time, consider configuring the maintenance_work_mem and max_parallel_maintenance_workers session variables before building an index:

note

Like other index types, it’s faster to create an index after loading your initial data.

  • maintenance_work_mem

    Indexes build significantly faster when the graph fits into Postgres maintenance_work_mem.

    A notice is shown when the graph no longer fits:

    NOTICE:  hnsw graph no longer fits into maintenance_work_mem after 100000 tuples
    DETAIL:  Building will take significantly more time.
    HINT:  Increase maintenance_work_mem to speed up builds.

    In Postgres, the maintenance_work_mem setting determines the maximum memory allocation for tasks such as CREATE INDEX. The default maintenance_work_mem value in Neon is set according to your Neon compute size:

    Compute Units (CU) | vCPU | RAM   | maintenance_work_mem
    -------------------|------|-------|---------------------
    0.25               | 0.25 | 1 GB  | 64 MB
    0.50               | 0.50 | 2 GB  | 64 MB
    1                  | 1    | 4 GB  | 67 MB
    2                  | 2    | 8 GB  | 134 MB
    3                  | 3    | 12 GB | 201 MB
    4                  | 4    | 16 GB | 268 MB
    5                  | 5    | 20 GB | 335 MB
    6                  | 6    | 24 GB | 402 MB
    7                  | 7    | 28 GB | 470 MB
    8                  | 8    | 32 GB | 537 MB

    To optimize pgvector index build time, you can increase the maintenance_work_mem setting for the current session with a command similar to the following:

    SET maintenance_work_mem='10 GB';

    The recommended setting is your working set size (the size of your tuples for vector index creation). However, your maintenance_work_mem setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, the maintenance_work_mem='10 GB' setting shown above has been successfully tested on a 7 CU compute, which has 28 GB of RAM: 10 GB is less than 50 percent of the RAM available at that compute size.

  • max_parallel_maintenance_workers

    You can also speed up index creation by increasing the number of parallel workers. The default is 2.

    The max_parallel_maintenance_workers setting determines the maximum number of parallel workers that can be started by a single utility command such as CREATE INDEX. For efficient parallel index creation, you can increase this setting. Parallel workers are taken from the pool of processes established by max_worker_processes (10), limited by max_parallel_workers (8).

    You can increase the max_parallel_maintenance_workers setting for the current session with a command similar to the following:

    SET max_parallel_maintenance_workers = 7;

    For example, if you have a 7 CU compute size, you could set max_parallel_maintenance_workers to 7 before index creation to make use of all of the vCPUs available. A combined example follows this list.

    For a large number of workers, you may also need to increase the Postgres max_parallel_workers, which is 8 by default.
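
Putting the two settings together, a tuned HNSW index build might look like the following sketch. The values are illustrative; size them to your own compute's RAM and vCPUs:

-- Stay within 50 to 60 percent of available RAM
SET maintenance_work_mem = '10 GB';
-- Match the vCPUs available on your compute
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);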

Check indexing progress

You can check indexing progress with the following query:

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;

The phases for HNSW are:

  1. initializing
  2. loading tuples

For related information, see CREATE INDEX Progress Reporting, in the PostgreSQL documentation.

IVFFlat

An IVFFlat index divides vectors into lists and searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance with respect to the speed-recall tradeoff.

Keys to achieving good recall include:

  • Creating the index after the table has some data
  • Choosing an appropriate number of lists. A good starting point is rows/1000 for up to 1M rows and sqrt(rows) for over 1M rows.
  • Specifying an appropriate number of probes when querying. A higher number is better for recall, and a lower number is better for speed. A good starting point is sqrt(lists). A worked example follows this list.
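
For example, for a hypothetical table of 500,000 rows, these guidelines suggest lists = 500000/1000 = 500 and probes ≈ sqrt(500) ≈ 22:

-- 500,000 rows / 1000 = 500 lists
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 500);
-- sqrt(500) is approximately 22
SET ivfflat.probes = 22;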

IVFFlat vector types

IVFFlat indexes are supported with the following vector types:

  • vector - up to 2,000 dimensions
  • halfvec - up to 4,000 dimensions (added in 0.7.0)
  • bit - up to 64,000 dimensions (added in 0.7.0)

note

Notice how indexes are defined differently depending on the distance function being used. For example, vector_l2_ops is specified for L2 distance, vector_cosine_ops for cosine distance, and so on.

The following examples show how to add an index for each distance function:

  • L2 distance

    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

    note

    Use halfvec_l2_ops for halfvec (and similar with the other distance functions).

  • Inner product

    CREATE INDEX ON items USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
  • Cosine distance

    CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
  • Hamming distance

    CREATE INDEX ON items USING ivfflat (embedding bit_hamming_ops) WITH (lists = 100);

IVFFlat query options

You can specify the number of probes, which is 1 by default.

SET ivfflat.probes = 10;

A higher value provides better recall at the cost of speed. You can set the value to the number of lists for exact nearest neighbor search, at which point the planner won’t use the index.
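
For example, if the index was built with lists = 100, probing all 100 lists returns exact nearest neighbors:

-- Probing every list makes the search exact
SET ivfflat.probes = 100;
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;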

You can also use SET LOCAL inside a transaction to set the number of probes for a single query:

BEGIN;
SET LOCAL ivfflat.probes = 10;
SELECT ...
COMMIT;

IVFFlat index build time

To optimize index build time, consider configuring the maintenance_work_mem and max_parallel_maintenance_workers session variables before building an index:

note

Like other index types, it’s faster to create an index after loading your initial data.

  • maintenance_work_mem

    In Postgres, the maintenance_work_mem setting determines the maximum memory allocation for tasks such as CREATE INDEX. The default maintenance_work_mem value in Neon is set according to your Neon compute size:

    Compute Units (CU) | vCPU | RAM   | maintenance_work_mem
    -------------------|------|-------|---------------------
    0.25               | 0.25 | 1 GB  | 64 MB
    0.50               | 0.50 | 2 GB  | 64 MB
    1                  | 1    | 4 GB  | 67 MB
    2                  | 2    | 8 GB  | 134 MB
    3                  | 3    | 12 GB | 201 MB
    4                  | 4    | 16 GB | 268 MB
    5                  | 5    | 20 GB | 335 MB
    6                  | 6    | 24 GB | 402 MB
    7                  | 7    | 28 GB | 470 MB
    8                  | 8    | 32 GB | 537 MB
    9                  | 9    | 36 GB | 604 MB
    10                 | 10   | 40 GB | 671 MB

    To optimize pgvector index build time, you can increase the maintenance_work_mem setting for the current session with a command similar to the following:

    SET maintenance_work_mem='10 GB';

    The recommended setting is your working set size (the size of your tuples for vector index creation). However, your maintenance_work_mem setting should not exceed 50 to 60 percent of your compute's available RAM (see the table above). For example, the maintenance_work_mem='10 GB' setting shown above has been successfully tested on a 7 CU compute, which has 28 GB of RAM: 10 GB is less than 50 percent of the RAM available at that compute size.

  • max_parallel_maintenance_workers

    You can also speed up index creation by increasing the number of parallel workers. The default is 2.

    The max_parallel_maintenance_workers setting determines the maximum number of parallel workers that can be started by a single utility command such as CREATE INDEX. For efficient parallel index creation, you can increase this setting. Parallel workers are taken from the pool of processes established by max_worker_processes (10), limited by max_parallel_workers (8).

    You can increase the max_parallel_maintenance_workers setting for the current session with a command similar to the following:

    SET max_parallel_maintenance_workers = 7;

    For example, if you have a 7 CU compute size, you could set max_parallel_maintenance_workers to 7 before index creation to make use of all of the vCPUs available. A combined example follows this list.

    For a large number of workers, you may also need to increase the Postgres max_parallel_workers, which is 8 by default.
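
As with HNSW, a tuned IVFFlat index build might combine both settings, as in the following sketch with illustrative values:

-- Size these values to your compute's RAM and vCPUs
SET maintenance_work_mem = '10 GB';
SET max_parallel_maintenance_workers = 7;
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);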

Check indexing progress

You can check indexing progress with the following query:

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%" FROM pg_stat_progress_create_index;

The phases for IVFFlat are:

  1. initializing
  2. performing k-means
  3. assigning tuples
  4. loading tuples

For related information, see CREATE INDEX Progress Reporting, in the PostgreSQL documentation.

Filtering

There are a few ways to index nearest neighbor queries with a WHERE clause:

SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

Create an index on one or more of the WHERE columns for exact search:

CREATE INDEX ON items (category_id);

Create a partial index on the vector column for approximate search:

CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);

Use partitioning for approximate search on many different values of the WHERE columns:

CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);
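
With list partitioning in place, each category value gets its own partition, and each partition can carry its own vector index. A sketch, where the partition name and category value are illustrative:

-- items_category_123 and the value 123 are illustrative
CREATE TABLE items_category_123 PARTITION OF items FOR VALUES IN (123);
-- Each partition can have its own approximate index
CREATE INDEX ON items_category_123 USING hnsw (embedding vector_l2_ops);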

Half-precision vectors

Half-precision vectors enable the storage of vector embeddings using 16-bit floating-point numbers (half precision), which reduces both storage size and memory usage by nearly half compared to 32-bit floats. This efficiency comes with minimal loss in precision, making half-precision vectors beneficial for applications dealing with large datasets or facing memory constraints.

When integrating OpenAI's embeddings, you can take advantage of half-precision vectors by storing embeddings in a compressed format. For instance, OpenAI's high-dimensional embeddings can be stored as half-precision vectors while retaining high accuracy, on the order of 98 percent. This approach optimizes memory usage while maintaining performance.

You can use the halfvec type to store half-precision vectors, as shown here:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(3));
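
For example, a half-precision table sized for OpenAI's 1536-dimensional embeddings, with a matching HNSW index, might look like the following sketch, where documents is an illustrative name. Note that halfvec columns use the halfvec operator classes, such as halfvec_cosine_ops:

-- halfvec supports HNSW indexes up to 4,000 dimensions
CREATE TABLE documents (id bigserial PRIMARY KEY, embedding halfvec(1536));
CREATE INDEX ON documents USING hnsw (embedding halfvec_cosine_ops);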

Binary vectors

Binary vector embeddings are a form of vector representation where each component is encoded as a binary digit, typically 0 or 1. For example, the word "cat" might be represented as [0, 1, 0, 1, 1, 0, 0, 1, ...], with each position in the vector being binary.

These embeddings are advantageous for their efficiency in both storage and computation. Because they use only one bit per dimension, binary embeddings require less memory compared to traditional embeddings that use floating-point numbers. This makes them useful when there is limited memory or when dealing with large datasets. Additionally, operations with binary values are generally quicker than those involving real numbers, leading to faster computations.

However, the trade-off with binary vector embeddings is a potential loss in accuracy. Unlike denser embeddings, which have real-valued entries and can represent subtleties in the data, binary embeddings simplify the representation. This can result in a loss of information and may not fully capture the intricacies of the data they represent.

Use the bit type to store binary vector embeddings:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding bit(3));
INSERT INTO items (embedding) VALUES ('000'), ('111');

Get the nearest neighbors by Hamming distance (added in 0.7.0):

SELECT * FROM items ORDER BY embedding <~> '101' LIMIT 5;

Or, before 0.7.0:

SELECT * FROM items ORDER BY bit_count(embedding # '101') LIMIT 5;

Jaccard distance (<%>) is also supported with binary vector embeddings.

Binary quantization

Binary quantization is a process that transforms dense or sparse embeddings into binary representations by thresholding vector dimensions to either 0 or 1.

Use expression indexing for binary quantization:

CREATE INDEX ON items USING hnsw ((binary_quantize(embedding)::bit(3)) bit_hamming_ops);

Get the nearest neighbors by Hamming distance:

SELECT * FROM items ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]') LIMIT 5;

Re-rank by the original vectors for better recall:

SELECT * FROM (
    SELECT * FROM items ORDER BY binary_quantize(embedding)::bit(3) <~> binary_quantize('[1,-2,3]') LIMIT 20
) ORDER BY embedding <=> '[1,-2,3]' LIMIT 5;

Sparse vectors

Sparse vectors have a large number of dimensions, where only a small proportion are non-zero.

Use the sparsevec type to store sparse vectors:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding sparsevec(5));

Insert vectors:

INSERT INTO items (embedding) VALUES ('{1:1,3:2,5:3}/5'), ('{1:4,3:5,5:6}/5');

The format is {index1:value1,index2:value2}/dimensions, and indices start at 1, as in SQL arrays. For example, '{1:1,3:2,5:3}/5' represents the 5-dimensional vector [1,0,2,0,3].

Get the nearest neighbors by L2 distance:

SELECT * FROM items ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5;

Differences in behavior between pgvector 0.5.1 and 0.7.0

Differences in behavior in the following corner cases were found during our testing of pgvector 0.7.0:

Distance between a valid and NULL vector

The distance between a valid vector and a NULL vector (NULL::vector) with pgvector 0.7.0 differs from pgvector 0.5.1 when using an HNSW or IVFFlat index, as shown in the following examples:

HNSW

For the following script, which compares NULL::vector to non-null vectors, the output differs between the two versions:

SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING hnsw (val vector_l2_ops);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);

pgvector 0.7.0 output:

val
---------
 [1,1,1]
 [1,2,4]
 [1,2,3]
 [0,0,0]

pgvector 0.5.1 output:

val
---------
 [0,0,0]
 [1,1,1]
 [1,2,3]
 [1,2,4]

IVFFlat

For the following script, which compares NULL::vector to non-null vectors, the output differs between the two versions:

SET enable_seqscan = off;

CREATE TABLE t (val vector(3));
INSERT INTO t (val) VALUES ('[0,0,0]'), ('[1,2,3]'), ('[1,1,1]'), (NULL);
CREATE INDEX ON t USING ivfflat (val vector_l2_ops) WITH (lists = 1);

INSERT INTO t (val) VALUES ('[1,2,4]');

SELECT * FROM t ORDER BY val <-> (SELECT NULL::vector);

pgvector 0.7.0 output:

val
---------
 [0,0,0]
 [1,2,3]
 [1,1,1]
 [1,2,4]

pgvector 0.5.1 output:

val
---------
 [0,0,0]
 [1,1,1]
 [1,2,3]
 [1,2,4]

Error messages improvement for invalid literals

If you use an invalid literal value for the vector data type, pgvector 0.7.0 reports a more descriptive error message:

SELECT '[4e38,1]'::vector;
ERROR:  "4e38" is out of range for type vector
LINE 1: SELECT '[4e38,1]'::vector;

Resources

pgvector source code: https://github.com/pgvector/pgvector

Need help?

Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.
