The pg_search extension by ParadeDB adds functions and operators to Postgres that use BM25 (Best Matching 25) indexes for efficient, high-relevance text searches. It supports standard SQL syntax and JSON query objects, offering features similar to those in Elasticsearch.

pg_search eliminates the need to integrate external search engines, simplifying your architecture and providing real-time search functionality that's tightly coupled with your transactional data.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

In this guide, you'll learn how to enable pg_search on Neon, understand the fundamentals of BM25 scoring and inverted indexes, and explore hands-on examples to create indexes and perform full-text searches on your Postgres database.

pg_search on Neon

pg_search is currently only available on Neon projects using Postgres 17 and created in an AWS region.

Enable the pg_search extension

You can install the pg_search extension by running the following CREATE EXTENSION statement in the Neon SQL Editor or from a client such as psql that is connected to your Neon database.

CREATE EXTENSION IF NOT EXISTS pg_search;

Understanding text search with pg_search

pg_search enables text searching within your Postgres database, helping you find rows containing specific keywords or phrases in text columns. Unlike basic LIKE queries, pg_search offers advanced scoring, relevance ranking, and language handling to deliver more accurate and context-aware search results. It also addresses major performance limitations of native Postgres full-text search (FTS) by using a BM25 covering index, which indexes text along with metadata (numeric, datetime, JSON, etc.), enabling complex boolean, aggregate, and ordered queries to be processed significantly faster—often reducing query times from minutes to seconds.

Key features include:

  • Advanced relevance ranking: Orders search results by relevance, incorporating phrase, regex, fuzzy matching, and other specialized FTS queries.
  • Powerful indexing with flexible tokenization: Supports multiple tokenizers (e.g., ICU, Lindera) and token filters (e.g., language-aware stemmers), improving search accuracy across different languages.
  • Hybrid search: Combines BM25 scores with pgvector embeddings to enhance search experiences.
  • Faceted search: Allows categorization and filtering of search results based on query parameters.
  • Expressive query builder: Provides an Elastic DSL-like query syntax for constructing complex search queries.

By leveraging these features, pg_search enhances both performance and flexibility, making full-text search in Postgres more efficient and developer-friendly.

BM25: The Relevance scoring algorithm

pg_search utilizes the BM25 (Best Matching 25) algorithm, a widely adopted ranking function by modern search engines, to calculate relevance scores for full-text search results. BM25 considers several factors to determine relevance:

  • Term Frequency (TF): How often a search term appears in a row's text. More occurrences suggest higher relevance.
  • Inverse Document Frequency (IDF): How common or rare your search term is across all rows. Less common words often indicate more specific results.
  • Document Length Normalization: BM25 adjusts for text length, preventing longer rows from automatically seeming more relevant.

BM25 assigns a relevance score to each row, with higher scores indicating better matches.

Inverted Index for efficient searching

For fast searching, pg_search uses an inverted index. Think of it as an index in the back of a book, but instead of mapping topics to page numbers, it maps words (terms) to the database rows (documents) where they appear.

This index structure lets pg_search quickly find rows containing your search terms without scanning every table row, greatly speeding up queries.

With these basics in mind, let's learn how to create a BM25 index and start performing full-text searches with pg_search on Neon.

Getting started with pg_search

pg_search has a special operator, @@@, that you can use in SQL queries to perform full-text searches. This operator allows you to search for specific words or phrases within text columns, returning rows that match your search criteria. You can also sort results by relevance and highlight matched terms. Let us create a sample table, set up a BM25 index, and run some search queries to explore pg_search in action.

To demonstrate how pg_search functions, we'll begin by creating a sample table named mock_items and populating it with example data. ParadeDB provides a convenient tool to generate a test table with sample data for experimentation.

First, connect to your Neon database using the Neon SQL Editor or a client like psql. Once connected, execute the following SQL command:

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'mock_items'
);

It will generate a table named mock_items, which include the columns: id, description, rating, and category, which we will utilize in our search examples.

Let's examine the initial items within our newly created mock_items table. Run the following SQL query:

SELECT description, rating, category
FROM mock_items
LIMIT 3;

The output will display the first three rows from the mock_items table:

description               | rating |  category
--------------------------+--------+-------------
 Ergonomic metal keyboard |      4 | Electronics
 Plastic Keyboard         |      4 | Electronics
 Sleek running shoes      |      5 | Footwear
(3 rows)

Next, let's create our first search index, named item_search_idx, on the mock_items table. This index will enable searching across the id, description, and category columns. It's necessary to designate one column as the key_field; we will use id for this purpose. The key_field serves as a unique identifier for each item within the index.

Key Field Selection

It is crucial to select a column that consistently contains a unique value for every row. This ensures the search index operates as intended.

Run the following SQL command to create the item_search_idx index:

CREATE INDEX item_search_idx ON mock_items
USING bm25 (id, description, category)
WITH (key_field='id');

This will create a BM25 index on the mock_items table, enabling us to search within the id, description, and category columns. The key_field parameter specifies that the id column serves as the unique identifier for each row in the index.

Now that we have our item_search_idx index, let's explore some searches using the @@@ operator in our SQL queries.

Let's begin by finding all items where the description contains the word 'shoes'. Run the following SQL query:

SELECT description, category
FROM mock_items
WHERE description @@@ 'shoes';

This query will locate all rows in mock_items where the description column includes the word 'shoes'.

description           | category
----------------------+----------
 Sleek running shoes  | Footwear
 White jogging shoes  | Footwear
 Generic shoes        | Footwear
(3 rows)

Searching for exact phrases

To search for a specific phrase, enclose it in double quotes. Let's find items where the description contains the exact phrase "metal keyboard":

SELECT description, category
FROM mock_items
WHERE description @@@ '"metal keyboard"';

This search will exclusively find rows that contain the exact phrase "metal keyboard".

description               | category
--------------------------+------------
 Ergonomic metal keyboard | Electronics
(1 row)

If we remove the double quotes, the search will find rows containing both 'metal' and 'keyboard', but the words are not required to be adjacent.

SELECT description, category
FROM mock_items
WHERE description @@@ 'metal keyboard';

The output is:

description               | category
--------------------------+------------
 Ergonomic metal keyboard | Electronics
 Plastic Keyboard         | Electronics
(2 rows)

Advanced search options

paradedb.match: Similar word search and keyword matching

The paradedb.match function is used for keyword searches and for finding words similar to your search term, even with typos.

For example, to find items similar to 'running shoes', use:

SELECT description, category
FROM mock_items
WHERE id @@@ paradedb.match('description', 'running shoes');
description            | category
-----------------------+----------
  Sleek running shoes  | Footwear
  White jogging shoes  | Footwear
  Generic shoes        | Footwear
(3 rows)

You can also use paradedb.match with JSON syntax. For instance, to find items with a description similar to 'running shoes':

SELECT description, category
FROM mock_items
WHERE id @@@ '{"match": {"field": "description", "value": "running shoes"}}'::jsonb;

Searching with typos: Fuzzy matching

To retrieve results even with minor errors in the search term, you can use paradedb.match with the distance option.

Suppose you mistyped 'running' as 'runing'. You can still find relevant results using fuzzy matching:

SELECT description, category
FROM mock_items
WHERE id @@@ paradedb.match('description', 'runing', distance => 1);

This will find items where the description is similar to 'runing' within a Levenshtein distance of 1.

description              | category
-------------------------+----------
    Sleek running shoes  | Footwear
(1 rows)

paradedb.phrase: Searching for phrases with words nearby

The paradedb.phrase function, combined with the slop option, helps you find phrases even if the words are not immediately adjacent. The slop value specifies the number of intervening words allowed. A slop of 1 permits one extra word in between.

SELECT description, category
FROM mock_items
WHERE id @@@ paradedb.phrase('description', ARRAY['white', 'shoes'], slop => 1);

This query will find rows where 'white' and 'shoes' are within one word or less of each other.

description              | category
-------------------------+----------
    White jogging shoes  | Footwear
(1 rows)

Sorting search results by relevance

To ensure the most relevant results are displayed first, you can sort your search results by relevance. Utilize paradedb.score() with ORDER BY to achieve this:

SELECT description, category, rating, paradedb.score(id)
FROM mock_items
WHERE description @@@ 'shoes'
ORDER BY paradedb.score(id) DESC;

This query will find items matching 'shoes' and then present them in order from most to least relevant based on their search score (BM25 relevance score).

description           | category | rating | score
----------------------+----------+--------+-------------
  Generic shoes       |	Footwear |	 4   |	2.8772602
  Sleek running shoes |	Footwear |	 5   |	2.4849067
  White jogging shoes |	Footwear |   3   |	2.4849067
(3 rows)

Highlighting search results

To highlight matched terms in the search results, you can use the paradedb.snippet() function. This function generates snippets of text containing the matched words, making it easier to identify relevant content.

SELECT id, paradedb.snippet(description)
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 3;

This will provide snippets of the description where the words matching your search are wrapped in <b></b> tags by default. This visual cue makes the matched terms stand out when results are displayed in your application.

id  | snippet
----+-----------------------------------
  3 | Sleek running <b>shoes</b>
  4 | White jogging <b>shoes</b>
  5 | Generic <b>shoes</b>
(3 rows)

If you prefer different tags, you can customize the tags using the start_tag and end_tag options with paradedb.snippet(). For example:

SELECT id, paradedb.snippet(description, start_tag => '<start>', end_tag => '</end>')
FROM mock_items
WHERE description @@@ 'shoes'
LIMIT 3;

This will wrap the matched words in <start> and </end> tags instead of the default <b> and </b>.

id  | snippet
----+-----------------------------------
  3 | Sleek running <start>shoes</end>
  4 | White jogging <start>shoes</end>
  5 | Generic <start>shoes</end>
(3 rows)

Combining search words with AND/OR

To create more complex searches, you can use OR and AND operators to combine keywords. For instance, to retrieve items with 'shoes' in the description OR 'Electronics' in the category, you can use:

SELECT description, category
FROM mock_items
WHERE description @@@ 'shoes' OR category @@@ 'Electronics'
LIMIT 3;

This will find items that satisfy either of these conditions.

description               | category
--------------------------+------------
Ergonomic metal keyboard | Electronics
Plastic Keyboard         | Electronics
Sleek running shoes      | Footwear
(3 rows)

Query builder functions

In addition to query strings, query builder functions can be used to compose various types of complex queries.

For a list of supported query builder functions, refer to ParadeDB's Query Builder documentation.

Joined search with multiple tables

pg_search supports full-text search over JOINs, which is crucial for database schemas that store data in a normalized fashion. Let's create a table called orders that references our mock_items table:

CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'orders',
  table_type => 'Orders'
);

ALTER TABLE orders
ADD CONSTRAINT foreign_key_product_id
FOREIGN KEY (product_id)
REFERENCES mock_items(id);

SELECT * FROM orders LIMIT 3;

Next, let's create a BM25 index over the orders table:

CREATE INDEX orders_idx ON orders
USING bm25 (order_id, customer_name)
WITH (key_field='order_id');

Now we can perform a search across both tables using a JOIN. The following query searches for rows where customer_name matches 'Johnson' and description matches 'shoes':

SELECT o.order_id, o.customer_name, m.description
FROM orders o
JOIN mock_items m ON o.product_id = m.id
WHERE o.customer_name @@@ 'Johnson' AND m.description @@@ 'shoes'
ORDER BY order_id
LIMIT 5;

This demonstrates how pg_search can be used to search across related tables, allowing for powerful queries that combine data from multiple sources.

Performance optimizations for pg_search

To optimize pg_search performance, adjust both Postgres and pg_search settings for indexing and query speed.

pg_search parameter names start with paradedb. You can configure both Postgres and pg_search settings for the current session using SET.

Index build time

Optimize index build time with these settings. The maintenance_work_mem setting is typically only one requiring tuning. The other two setting have proven default values that typically do not require modification.

  • maintenance_work_mem: : Sets the maximum amount of memory used for maintenance operations such as CREATE INDEX. Increasing this setting can speed up index builds by improving Write-Ahead Log (WAL) performance. For example, on a 100-million-row table, allocating multiple GBs can reduce index build time from hours to minutes.

    In Neon, maintenance_work_mem is set based on your compute size. You can increase it for the current session. Do not exceed 50–60% of your compute's available RAM. See Neon parameter settings by compute size.

    SET maintenance_work_mem = '10 GB';
  • paradedb.create_index_memory_budget: Defines the memory per indexing thread before writing index segments to disk. The default is 1024 MB (1 GB). Large tables may need a higher value. If set to 0, the budget is derived from maintenance_work_mem and paradedb.create_index_parallelism.

    SET paradedb.create_index_memory_budget = 2048;
  • paradedb.create_index_parallelism: Controls the number of threads used during CREATE INDEX. The default is 0, which automatically detects the available parallelism of your Neon compute. You can explicitly set:

    SET paradedb.create_index_parallelism = 8;

For more information about optimizing BM25 index size, see ParadeDB — Index Size.

Throughput

note

Most users will not need to adjust these advanced throughput settings.

Tune INSERT/UPDATE/COPY throughput for the BM25 index with these settings:

  • paradedb.statement_parallelism: Controls indexing threads during INSERT/UPDATE/COPY. Default is 0 (auto-detects parallelism).

    • Use 1 for single-row atomic inserts/updates to avoid unnecessary threading.

    • Use a higher value for bulk inserts and updates.

      SET paradedb.statement_parallelism = 1;
  • paradedb.statement_memory_budget: Memory per indexing thread before writing to disk. Default is 1024 MB (1 GB). Higher values may improve indexing performance. See ParadeDB — Statement Memory Budget.

    • If set to 0, maintenance_work_mem / paradedb.statement_parallelism is used.

    • For single-row updates, 15 MB prevents excess memory allocation.

    • For bulk inserts/updates, increase as needed.

      SET paradedb.statement_memory_budget = 15;

Search performance

Search performance can benefit from parallel workers, more memory provided by larger Neon compute sizes, and preloading indexes into memory.

Parallel workers

Increase parallel workers to speed up indexing:

  • max_worker_processes: Controls total worker processes across all connections.

    SET max_worker_processes = 8;
  • max_parallel_workers: Defines the number of workers available for parallel scans.

    SET max_parallel_workers = 8;
  • max_parallel_workers_per_gather: Limits parallel workers per query. The default in Neon is 2, but you can adjust. The total number of parallel workers should not exceed your Neon compute's vCPU count. See Neon parameter settings by compute size. sql SET max_parallel_workers_per_gather = 8;

Keeping indexes in memory

Keeping indexes in memory improves query performance by reducing disk access. In Postgres, shared_buffers defines the buffer cache size, which determines how much memory is allocated for caching data. In Neon, this value is set automatically based on your compute size.

In addition to shared_buffers, Neon’s Local File Cache (LFC) extends memory up to 75% of your compute’s RAM. This allows frequently accessed indexes and data to remain in memory, improving performance.

Both shared_buffers and the LFC size depend on your compute size. For details, see How to size your compute.

To further optimize performance, you can use the Postgres pg_prewarm extension to preload indexes into memory. This ensures fast query response times by warming up the cache after index creation or a restart of your Neon compute.

To run pg_prewarm on an index:

pg_prewarm('index_name')

For additional details, see Running pg_prewarm on indexes.

Best practices for using pg_search

To optimize your search functionality and ensure efficient performance, consider the following best practices when using pg_search:

  • Analyze query plans: Use EXPLAIN to analyze query plans and identify potential bottlenecks.
  • Index all relevant columns: Include all columns used in search queries, sorting, or filtering for optimal performance.
  • Utilize query builder functions: Leverage query builder functions or JSON syntax for complex queries like fuzzy matching and phrase matching.

Conclusion

You have successfully learned how to enable and utilize the pg_search extension on Neon for full-text search. By leveraging BM25 scoring and inverted indexes, pg_search provides powerful search capabilities directly within your Postgres database, eliminating the need for external search engines and ensuring real-time, ACID-compliant search functionality.

While this guide provides a comprehensive introduction to pg_search on Neon, it is not exhaustive. We haven't covered topics like:

  • Advanced tokenization and language handling: Exploring specialized tokenizers and language-specific features.
  • The full range of query types: Exploring the full range of query functions like more_like_this, regex_phrase, and compound queries for complex search needs.
  • Leveraging fast fields: Optimizing performance with fast fields for aggregations, filtering, and sorting, and understanding their configuration.
  • Query-time boosting: Fine-tuning search relevance by applying boosts to specific fields or terms within your queries.

For a deeper dive into these and other advanced features, please refer to the official ParadeDB documentation.

Resources