Postgres UUID data type
Work with UUIDs in Postgres
UUID
stands for Universally Unique Identifier
. A UUID
is a 128-bit value used to ensure global uniqueness across tables and databases.
In Postgres, the UUID data type is ideal for assigning unique identifiers to entities such as users, orders, or products. They are particularly useful in distributed scenarios, where the system is spread across different databases or services, and unique keys need to be generated independently.
Storage and syntax
UUIDs are stored as 128-bit values, represented as a sequence of hexadecimal digits. They are typically formatted in five groups, of sizes 8, 4, 4, 4 and 12, separated by hyphens. For example:
123e4567-e89b-12d3-a456-426655440000
, ora0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
Postgres accepts UUID values in the above format, while also allowing uppercase letters and missing hyphen separators. You can also generate them using functions like gen_random_uuid()
which is available natively in Postgres, or the uuid_generate_v4()
function which requires the uuid-ossp
extension.
Example usage
Consider a scenario where we track user sessions in a web application. UUIDs are commonly used to identify sessions due to their uniqueness.
The query below creates a table and inserts some sample session data:
This query returns the following:
To retrieve a specific session, we can query by its UUID:
This query returns the following:
Other examples
Using UUID column as primary key
Using UUIDs as primary keys is common since the likelihood of the same UUID value being generated twice is very small. This is helpful in distributed systems or when merging data from different sources.
For example, we can create a table to store products and use a UUID column as the primary key.
This query returns the following:
Avoiding data leakage
In systems where data security is a concern, using non-sequential IDs like UUIDs can help obscure the total number of records, preventing potential information leaks. This is in contrast to the sequential IDs provided by the SERIAL
data type, which can inadvertently reveal information about the number of users, orders, etc.
For example, the query below creates a table that tracks users of an API with some sample data:
This query returns the following:
Notice that the serial_id
column hints at the number of rows already present in the table.
Additional considerations
- Randomness and uniqueness: UUIDs are designed to be globally unique, but there's an extremely small probability of generating a duplicate UUID. If you're automatically generating UUIDs at insertion, and a duplicate UUID is generated, the insertion will fail. In the rare event that a collision occurs, applications that generate UUIDs should implement a retry mechanism.
- Performance and indexing: UUIDs are larger than traditional integer IDs, requiring more storage space. Index structures on UUID columns therefore consume more storage as well. However, in terms of performance for read-heavy workloads, leveraging indexed UUID columns for filtering or sorting can significantly improve query performance. In this context, you have to evaluate the tradeoff between storage efficiency and query performance.
- Readability: UUIDs are not human-readable, which can make debugging or manual inspection of data more challenging.
Resources
Need help?
Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more detail, see Getting Support.