Implementing UUIDs v7 in pure SQL
In May 2024, the IETF standard on UUIDs (Universally Unique IDentifiers) has been updated with RFC 9562, finally officializing the UUID Version 7. This version is known to be a much better choice for database indexes than previous ones, since it has values generated consecutively already sorted. PostgreSQL does not yet have a built-in function to generate UUIDs v7, but of course several extensions do exist. The ones I found tend to require a compilation step and superuser privileges to install, as they’re written in “untrusted languages” like C or Rust. However, UUID-v7 functionalities can be implemented in pure SQL so they can be installed easily everywhere. In this post, let’s see how to do that.
The structure of UUID-v7
Per the RFC:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unix_ts_ms |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| unix_ts_ms | ver | rand_a |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var| rand_b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rand_b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
unix_ts_ms
: 48-bit big-endian unsigned number of the Unix Epoch timestamp in milliseconds as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).ver
: The 4-bit version field as defined by Section 4.2, set to 0b0111 (7). Occupies bits 48 through 51 of octet 6.rand_a
: 12 bits of pseudorandom data to provide uniqueness as per Section 6.9 and/or optional constructs to guarantee additional monotonicity as per Section 6.2. Occupies bits 52 through 63 (octets 6-7).var
: The 2-bit variant field as defined by Section 4.1, set to 0b10. Occupies bits 64 and 65 of octet 8.rand_b
: The final 62 bits of pseudorandom data to provide uniqueness as per Section 6.9 and/or an optional counter to guarantee additional monotonicity as per Section 6.2. Occupies bits 66 through 127 (octets 8-15).
Generating values
The method used to generate a UUID-v7 is to start from a UUID v4 obtained from the built-in
function gen_random_uuid()
, and to overwrite bits at the places of unix_ts_ms
and ver
with the timestamp and version number.
I got the implementation from Kyle Hubert’s “kjmph” gist, and simply converted it from plpgsql
to sql
.
The code:
CREATE FUNCTION uuidv7() RETURNS uuid
AS $$
-- Replace the first 48 bits of a uuidv4 with the current
-- number of milliseconds since 1970-01-01 UTC
-- and set the "ver" field to 7 by setting additional bits
select encode(
set_bit(
set_bit(
overlay(uuid_send(gen_random_uuid()) placing
substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3)
from 1 for 6),
52, 1),
53, 1), 'hex')::uuid;
$$ LANGUAGE sql volatile;
Timestamp precision
There’s a potential problem with the millisecond precision: if generating values
at a rate of more than 1000 per second, which is not fast with current processors,
comparing consecutive UUIDs comes down to comparing the random bits
starting at the rand_a
field. As a result, values generated less than
1 millisecond apart do not come out sorted.
It can be illustrated with the following query, where i
is the
the index of the value in a series and row_number
the position
in the sorted series.
select i,uuid,row_number() over(order by uuid) from
(select i,uuidv7() as uuid from generate_series(1,10) as i) s
order by i;
i | uuid | row_number
----+--------------------------------------+------------
1 | 0190b0e3-0cca-7d1f-a6f4-2da38f611817 | 8
2 | 0190b0e3-0cca-7765-9c15-34d1984ac3ab | 5
3 | 0190b0e3-0cca-7c86-9b79-676e37db9539 | 6
4 | 0190b0e3-0cca-75a2-8918-3e185505fe33 | 4
5 | 0190b0e3-0cca-7ec0-8365-95ee2c7b4e9b | 9
6 | 0190b0e3-0cca-7f28-9f9b-08bae787399d | 10
7 | 0190b0e3-0cca-7d17-8762-f57c57dea0aa | 7
8 | 0190b0e3-0cca-7138-97cb-02fa0f513871 | 2
9 | 0190b0e3-0cca-70dd-8891-b0f33e8ee234 | 1
10 | 0190b0e3-0cca-7325-8433-26fca1e31da3 | 3
The values all start with the same timestamp
0190b0e3-0cca
. The next digit 7
is the version number, and
after that, we have random digits (rand_a
), which is why
the relative order of values is random.
When the generation of values happen less than one millisecond apart, we want to either:
- ignore the issue, if we don’t really care that close values are unsorted. That’s perfectly reasonable in many cases. In an index, they will still be near each other.
- use a counter. The specification suggests two kinds of counters in “Method 1” and “Method 2” (Section 6.2). I did not implement that in this extension. This would require storing a state between invocations, which is not simple.
- use sub-millisecond precision, as the specification allows in “Method 3” (Section 6.2). This is the easiest in pure SQL.
Timestamps with sub-millisecond precision
In postgres, timestamp values hold a number of microseconds since 2000-01-01, so that’s the maximum precision of clock_timestamp()
. The internal clock of the machine may offer more or less, but at least we can use as much precision as available, up to 1 microsecond.
UUID-v7 allows for 12 bits of additional timestamp precision, by using the rand_a
field,
trading randomness for sub-millisecond time ticks.
In this variant of the function, we use the fractional part after the milliseconds and place them in these 12 bits.
The code:
/* Version with the "rand_a" field containing sub-milliseconds (method 3 of the spec)
clock_timestamp() is hoped to provide enough precision and consecutive
calls to not happen fast enough to output the same values in that field.
The uuid is the concatenation of:
- 6 bytes with the current Unix timestamp (number of milliseconds since 1970-01-01 UTC)
- 2 bytes with
- 4 bits for the "ver" field
- 12 bits for the fractional part after the milliseconds
- 8 bytes of randomness from the second half of a uuidv4
*/
CREATE FUNCTION uuidv7_sub_ms() RETURNS uuid
AS $$
select encode(
substring(int8send(floor(t_ms)::int8) from 3) ||
int2send((7<<12)::int2 | ((t_ms-floor(t_ms))*4096)::int2) ||
substring(uuid_send(gen_random_uuid()) from 9 for 8)
, 'hex')::uuid
from (select extract(epoch from clock_timestamp())*1000 as t_ms) s
$$ LANGUAGE sql volatile;
Now when trying the previous query with the more precise timestamps , we get the desired order:
select i,uuid,row_number() over(order by uuid) from
(select i,uuidv7_sub_ms() as uuid from generate_series(1,10) as i) s order by i;
Result
i | uuid | row_number
----+--------------------------------------+------------
1 | 0190b75f-2404-7e2d-94c7-5fd73d2bea51 | 1
2 | 0190b75f-2404-7ee5-a1c2-c4179159c04d | 2
3 | 0190b75f-2404-7f37-bd02-ec5c22092763 | 3
4 | 0190b75f-2404-7f75-ba8a-1e63fd422786 | 4
5 | 0190b75f-2404-7fb2-96f5-ed46d69c2cc0 | 5
6 | 0190b75f-2404-7ff0-a3d8-e9cfb18b726e | 6
7 | 0190b75f-2405-7029-8747-8f55e90c6b5f | 7
8 | 0190b75f-2405-7062-845a-c0136b954aa7 | 8
9 | 0190b75f-2405-70a0-9f8a-fc9c920d9e1f | 9
10 | 0190b75f-2405-70d9-9b1e-a8211a046e2c | 10
The values start with 0190b435-34c2-7NNN
, where NNN
are 3 hex digits
representing microseconds scaled to 4096, instead of being random numbers.
We can see that these 12-bit additional time ticks change fast enough
that two consecutive UUIDs generated in the same millisecond cannot share the same
value in that field.
We could still have an issue if the server was so much faster that even this counter may produce the same value between consecutive generations.
In this case we could use a monotonic random counter, as suggested by the specification, and “stealing” more randomness bits.
Partitioning by UUID-v7
When UUIDs are used as primary keys and they happen to contain a timestamp, it can be convenient to use them as a partition key. For instance we could have monthly partitions with values ranging from the first timestamp of the month to the first timestamp of the next month. To generate the partition boundaries, a UUID where all the random bits are set to zero can be used.
This is what the uuidv7_boundary(timestamptz)
function does.
For instance, given a partitioned table like this:
CREATE TABLE sample(id uuid primary key, payload text) PARTITION BY RANGE(id);
We may use a query like the following to easily generate 12 monthly partitions for the year 2024:
SELECT format('create table sample_2024_%s partition of sample for values from (%L) TO (%L)',
to_char(n,'FM00'),
uuidv7_boundary('2024-01-01'::timestamptz+(n-1)*'1 month'::interval),
uuidv7_boundary('2024-01-01'::timestamptz+n*'1 month'::interval))
from generate_series(1,12) as n;
(use \gexec
in psql to run the generated statements).
Getting the timestamp from the UUID-v7 values
Another functionality that may be needed is to extract the timestamp from a UUIDv7. The specification does not make any promise about sub-millisecond precision, so here we get only the milliseconds portion. Custom versions with more precision could be written, but with the caveat that they would not apply to UUIDv7 generated anywhere.
The function is:
/* Extract the timestamp in the first 6 bytes of the uuidv7 value.
Use the fact that 'xHHHHH' (where HHHHH are hexadecimal numbers)
can be cast to bit(N) and then to int8.
*/
CREATE FUNCTION uuidv7_extract_timestamp(uuid) RETURNS timestamptz
AS $$
select to_timestamp(
right(substring(uuid_send($1) from 1 for 6)::text, -1)::bit(48)::int8 -- milliseconds
/1000.0);
$$ LANGUAGE sql immutable strict;
Conclusion
Most of the Postgres tests and discussions on UUID-v7 I’ve seen these last months (see the References below) used a patched Postgres and assumed that this will be supported in Postgres 17. That version is in feature freeze now, without it, so that won’t happen until version 18 at best. But the point is that UUID-v7 can be used right now with existing versions of Postgres, either by generating them in applications, or using server-side functions as in this post.
The SQL functions in this post can be created in any environment. They’re also packaged as an extension available in a github repository: postgres-uuidv7-sql. Feel free to submit bug reports and comments over there if you use them.
References
-
RFC 9562 (May 2024)
-
Postgres CommitFest Work-In-Progress patch and discussion to add UUID-v7 functions in Postgres core.
-
UUID Benchmark War comparing the insert performance with different primary keys, especially uuidv4 and uuidv7 (on Jeremy Schneider’s blog, February 2024)
-
How to partition Postgres tables by timestamp based UUIDs (On the pganalyze blog, January 2024)
-
Benchmark UUIDv4 vs UUIDv7 Primary Keys (on Mike Blum’s blog, December 2023).
-
Postgres.FM podcast #77 on “Partitioning by ULID” (December 2023), or on YouTube. ULID is a non-standard variant of the UUID that implements the same idea as UUID-v7.
-
Postgres.FM podcast #51 on “the performance aspects of using UUID for primary keys” (June 2023), or on YouTube.