What's new in pgvector v0.7.0

Real-world embedding datasets often contain redundancy buried within the vector space. For example, vectors that cluster around certain central points in a multidimensional space reveal an exploitable structure. By reducing this redundancy, we can save memory and improve performance with minimal impact on precision. pgvector 0.7.0 introduces several approaches that build on this idea:

float16 vector representation
sparse vectors
bit vectors

Float16 vectors

An HNSW index is most efficient when it fits into shared memory and its pages are not evicted by concurrent operations, because Postgres can then avoid costly I/O. Historically, pgvector supported only 32-bit float vectors. Version 0.7.0 introduces 16-bit float HNSW indexes, which consume half the memory. With the same amount of shared memory, an index can therefore hold roughly twice as many vectors before performance starts to degrade.

There are two options when using float16 vectors:

The index uses float16, but the underlying table continues to use float32.
The index and the underlying table both use float16. This option uses 50% less disk space in addition to requiring 50% less shared memory to operate efficiently. Performance improves further because more vectors fit in a single Postgres page and fewer pages are evicted due to concurrent operations.
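
The first option can be expressed as an expression index that casts the stored float32 column to halfvec. The following is a minimal sketch, assuming the embedding_full table used later in this post has an id primary key and a vector column of type vector(1536). The row with id = 1 serves only as an example query vector.

```sql
-- Option 1: keep float32 in the table, index at half precision.
create index on embedding_full
  using hnsw ((vector::halfvec(1536)) halfvec_l2_ops);

-- The order by expression has to match the indexed expression,
-- otherwise the planner cannot use the index.
select id
from embedding_full
where id != 1
order by
  vector::halfvec(1536) <-> (
    select vector::halfvec(1536) from embedding_full where id = 1
  )
limit 5;
```

The table keeps full float32 precision, so candidates returned by the index can still be re-ranked against the original values if needed.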

To copy an existing float32 embedding table into a float16 one:


create table embedding_half (
  id serial,
  vector halfvec(1536),
  primary key (id)
);

insert into embedding_half (vector)
select
  vector::halfvec(1536)
from
  embedding_full;

With 900K OpenAI 1536-dimensional vectors, the table size is 3.5 GB. For comparison, embedding_full requires 7 GB.

Then we can build a float16 HNSW index:


create index on embedding_half using hnsw (vector halfvec_l2_ops);

To test the performance of index creation, we chose a c7g.metal instance with 128 GB of memory and the following parameters:


shared_buffers = 50000MB
maintenance_work_mem = 30000MB
max_parallel_maintenance_workers = {0-63}
wal_level = minimal
max_wal_size = 10GB
autovacuum = off
full_page_writes = off
fsync = off

HNSW build times recently saw a step-change improvement in the 0.6 release series, which introduced parallel builds. Version 0.7.0 with the halfvec (float16) feature speeds up builds by a further 30%.

Note that float16 vector arithmetic on the ARM architecture performs the same as float32, so serial build times (with one parallel worker) have not improved. However, there is a significant difference for parallel builds due to better page and I/O utilization. Also note that this test doesn't use pre-warming or other artificial enhancements.
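
The build-related settings can also be applied per session before creating the index. This is a sketch only: the memory size and worker count below are illustrative values rather than recommendations, and should be sized to the machine.

```sql
-- Illustrative values; size these to the available RAM and cores.
set maintenance_work_mem = '8GB';
set max_parallel_maintenance_workers = 7; -- the leader process also participates

create index on embedding_half using hnsw (vector halfvec_l2_ops);
```

The effective number of workers is additionally capped by max_parallel_workers and max_worker_processes, so those may need raising for larger worker counts.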

Both heap and HNSW relations for float16 occupy only half of the space compared to the previous float32 ones.
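
One way to check these sizes on your own data is with the built-in Postgres size functions. A minimal sketch, assuming both embedding_full and embedding_half exist and each has its HNSW index built:

```sql
select
  t.name,
  pg_size_pretty(pg_table_size(t.name::regclass)) as heap_and_toast_size,
  pg_size_pretty(pg_indexes_size(t.name::regclass)) as all_indexes_size
from
  (values ('embedding_full'), ('embedding_half')) as t(name);
```

Note that pg_indexes_size sums every index on the table, including the primary key index, so the HNSW index accounts for slightly less than the reported figure.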

There is a proposal to speed up float16 arithmetic even more in the future by using SVE intrinsics on the ARM architecture (see: https://github.com/pgvector/pgvector/pull/536).

Jonathan Katz measured HNSW performance on an r7gd.16xlarge instance (64 vCPU, 512 GiB RAM), and his results are even better. For float16, HNSW build times are up to 3x faster. For select performance, ANN benchmark results show that recall is essentially unchanged at the lower bit width, and queries per second (QPS) are similar when the index fits in memory. But when queries on a real machine hit I/O, or when some HNSW pages are evicted from memory due to concurrent connections, there would be a meaningful difference. With only half the memory needed to hold the same HNSW index, the cost of achieving the same performance and precision is also significantly lower.

Metric (table type / index type)	Vector / Vector	Vector / HalfVec
Index size (MB)	7734	3867
Index build time (s)	264	90
Recall @ ef_search=10	0.819	0.809
QPS @ ef_search=10	1231	1219
Recall @ ef_search=40	0.945	0.945
QPS @ ef_search=40	627	642
Recall @ ef_search=200	0.987	0.987
QPS @ ef_search=200	191	190

For full results on the different datasets, see this GitHub issue.
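
The ef_search values in the table correspond to pgvector's hnsw.ef_search setting, which controls the size of the candidate list during a search (the default is 40). A sketch of adjusting it, assuming the embedding_half table from above and using the row with id = 1 as an example query vector:

```sql
-- Higher values improve recall at the cost of QPS.
set hnsw.ef_search = 200;

select id
from embedding_half
where id != 1
order by vector <-> (select vector from embedding_half where id = 1)
limit 10;

-- To scope the setting to a single transaction instead:
begin;
set local hnsw.ef_search = 200;
-- run the query here
commit;
```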

Sparse vectors

If vectors contain many zero components, then a sparse vector representation can save significant storage space. For example, to populate sparse vectors:


create table embedding_sparse (
  id serial,
  vector sparsevec(1536),
  primary key (id)
);

insert into embedding_sparse (vector) values ('{1:0.1,3:0.2,5:0.3}/1536'), ('{1:0.4,3:0.5,5:0.6}/1536');

The sparse vector only consumes storage space for the non-zero components. In this case, that's 3 values in a 1536-dimensional vector.

Note the new vector syntax {1:3,3:1,5:2}/1536 for the sparse vector representation in:


select * from embedding_sparse order by vector <-> '{1:3,3:1,5:2}/1536' limit 5;
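
Sparse vectors can be indexed with HNSW as well. A minimal sketch, assuming the embedding_sparse table above: sparsevec_l2_ops is the L2 operator class for sparsevec, and the last statement shows a cast from a dense vector, which can be convenient when converting existing data.

```sql
create index on embedding_sparse using hnsw (vector sparsevec_l2_ops);

-- Indices in the text format start at 1, not 0.
select id
from embedding_sparse
order by vector <-> '{1:3,3:1,5:2}/1536'
limit 5;

-- A dense vector can be cast to its sparse form.
select '[0,1,0,2]'::vector::sparsevec;
```

Keep in mind that HNSW indexing of sparsevec columns is limited to vectors with up to 1,000 non-zero elements.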

Bit vectors

Using binary quantization, we can represent a float vector as a vector in binary space. This reduces storage size dramatically and is intended as a way to quickly “pre-select” from a data set before performing an additional search within the subset. When properly parameterized, the secondary select can be very fast, even without an index.
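
To see what the quantization does, binary_quantize can be called directly. In this sketch, each positive component maps to 1 and every other component maps to 0, so the three-dimensional example should produce the bit string 101:

```sql
select binary_quantize('[1,-2,3]'::vector);
```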


-- The indexed expression must match the one used in order by;
-- bit(3) fits the three-dimensional example vector.
create index on embedding
  using hnsw ((binary_quantize(vector)::bit(3)) bit_hamming_ops);

select
  *
from
  embedding
order by
  binary_quantize(vector)::bit(3) <~> binary_quantize('[1,-2,3]')
limit 5;

To use a binary quantized HNSW index to pre-select from a larger dataset and then make a fast selection from the resulting subset, without an index:


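select * from (
  -- Pre-select 20 candidates using the binary quantized HNSW index.
  select
    *
  from
    embedding
  order by
    binary_quantize(vector)::bit(3) <~> binary_quantize('[1,-2,3]')
  limit 20
) as candidates -- an alias is required before Postgres 16
-- Re-rank the candidates by cosine distance on the original vectors.
order by
  vector <=> '[1,-2,3]'
limit 5;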

This approach lets you build a small, fast HNSW index for select, insert, or update operations while still getting fast vector search. The exact configuration of the limit clauses is data dependent, so you'll want to experiment with the sub-select size and the number of final results directly on your own dataset.

New distance functions

pgvector 0.7.0 also adds support for the L1 distance operator, <+>.
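
For example, a nearest-neighbor query by L1 (taxicab) distance can be sketched as follows, assuming a hypothetical items table with an embedding column of type vector(3). l1_distance is the function form of the same metric.

```sql
select
  id,
  embedding <+> '[3,1,2]' as distance
from items
order by embedding <+> '[3,1,2]'
limit 5;

-- The function form of the same metric: |1-3| + |2-1| + |3-2| = 4.
select l1_distance('[1,2,3]'::vector, '[3,1,2]'::vector);
```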

It also adds new distance types for indexing:

L1 distance - added in 0.7.0


create index on items using hnsw (embedding vector_l1_ops);

Hamming distance - added in 0.7.0


create index on items using hnsw (embedding bit_hamming_ops);

Jaccard distance - added in 0.7.0


create index on items using hnsw (embedding bit_jaccard_ops);
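
Both bit operator classes have matching query operators: <~> for Hamming distance and <%> for Jaccard distance. A minimal sketch, assuming a hypothetical bit_items table created only for this example:

```sql
create table bit_items (
  id bigserial primary key,
  embedding bit(3)
);

insert into bit_items (embedding) values ('000'), ('111'), ('101');

-- Hamming distance: the number of differing bits.
select * from bit_items order by embedding <~> '101' limit 5;

-- Jaccard distance: 1 - |intersection| / |union| of the set bits.
select * from bit_items order by embedding <%> '101' limit 5;
```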

Conclusion

Over the last year, pgvector has seen significant development in both functionality and performance, including HNSW indexes, parallel builds, and many other options. With the introduction of half vectors (float16), sparse vectors, and bit vectors, we're now seeing a speedup of over 100x compared to one year ago.

For a more complete comparison of pgvector performance over the last year, check out this post by Jonathan Katz.

Using v0.7.0 in Supabase

We're currently testing pgvector v0.7.0 in Supabase. Stay tuned - we expect to make this available some time next week.
