GPU-Accelerated Vector Indexes for Approximate Nearest-Neighbor Search

Compact, updatable vector indexes that exploit GPU parallelism for fast approximate nearest-neighbor search.

Approximate nearest-neighbor search (ANNS) over high-dimensional vectors underpins modern retrieval, recommendation, and retrieval-augmented generation. Serving these workloads at scale demands indexes that are both fast and memory-efficient, yet most high-performance ANNS indexes are static and must be rebuilt to absorb new data. This project designs GPU-accelerated vector indexes that are compact enough to fit large datasets in GPU memory and dynamic enough to ingest updates in place.

We built Jasper (McCoy et al., 2026), a GPU-accelerated ANNS index that uses quantization to shrink the memory footprint while sustaining high query throughput, and supports in-place updates so the index can evolve with the data instead of being rebuilt. This work draws on our broader effort in compact and dynamic GPU data structures and the memory-management substrate they rely on.

References

2026

  1. arXiv 2026
    GPU-Accelerated ANNS: Quantized for Speed, Built for Change
    Hunter McCoy, Zikun Wang, and Prashant Pandey
    2026