Experimental Multi-Vector Chamfer Distance with SIMD & BLAS Optimizations by suri-kumkaran · Pull Request #730 · microsoft/DiskANN

suri-kumkaran · 2026-02-06T12:08:27Z

Summary

Adds experimental multi-vector support: fast Chamfer distance implementations for f32 embeddings, plus benchmarking infrastructure to compare them.

Changes

Core Types

MultiVector - Row-major token embeddings
TransposedMultiVector - Block-transposed layout (16 vectors/block, SIMD-optimized)
Chamfer<Approach> - Generic distance using Inner Product (implements DistanceFunction)
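For readers new to the metric, here is a minimal sketch of a Chamfer distance over row-major multi-vectors using inner product. It is not the code in this PR: `RowMajor`, `inner_product`, and `chamfer_naive` are hypothetical names, and the sign convention (negating the summed per-query maxima so that smaller means closer) is an assumption.

```rust
/// Hypothetical row-major multi-vector: each row is one token embedding of length `dim`.
struct RowMajor {
    data: Vec<f32>,
    dim: usize,
}

impl RowMajor {
    fn new(data: Vec<f32>, dim: usize) -> Self {
        assert!(dim > 0 && data.len() % dim == 0, "data must hold whole rows");
        Self { data, dim }
    }

    /// Iterate over the token embeddings, one slice per row.
    fn rows(&self) -> impl Iterator<Item = &[f32]> + '_ {
        self.data.chunks_exact(self.dim)
    }
}

/// Scalar inner product of two equal-length slices.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// For each query row, take the best inner product against any document row,
/// then sum those maxima. The result is negated (assumed convention) so that
/// a smaller value means a closer match.
fn chamfer_naive(query: &RowMajor, doc: &RowMajor) -> f32 {
    assert_eq!(query.dim, doc.dim, "dimension mismatch");
    let total: f32 = query
        .rows()
        .map(|q| {
            doc.rows()
                .map(|d| inner_product(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum();
    -total
}
```

With query rows [1, 0] and [0, 1] and document rows [2, 0], [0, 3], and [1, 1], the per-query maxima are 2 and 3, so `chamfer_naive` returns -5.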

Implementations

NaiveApproach - Scalar baseline
SimdApproach - SIMD via diskann_vector::InnerProduct
TransposedApproach - Block-transposed SIMD
TransposedWithTilingApproach - Query tiling (transposes docs, processes query pairs)
QueryTransposedWithTilingApproach - Doc tiling (transposes query, processes doc pairs)
SgemmApproach - BLAS SGEMM + SIMD row-max (via faer library)
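The idea behind processing query pairs can be sketched as follows. This is a simplified illustration, not the PR's implementation: it works on plain row-major slices and omits the block-transposed layout, and `dot2`, `dot`, and `max_sim_sum_tiled` are hypothetical names. The point is that each document row is read once per pair of query rows instead of once per query row.

```rust
/// Inner products of two query rows against one document row in a single pass,
/// so each document element is loaded once and used twice.
fn dot2(q0: &[f32], q1: &[f32], d: &[f32]) -> (f32, f32) {
    let (mut s0, mut s1) = (0.0f32, 0.0f32);
    for ((a, b), x) in q0.iter().zip(q1).zip(d) {
        s0 += a * x;
        s1 += b * x;
    }
    (s0, s1)
}

/// Plain inner product, used for a leftover query row when the count is odd.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Sum over query rows of the maximum inner product with any document row.
/// `query` and `doc` are row-major with `dim` columns (assumed layout).
fn max_sim_sum_tiled(query: &[f32], doc: &[f32], dim: usize) -> f32 {
    assert!(dim > 0 && query.len() % dim == 0 && doc.len() % dim == 0);
    let mut total = 0.0f32;
    let mut pairs = query.chunks_exact(2 * dim);
    for pair in pairs.by_ref() {
        let (q0, q1) = pair.split_at(dim);
        let (mut m0, mut m1) = (f32::NEG_INFINITY, f32::NEG_INFINITY);
        for d in doc.chunks_exact(dim) {
            let (s0, s1) = dot2(q0, q1, d);
            m0 = m0.max(s0);
            m1 = m1.max(s1);
        }
        total += m0 + m1;
    }
    // An odd number of query rows leaves exactly one row unpaired.
    let rest = pairs.remainder();
    if !rest.is_empty() {
        total += doc
            .chunks_exact(dim)
            .map(|d| dot(rest, d))
            .fold(f32::NEG_INFINITY, f32::max);
    }
    total
}
```

A tile of four query rows, as mentioned for wider registers, would follow the same pattern with four accumulators per document row.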

Benchmark Results (100 points, 10 iterations)

Machine: Intel Core i7-1365U, AVX2 supported, AVX-512 not supported, 32 GB RAM
Note: Times are median over 50 measurements, each measuring 10 consecutive distance computations across 100 points.
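As a rough illustration of this kind of measurement protocol (not the benchmark runner in this PR), the sketch below times a batch of repeated calls several times and reports the median. `median` and `measure` are hypothetical helpers, and the `measurements` and `iterations` parameters are assumptions standing in for the 50 measurements and 10 consecutive computations described above.

```rust
use std::time::{Duration, Instant};

/// Median of a set of timing samples; `None` if there are no samples.
/// For an even count, the two middle samples are averaged.
fn median(mut samples: Vec<Duration>) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    let mid = samples.len() / 2;
    Some(if samples.len() % 2 == 1 {
        samples[mid]
    } else {
        (samples[mid - 1] + samples[mid]) / 2
    })
}

/// Take `measurements` samples, each timing `iterations` consecutive calls
/// to `work`, and return the median sample.
fn measure<F: FnMut()>(measurements: usize, iterations: usize, mut work: F) -> Option<Duration> {
    let samples: Vec<Duration> = (0..measurements)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iterations {
                work();
            }
            start.elapsed()
        })
        .collect();
    median(samples)
}
```

Reporting the median rather than the mean makes the figure less sensitive to occasional slow samples, which matters on a laptop-class machine.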

Speedup vs SIMD Baseline (median latency; higher speedup is better)

Configuration	SIMD (µs)	transposed_simd	transposed_tiling	query_transposed_tiling	sgemm
dim=128, doc=32, query=8	2,237	1.34x	1.75x	1.05x	1.13x
dim=128, doc=64, query=16	9,224	1.42x	2.07x	2.35x	1.48x
dim=128, doc=128, query=32	47,882	1.32x	1.86x	2.64x	1.88x
dim=256, doc=32, query=8	4,654	1.26x	1.69x	1.13x	0.96x
dim=256, doc=64, query=16	25,809	1.56x	1.94x	2.40x	1.87x
dim=256, doc=128, query=32	101,093	1.41x	1.71x	2.67x	1.96x
dim=256, doc=16, query=32	8,239	1.22x	1.77x	2.02x	1.57x
dim=384, doc=32, query=8	8,412	1.41x	1.65x	1.30x	1.24x
dim=384, doc=64, query=16	38,162	1.30x	1.47x	1.70x	1.66x
dim=384, doc=128, query=32	171,431	1.53x	1.94x	2.04x	2.16x
Average	—	1.38x	1.79x	1.93x	1.59x

Dimension 256

Approach	Q×D	Mean (µs)	Speedup
SIMD	8×32	3328	1.0x
Tiling	8×32	2207	1.5x
QueryTiling	8×32	3243	1.0x
SGEMM	8×32	2675	1.2x
SIMD	16×64	18464	1.0x
Tiling	16×64	10458	1.8x
QueryTiling	16×64	6326	2.9x
SGEMM	16×64	7014	2.6x
SIMD	32×128	72197	1.0x
Tiling	32×128	39453	1.8x
QueryTiling	32×128	25357	2.8x
SGEMM	32×128	19727	3.7x

Dimension 384

Approach	Q×D	Mean (µs)	Speedup
SIMD	8×32	7026	1.0x
Tiling	8×32	3609	1.9x
QueryTiling	8×32	4816	1.5x
SGEMM	8×32	4077	1.7x
SIMD	16×64	27518	1.0x
QueryTiling	16×64	9750	2.8x
SGEMM	16×64	10070	2.7x
SIMD	32×128	107772	1.0x
Tiling	32×128	50072	2.2x
QueryTiling	32×128	50669	2.1x
SGEMM	32×128	28764	3.7x
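The Speedup column in the per-dimension tables appears to be the SIMD mean divided by the approach's mean for the same Q×D shape. A small sketch of that arithmetic, with `speedup` and `format_speedup` as hypothetical helpers:

```rust
/// Speedup of a candidate over the baseline, given both latencies in microseconds.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

/// Format a speedup with one decimal place, in the style of the tables above.
fn format_speedup(baseline_us: f64, candidate_us: f64) -> String {
    format!("{:.1}x", speedup(baseline_us, candidate_us))
}
```

For example, at dimension 256 and 32×128, SGEMM gives 72197 / 19727 ≈ 3.66, which is reported as 3.7x.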

Future Work

Add RFC based on findings for DiskANN integration
Support for additional element types (f16, u8 quantized, etc.)
AVX-512 support: Larger registers could enable tile size 4 (processing 4 queries simultaneously)
SIMD-accelerated horizontal MinMax: Hardware-accelerated horizontal min/max reductions across SIMD lanes for faster per-query score aggregation
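To show where a horizontal reduction fits into per-query score aggregation, here is a portable sketch that keeps a lane-wise running maximum and collapses the lanes once at the end. It uses no intrinsics and relies on the compiler to vectorize the inner loop; `LANES` and `row_max` are hypothetical names, and eight lanes is an assumption matching eight f32 values in a 256-bit register.

```rust
/// Assumed lane count: eight f32 values, as in one 256-bit register.
const LANES: usize = 8;

/// Maximum of a row of scores, computed as a lane-wise running max
/// followed by a single horizontal reduction across the lanes.
fn row_max(scores: &[f32]) -> f32 {
    let mut acc = [f32::NEG_INFINITY; LANES];
    let mut chunks = scores.chunks_exact(LANES);

    // Vertical step: each lane tracks the max of the elements it has seen.
    for chunk in chunks.by_ref() {
        for (a, &s) in acc.iter_mut().zip(chunk) {
            *a = f32::max(*a, s);
        }
    }

    // Horizontal step: collapse the lanes into one value. This is the part
    // a hardware-accelerated horizontal max would replace.
    let mut best = acc.iter().copied().fold(f32::NEG_INFINITY, f32::max);

    // Scalar tail for rows whose length is not a multiple of LANES.
    for &s in chunks.remainder() {
        best = best.max(s);
    }
    best
}
```

The horizontal step runs once per query row, so its cost matters most when documents are short and the vertical loop has few iterations to amortize it.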

Testing

cargo build --release -p multi-vector
cargo run --release -p multi-vector --bin multivec-bench -- run \
    --input-file multi-vector/examples/bench.json --output-file results.json

Contributing

This work is experimental and will be submitted as separate PRs.

codecov-commenter · 2026-02-06T12:46:08Z

Codecov Report

❌ Patch coverage is 68.86326% with 378 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.69%. Comparing base (354df71) to head (63f07fa).

Files with missing lines	Patch %	Lines
...ctor-bench/src/distance/query_transposed_tiling.rs	49.53%	163 Missing ⚠️
...lti-vector-bench/src/distance/transposed_tiling.rs	43.50%	113 Missing ⚠️
experimental-multi-vector-bench/src/bench/mod.rs	70.88%	23 Missing ⚠️
...ntal-multi-vector-bench/src/distance/transposed.rs	75.00%	23 Missing ⚠️
diskann-quantization/src/multi_vector/matrix.rs	39.39%	20 Missing ⚠️
experimental-multi-vector-bench/src/bench/input.rs	68.00%	16 Missing ⚠️
...xperimental-multi-vector-bench/src/bench/runner.rs	95.37%	10 Missing ⚠️
...ental-multi-vector-bench/src/bin/multivec_bench.rs	87.87%	4 Missing ⚠️
...erimental-multi-vector-bench/src/distance/sgemm.rs	97.79%	3 Missing ⚠️
...xperimental-multi-vector-bench/src/multi_vector.rs	87.50%	3 Missing ⚠️

❌ Your patch status has failed because the patch coverage (68.86%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #730      +/-   ##
==========================================
- Coverage   89.00%   88.69%   -0.31%     
==========================================
  Files         428      440      +12     
  Lines       78294    79505    +1211     
==========================================
+ Hits        69686    70519     +833     
- Misses       8608     8986     +378

Flag	Coverage Δ
miri	`88.69% <68.86%> (-0.31%)`	⬇️
unittests	`88.69% <68.86%> (-0.31%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...xperimental-multi-vector-bench/src/distance/mod.rs	`100.00% <100.00%> (ø)`
...erimental-multi-vector-bench/src/distance/naive.rs	`100.00% <100.00%> (ø)`
...perimental-multi-vector-bench/src/distance/simd.rs	`100.00% <100.00%> (ø)`
...erimental-multi-vector-bench/src/distance/sgemm.rs	`97.79% <97.79%> (ø)`
...xperimental-multi-vector-bench/src/multi_vector.rs	`87.50% <87.50%> (ø)`
...ental-multi-vector-bench/src/bin/multivec_bench.rs	`87.87% <87.87%> (ø)`
...xperimental-multi-vector-bench/src/bench/runner.rs	`95.37% <95.37%> (ø)`
experimental-multi-vector-bench/src/bench/input.rs	`68.00% <68.00%> (ø)`
diskann-quantization/src/multi_vector/matrix.rs	`94.74% <39.39%> (-3.41%)`	⬇️
experimental-multi-vector-bench/src/bench/mod.rs	`70.88% <70.88%> (ø)`
... and 3 more


Suryansh Gupta and others added 4 commits February 5, 2026 19:18

Add Multi-Vector distance function benchmarking experimental crate

efbd5be

Add bin runner file

5bec09a

Merge branch 'main' into users/suryangupta/multivector-bench

b7346ab

Run on local laptop machine

0134ebd

hildebrandmw reviewed Feb 6, 2026

diskann-quantization/src/multi_vector/matrix.rs (outdated)

suri-kumkaran mentioned this pull request Feb 6, 2026

Add RFC Process and RFC 0001 — Multi-Vector Distance Functions #731

Open

suri-kumkaran self-assigned this Feb 6, 2026

Suryansh Gupta added 2 commits February 9, 2026 18:11

Review comments

0604ad3

Merge branch 'main' into users/suryangupta/multivector-bench

63f07fa

suri-kumkaran wants to merge 6 commits into main from users/suryangupta/multivector-bench
