
Experimental Multi-Vector Chamfer Distance with SIMD & BLAS Optimizations #730

Draft
suri-kumkaran wants to merge 6 commits into main from users/suryangupta/multivector-bench

suri-kumkaran (Contributor) commented Feb 6, 2026

Summary

Adds experimental multi-vector support: fast Chamfer distance implementations for f32 embeddings, plus benchmarking infrastructure.

Changes

Core Types

  • MultiVector - Row-major token embeddings
  • TransposedMultiVector - Block-transposed layout (16 vectors/block, SIMD-optimized)
  • Chamfer<Approach> - Generic distance using Inner Product (implements DistanceFunction)
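To make the two layouts concrete, here is a minimal sketch of a row-major multi-vector and a block-transposed copy with 16 vectors per block. The names `RowMajor`, `BlockTransposed`, `from_row_major`, and `block_inner_products` are hypothetical and are not the PR's actual types or API; zero-padding the last block is also an assumption. The point is only that, after transposition, the 16 values for one dimension are contiguous, so the inner loop maps naturally onto SIMD lanes.

```rust
/// Block size, taken from the "16 vectors/block" description (hypothetical constant name).
const BLOCK: usize = 16;

/// Row-major multi-vector: `rows` vectors of `dim` f32 values each.
struct RowMajor {
    data: Vec<f32>,
    rows: usize,
    dim: usize,
}

/// Block-transposed copy: inside each block of 16 vectors the values are stored
/// dimension-major, so the 16 values for one dimension sit next to each other.
struct BlockTransposed {
    data: Vec<f32>,
    rows: usize,
    dim: usize,
}

impl BlockTransposed {
    /// Number of blocks, counting a partially filled last block.
    fn num_blocks(&self) -> usize {
        (self.rows + BLOCK - 1) / BLOCK
    }

    /// Copy a row-major matrix into the blocked layout.
    /// Assumption: unused lanes of the last block are padded with zeros.
    fn from_row_major(m: &RowMajor) -> Self {
        let blocks = (m.rows + BLOCK - 1) / BLOCK;
        let mut data = vec![0.0f32; blocks * BLOCK * m.dim];
        for r in 0..m.rows {
            let (b, lane) = (r / BLOCK, r % BLOCK);
            for d in 0..m.dim {
                data[(b * m.dim + d) * BLOCK + lane] = m.data[r * m.dim + d];
            }
        }
        Self { data, rows: m.rows, dim: m.dim }
    }

    /// Inner products of one query vector against all 16 lanes of block `b`.
    /// The inner loop walks 16 contiguous floats, which a compiler can vectorize.
    fn block_inner_products(&self, b: usize, query: &[f32]) -> [f32; BLOCK] {
        let mut acc = [0.0f32; BLOCK];
        for d in 0..self.dim {
            let base = (b * self.dim + d) * BLOCK;
            let lanes = &self.data[base..base + BLOCK];
            for (a, &x) in acc.iter_mut().zip(lanes) {
                *a += query[d] * x;
            }
        }
        acc
    }
}
```

Padded lanes produce an inner product of 0.0, so a real implementation would need to ignore them when taking the maximum; this sketch leaves that to the caller.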

Implementations

  • NaiveApproach - Scalar baseline
  • SimdApproach - SIMD via diskann_vector::InnerProduct
  • TransposedApproach - Block-transposed SIMD
  • TransposedWithTilingApproach - Query tiling (transposes docs, processes query pairs)
  • QueryTransposedWithTilingApproach - Doc tiling (transposes query, processes doc pairs)
  • SgemmApproach - BLAS SGEMM + SIMD row-max (via faer library)
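As a rough illustration of what the scalar baseline and the query-pair tiling compute, the sketch below assumes the Chamfer score is the sum, over query vectors, of the maximum inner product against any document vector, negated so that smaller means closer. That sign convention and the function names `chamfer_naive` and `chamfer_query_pairs` are assumptions, not the PR's code. The tiled variant here skips the transposed layout and only shows the idea of processing two queries per pass, so each document vector is read once per pair.

```rust
/// Inner product of two equal-length slices.
fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scalar baseline: for each query vector, take the max inner product over all
/// document vectors, sum those maxima, and negate (assumed sign convention).
fn chamfer_naive(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    let total: f32 = query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| inner_product(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum();
    -total
}

/// Query-pair tiling: handle two query vectors per pass over the document,
/// so each document value is loaded once and used for both running sums.
fn chamfer_query_pairs(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    let mut total = 0.0f32;
    let mut pairs = query.chunks_exact(2);
    for pair in &mut pairs {
        let (mut max0, mut max1) = (f32::NEG_INFINITY, f32::NEG_INFINITY);
        for d in doc {
            let (mut s0, mut s1) = (0.0f32, 0.0f32);
            for (k, &x) in d.iter().enumerate() {
                s0 += pair[0][k] * x;
                s1 += pair[1][k] * x;
            }
            max0 = max0.max(s0);
            max1 = max1.max(s1);
        }
        total += max0 + max1;
    }
    // An odd number of queries leaves one vector that is handled on its own.
    for q in pairs.remainder() {
        total += doc
            .iter()
            .map(|d| inner_product(q, d))
            .fold(f32::NEG_INFINITY, f32::max);
    }
    -total
}
```

Both functions should agree on any input; comparing each optimized variant against the naive one is a cheap correctness check.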

Benchmark Results (100 points, 10 iterations)

Machine: Intel Core i7-1365U, AVX2 supported, AVX-512 not supported, 32 GB RAM
Note: Times are the median of 50 measurements; each measurement times 10 consecutive distance computations across 100 points.
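The measurement scheme in the note can be sketched as follows. This is not the benchmark runner from the PR; `measure` and `median` are hypothetical helpers that use only `std::time`, and the closure stands in for one full pass of distance computations.

```rust
use std::time::{Duration, Instant};

/// Median of a non-empty set of timing samples (mean of the two middle values
/// when the count is even). Sorts the slice in place.
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    let n = samples.len();
    if n % 2 == 1 {
        samples[n / 2]
    } else {
        (samples[n / 2 - 1] + samples[n / 2]) / 2
    }
}

/// Take `measurements` samples; each sample times `iterations` consecutive
/// calls of `work`. Returns the median sample.
fn measure<F: FnMut()>(measurements: usize, iterations: usize, mut work: F) -> Duration {
    let mut samples = Vec::with_capacity(measurements);
    for _ in 0..measurements {
        let start = Instant::now();
        for _ in 0..iterations {
            work();
        }
        samples.push(start.elapsed());
    }
    median(&mut samples)
}
```

With the settings in the note, a call would look like `measure(50, 10, || { /* distances across 100 points */ })`. The median is less sensitive to occasional scheduler or frequency-scaling outliers than the mean.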

Speedup vs SIMD Baseline (Median, Lower Latency = Better)

| Configuration | SIMD (µs) | transposed_simd | transposed_tiling | query_transposed_tiling | sgemm |
|---|---|---|---|---|---|
| dim=128, doc=32, query=8 | 2,237 | 1.34x | 1.75x | 1.05x | 1.13x |
| dim=128, doc=64, query=16 | 9,224 | 1.42x | 2.07x | 2.35x | 1.48x |
| dim=128, doc=128, query=32 | 47,882 | 1.32x | 1.86x | 2.64x | 1.88x |
| dim=256, doc=32, query=8 | 4,654 | 1.26x | 1.69x | 1.13x | 0.96x |
| dim=256, doc=64, query=16 | 25,809 | 1.56x | 1.94x | 2.40x | 1.87x |
| dim=256, doc=128, query=32 | 101,093 | 1.41x | 1.71x | 2.67x | 1.96x |
| dim=256, doc=16, query=32 | 8,239 | 1.22x | 1.77x | 2.02x | 1.57x |
| dim=384, doc=32, query=8 | 8,412 | 1.41x | 1.65x | 1.30x | 1.24x |
| dim=384, doc=64, query=16 | 38,162 | 1.30x | 1.47x | 1.70x | 1.66x |
| dim=384, doc=128, query=32 | 171,431 | 1.53x | 1.94x | 2.04x | 2.16x |
| Average | | 1.38x | 1.79x | 1.93x | 1.59x |

Dimension 256

| Approach | Q×D | Mean (µs) | Speedup |
|---|---|---|---|
| SIMD | 8×32 | 3328 | 1.0x |
| Tiling | 8×32 | 2207 | 1.5x |
| QueryTiling | 8×32 | 3243 | 1.0x |
| SGEMM | 8×32 | 2675 | 1.2x |
| SIMD | 16×64 | 18464 | 1.0x |
| Tiling | 16×64 | 10458 | 1.8x |
| QueryTiling | 16×64 | 6326 | 2.9x |
| SGEMM | 16×64 | 7014 | 2.6x |
| SIMD | 32×128 | 72197 | 1.0x |
| Tiling | 32×128 | 39453 | 1.8x |
| QueryTiling | 32×128 | 25357 | 2.8x |
| SGEMM | 32×128 | 19727 | 3.7x |

Dimension 384

| Approach | Q×D | Mean (µs) | Speedup |
|---|---|---|---|
| SIMD | 8×32 | 7026 | 1.0x |
| Tiling | 8×32 | 3609 | 1.9x |
| QueryTiling | 8×32 | 4816 | 1.5x |
| SGEMM | 8×32 | 4077 | 1.7x |
| SIMD | 16×64 | 27518 | 1.0x |
| QueryTiling | 16×64 | 9750 | 2.8x |
| SGEMM | 16×64 | 10070 | 2.7x |
| SIMD | 32×128 | 107772 | 1.0x |
| Tiling | 32×128 | 50072 | 2.2x |
| QueryTiling | 32×128 | 50669 | 2.1x |
| SGEMM | 32×128 | 28764 | 3.7x |
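The Speedup columns above are consistent with dividing the SIMD baseline latency by the latency of each approach. A small sketch of that arithmetic, with hypothetical helper names `speedup` and `round1`:

```rust
/// Speedup of an approach relative to the baseline. Both arguments are
/// latencies in the same unit, so values above 1.0 mean "faster than baseline".
fn speedup(baseline_us: f64, approach_us: f64) -> f64 {
    baseline_us / approach_us
}

/// Round to one decimal place, the precision used in the per-dimension tables.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

For example, `round1(speedup(3328.0, 2207.0))` gives the 1.5x reported for Tiling at 8×32, dimension 256. A value below 1.0x, such as the 0.96x for sgemm at dim=256, doc=32, query=8, means the approach was slower than the SIMD baseline for that configuration.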

Future Work

  • Add RFC based on findings for DiskANN integration
  • Support for additional element types (f16, u8 quantized, etc.)
  • AVX-512 support: Larger registers could enable tile size 4 (processing 4 queries simultaneously)
  • SIMD-accelerated horizontal MinMax: Hardware-accelerated horizontal min/max reductions across SIMD lanes for faster per-query score aggregation
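For the horizontal max item, the reduction pattern can be sketched in plain Rust. This is scalar code, not intrinsics, and `horizontal_max` is a hypothetical name; it assumes 8 lanes, which is how many f32 values fit in one 256-bit AVX2 register. Each step halves the number of live lanes, mirroring the shuffle-then-max sequence a hardware reduction would typically use.

```rust
/// Tree reduction of 8 lanes to a single maximum in three steps (8 -> 4 -> 2 -> 1).
fn horizontal_max(lanes: [f32; 8]) -> f32 {
    // Step 1: compare the upper half against the lower half.
    let four = [
        lanes[0].max(lanes[4]),
        lanes[1].max(lanes[5]),
        lanes[2].max(lanes[6]),
        lanes[3].max(lanes[7]),
    ];
    // Step 2: fold the remaining four lanes down to two.
    let two = [four[0].max(four[2]), four[1].max(four[3])];
    // Step 3: final comparison.
    two[0].max(two[1])
}
```

The tree shape needs three dependent max operations instead of the seven a left-to-right scan would chain together.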

Testing

cargo build --release -p multi-vector
cargo run --release -p multi-vector --bin multivec-bench -- run \
    --input-file multi-vector/examples/bench.json --output-file results.json

Contributing

This work is experimental and will be submitted as separate PRs.

codecov-commenter commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 68.86326% with 378 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.69%. Comparing base (354df71) to head (63f07fa).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ctor-bench/src/distance/query_transposed_tiling.rs | 49.53% | 163 Missing ⚠️ |
| ...lti-vector-bench/src/distance/transposed_tiling.rs | 43.50% | 113 Missing ⚠️ |
| experimental-multi-vector-bench/src/bench/mod.rs | 70.88% | 23 Missing ⚠️ |
| ...ntal-multi-vector-bench/src/distance/transposed.rs | 75.00% | 23 Missing ⚠️ |
| diskann-quantization/src/multi_vector/matrix.rs | 39.39% | 20 Missing ⚠️ |
| experimental-multi-vector-bench/src/bench/input.rs | 68.00% | 16 Missing ⚠️ |
| ...xperimental-multi-vector-bench/src/bench/runner.rs | 95.37% | 10 Missing ⚠️ |
| ...ental-multi-vector-bench/src/bin/multivec_bench.rs | 87.87% | 4 Missing ⚠️ |
| ...erimental-multi-vector-bench/src/distance/sgemm.rs | 97.79% | 3 Missing ⚠️ |
| ...xperimental-multi-vector-bench/src/multi_vector.rs | 87.50% | 3 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (68.86%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #730      +/-   ##
==========================================
- Coverage   89.00%   88.69%   -0.31%     
==========================================
  Files         428      440      +12     
  Lines       78294    79505    +1211     
==========================================
+ Hits        69686    70519     +833     
- Misses       8608     8986     +378     
| Flag | Coverage Δ |
|---|---|
| miri | 88.69% <68.86%> (-0.31%) ⬇️ |
| unittests | 88.69% <68.86%> (-0.31%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...xperimental-multi-vector-bench/src/distance/mod.rs | 100.00% <100.00%> (ø) |
| ...erimental-multi-vector-bench/src/distance/naive.rs | 100.00% <100.00%> (ø) |
| ...perimental-multi-vector-bench/src/distance/simd.rs | 100.00% <100.00%> (ø) |
| ...erimental-multi-vector-bench/src/distance/sgemm.rs | 97.79% <97.79%> (ø) |
| ...xperimental-multi-vector-bench/src/multi_vector.rs | 87.50% <87.50%> (ø) |
| ...ental-multi-vector-bench/src/bin/multivec_bench.rs | 87.87% <87.87%> (ø) |
| ...xperimental-multi-vector-bench/src/bench/runner.rs | 95.37% <95.37%> (ø) |
| experimental-multi-vector-bench/src/bench/input.rs | 68.00% <68.00%> (ø) |
| diskann-quantization/src/multi_vector/matrix.rs | 94.74% <39.39%> (-3.41%) ⬇️ |
| experimental-multi-vector-bench/src/bench/mod.rs | 70.88% <70.88%> (ø) |

... and 3 more