Fp qmv #2984

awni · 2026-01-11T17:00:40Z

Adds a basic qmv kernel for fp quants on CUDA.
Adds a simple quantize-dequantize kernel for CUDA.
Routes qqmv through the quantize-dequantize kernel followed by qmv, as this is much faster than using cuBLAS with scale swizzling.
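
For readers unfamiliar with the routing, here is a rough pure-Python sketch of the idea: round the activation vector to the fp grid and back (quantize-dequantize), then run an ordinary quantized matrix-vector product against it. This is not the CUDA code in this PR. The FP4 (E2M1) value table, the power-of-two per-group scale, the group size of 4, and every function name below are assumptions chosen to keep the example small.

```python
import math

# Assumption: magnitudes representable by a 4-bit E2M1 float.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def quantize_group(group):
    """Return (scale, codes); each code is (index into FP4_VALUES, is_negative)."""
    amax = max(abs(v) for v in group)
    # Power-of-two scale chosen so the largest magnitude maps into [0, 6].
    scale = 1.0 if amax == 0.0 else 2.0 ** math.ceil(math.log2(amax / 6.0))
    codes = []
    for v in group:
        mag = abs(v) / scale
        # Nearest representable magnitude; ties go to the smaller value here.
        idx = min(range(len(FP4_VALUES)), key=lambda i: abs(FP4_VALUES[i] - mag))
        codes.append((idx, v < 0))
    return scale, codes


def quantize(vec):
    """Quantize a vector group by group."""
    assert len(vec) % GROUP_SIZE == 0
    return [quantize_group(vec[i:i + GROUP_SIZE])
            for i in range(0, len(vec), GROUP_SIZE)]


def dequantize(groups):
    """Expand (scale, codes) groups back to plain floats."""
    out = []
    for scale, codes in groups:
        for idx, neg in codes:
            out.append((-1.0 if neg else 1.0) * FP4_VALUES[idx] * scale)
    return out


def quantize_dequantize(vec):
    """Round a vector to the fp grid but keep it as plain floats."""
    return dequantize(quantize(vec))


def qmv(w_q, x):
    """Quantized weights (one list of groups per row) times a float vector."""
    return [sum(w * xv for w, xv in zip(dequantize(row), x)) for row in w_q]


def qqmv(w_q, x):
    """Quantized weights times quantized activations, routed through qmv."""
    return qmv(w_q, quantize_dequantize(x))


w_q = [quantize(row) for row in [[6.0, 0.0, -1.5, 3.0], [1.0, 1.0, 1.0, 1.0]]]
x = [1.0, 2.4, 0.0, 0.0]
print(quantize_dequantize(x))  # [1.0, 2.0, 0.0, 0.0]
print(qqmv(w_q, x))            # [6.0, 3.0]
```

In this sketch the activation rounding happens once, up front, and the product itself is the same routine used for unquantized activations, which mirrors the "quantize-dequantize + qmv" routing described above.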

awni added 7 commits January 10, 2026 15:33

allow some non 2D inputs in qqmm

7242970

add very basic fp qmv

f0e31e9

working for batched

865a5f9

use uint32

7490d9b

route qqmv to qmv with quantize-dequantize kernel

b8ff6ee

cleanup

195acec

fix older cuda

f527572

awni force-pushed the fp_qmv branch from 7f97c39 to f527572 · January 12, 2026 19:58
