@awni (Member) commented Jan 11, 2026

  • Adds a basic qmv kernel for fp quants on CUDA.
  • Adds a simple quantize-dequantize kernel for CUDA.
  • Routes qqmv through quantize-dequantize + qmv, as this is much faster than using cuBLAS with scale swizzling.
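
For readers unfamiliar with the routing in the last bullet, here is a minimal numeric sketch in plain Python. It is not the CUDA code from this PR. It assumes that qqmv means a matrix-vector product where the input vector is also quantized, and that the fp format uses 4-bit e2m1 elements with one power-of-two scale per group of 32. The function names, group size, and rounding rule are all illustrative assumptions.

```python
import math

# Assumed element format: magnitudes representable by a 4-bit e2m1 float.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 32  # assumed number of elements sharing one scale


def quantize_group(xs):
    """Return (scale, values) for one group, with a power-of-two scale."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 1.0, [0.0] * len(xs)
    # Smallest power-of-two scale for which amax / scale <= 6.0.
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    values = []
    for x in xs:
        # Snap the scaled magnitude to the closest representable value.
        mag = min(FP4_VALUES, key=lambda v: abs(abs(x) / scale - v))
        values.append(math.copysign(mag, x))
    return scale, values


def quantize_dequantize(x):
    """Round x onto the quantized grid and return it as ordinary floats."""
    out = []
    for start in range(0, len(x), GROUP_SIZE):
        scale, values = quantize_group(x[start:start + GROUP_SIZE])
        out.extend(scale * v for v in values)
    return out


def qmv(w_scales, w_values, x):
    """Quantized weight matrix (per-row group scales and values) times dense x."""
    y = []
    for row_scales, row_values in zip(w_scales, w_values):
        acc = 0.0
        for g, scale in enumerate(row_scales):
            lo = g * GROUP_SIZE
            hi = lo + GROUP_SIZE
            partial = sum(w * xv for w, xv in zip(row_values[lo:hi], x[lo:hi]))
            acc += scale * partial  # apply the group scale once per group
        y.append(acc)
    return y


def qqmv(w_scales, w_values, x):
    """Hypothetical routing: quantize-dequantize the input, then reuse qmv."""
    return qmv(w_scales, w_values, quantize_dequantize(x))
```

In this sketch the input vector never exists in packed quantized form: it is rounded to the quantized grid and handed to the same qmv path, so no scale layout has to be rearranged for a separate GEMM call.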

