Support for a4w4 (fp4) in block scale gemm AB quant #3603

ErwinTerpstra · 2026-01-19T10:57:51Z

Proposed changes

Adds support for packed 4-bit floating point (fp4) for both the A and B tensors in block scale gemm. Tested with A using a 1D block scale and B using a 2D block scale. Works for both the "regular" and Preshuffle-B pipelines. Note that the regular pipeline stores data as fp8 in LDS (as this is how int4 was implemented). The WP pipeline stores tensor A as fp4 in LDS and dequantizes it when loading to registers.
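For readers unfamiliar with the data layout, here is a minimal host-side sketch of what "packed fp4 with a 1D block scale" means. It is not code from this PR. It assumes the E2M1 encoding (1 sign, 2 exponent, 1 mantissa bit), that the low nibble of each byte holds the even-indexed element, and float scales with one scale per `group_size` consecutive elements. The names `decode_fp4_e2m1` and `dequant_row_fp4` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one 4-bit E2M1 code to float.
// Bit 3 is the sign; bits 0-2 index the eight representable magnitudes.
inline float decode_fp4_e2m1(std::uint8_t code)
{
    static constexpr float magnitudes[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float m = magnitudes[code & 0x7u];
    return (code & 0x8u) ? -m : m;
}

// Dequantize one row of packed fp4 values using a 1D block scale.
// Assumption: two elements per byte, low nibble first; scales[i / group_size]
// applies to element i.
inline std::vector<float> dequant_row_fp4(const std::vector<std::uint8_t>& packed,
                                          const std::vector<float>& scales,
                                          std::size_t group_size)
{
    const std::size_t n = packed.size() * 2;
    assert(group_size > 0 && n % group_size == 0);
    assert(scales.size() == n / group_size);

    std::vector<float> out(n);
    for(std::size_t i = 0; i < n; ++i)
    {
        const std::uint8_t byte = packed[i / 2];
        const std::uint8_t code = (i % 2 == 0) ? (byte & 0x0Fu) : (byte >> 4);
        out[i]                  = decode_fp4_e2m1(code) * scales[i / group_size];
    }
    return out;
}
```

A 2D block scale, as used for B in the tests, would index the scale by both row group and column group instead of by a single element index.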

Changes include:

- Add fp4 support to the ABQuant example, with and without PreshuffleB
- Tests for fp4 on both A and B tensors (a4w4) for the base case, irregular sizes and the PreshuffleB pipeline

Other changes:
- Add support to InterleavedPKTypeLoader for generic type conversions instead of just int4
- Add a look-up table (LUT) for converting fp4 to fp8. Improves performance for a 4K tensor by around 25% on gfx12. Disabled by default using TEST_convert_with_table.
- Some helper traits for working with packed or mixed-precision types, including a method to determine the MFMA type based on the input types
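
To illustrate the LUT idea, the sketch below converts one packed fp4 byte into two fp8 bytes with a 16-entry table. It is a host-side illustration under stated assumptions, not the table from this PR. It assumes E2M1 for fp4, E4M3 with exponent bias 7 for fp8, and low-nibble-first packing. `kFp4ToFp8`, `Fp8Pair`, `convert_packed_fp4_to_fp8` and `decode_fp8_e4m3_normal` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Index: 4-bit E2M1 code. Value: E4M3 (bias 7) bit pattern of the same number.
constexpr std::array<std::uint8_t, 16> kFp4ToFp8 = {
    0x00, 0x30, 0x38, 0x3C, 0x40, 0x44, 0x48, 0x4C,  // +0, +0.5, +1, +1.5, +2, +3, +4, +6
    0x80, 0xB0, 0xB8, 0xBC, 0xC0, 0xC4, 0xC8, 0xCC}; // same magnitudes with the sign bit set

struct Fp8Pair
{
    std::uint8_t lo; // converted low nibble
    std::uint8_t hi; // converted high nibble
};

// Convert two packed fp4 values with two table look-ups and no arithmetic.
constexpr Fp8Pair convert_packed_fp4_to_fp8(std::uint8_t packed)
{
    return {kFp4ToFp8[packed & 0x0Fu], kFp4ToFp8[packed >> 4]};
}

// Reference decoder used only to check the table. It handles zero and normal
// E4M3 values, which is all the table above can produce.
inline float decode_fp8_e4m3_normal(std::uint8_t bits)
{
    const int exponent = (bits >> 3) & 0xF;
    const int mantissa = bits & 0x7;
    if(exponent == 0 && mantissa == 0)
        return (bits & 0x80u) ? -0.0f : 0.0f;
    assert(exponent != 0); // subnormals are never emitted by the table
    const float magnitude = std::ldexp(1.0f + mantissa / 8.0f, exponent - 7);
    return (bits & 0x80u) ? -magnitude : magnitude;
}
```

Every E2M1 value is exactly representable in E4M3, so a table like this is lossless. A target that uses a different fp8 encoding would need different table contents.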

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

ErwinTerpstra added 9 commits January 14, 2026 08:07

chore: split block scale example instances in more separate files to speed up compile times

9891903

wip: fp4 scaffolding for abquant

e43fd33

feat: add fp4 decoding-while-loading to abquant pipeline

6dea234

feat: add support for fp4 CPU verification in abquant

d0cd610

chore: add time tracking to reference calculation

58088a5

feat: add a4w4 test for blockscale gemm

3d8bfdb

feat: optimize reference calculation by preconverting values to AccType

761ba1b

feat: add fp4 to fp8 look-up table

a477fb8

fix: reference to wrong ComputeDataType field in QuantProblem

72a94bd

ErwinTerpstra added the organization: streamhpc label Jan 19, 2026

ErwinTerpstra added 6 commits January 19, 2026 11:04

Merge branch 'develop' into eterpstr/206-block-scale-gemm-fp4-support

7563031

feat: type utilities for determining MFMA compute types

e76d18e

feat: packed fp4 for abquant weight preshuffle

37af217

feat: add separate tests for a4w4 base case, padding and preshuffleB

32d5757

fix: fp4 conversion on gfx950 attempting to use non-supported method

f55d902

fix: test case was using quant group sizes which don't work on gfx950 due to larger mfma tile size

c0d869b

2 participants