Support for a4w4 (fp4) in block scale gemm AB quant #3603
+773
−275
Proposed changes
Adds support for packed 4-bit floating point (fp4) for both the A and B tensors in block scale gemm. Tested with A using a 1D block scale and B using a 2D block scale. Works with both the "regular" and Preshuffle-B pipelines. Note that the regular pipeline stores data as fp8 in LDS (as this is how int4 was implemented). The WP pipeline stores tensor A as fp4 in LDS and dequantizes it when loading to registers.
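For readers unfamiliar with packed fp4, the dequantization described above can be sketched on the host side as a table lookup followed by a per-block scale. This is only an illustration, not the kernel code in this PR: the helper name `dequant_fp4_1d`, the low-nibble-first packing order, and the use of plain `float` scales are all assumptions made for the example. The 16-entry table follows the standard E2M1 encoding (bit 3 is the sign, the low three bits select the magnitude).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// E2M1 (fp4) decode table: codes 0..7 are the positive values,
// codes 8..15 are the same magnitudes with the sign bit (bit 3) set.
constexpr std::array<float, 16> kFp4Table = {
    0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f};

// Hypothetical helper (not part of the PR): unpack `count` fp4 values stored
// two per byte and apply one scale per `block_size` consecutive elements,
// i.e. a 1D block scale.
// Assumption: the element with the even index sits in the low nibble.
std::vector<float> dequant_fp4_1d(const std::vector<std::uint8_t>& packed,
                                  const std::vector<float>& scales,
                                  std::size_t count,
                                  std::size_t block_size)
{
    std::vector<float> out(count);
    for(std::size_t i = 0; i < count; ++i)
    {
        const std::uint8_t byte = packed[i / 2];
        // Select the low or high nibble depending on element parity.
        const std::uint8_t code = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
        // Table lookup, then multiply by the scale of the enclosing block.
        out[i] = kFp4Table[code] * scales[i / block_size];
    }
    return out;
}
```

A 2D block scale would differ only in how the scale index is computed (one scale per tile of rows and columns rather than per run of consecutive elements); the nibble extraction and table lookup stay the same.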
Changes include:
- `InterleavedPKTypeLoader` for generic type conversions instead of just int4
- `TEST_convert_with_table`.
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- `clang-format` on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.