[CK-Tile] Improve cshuffle epilogue mfma tile coverage#3701

Closed
tenpercent wants to merge 1 commit into develop from cshuffle-epilogue-tests

tenpercent commented Feb 2, 2026

Summary

This PR refactors and expands the CShuffleEpilogue test suite with comprehensive coverage and improved verification.

Changes

1. Sharded tests by data type with gfx950-specific variants

  • Split test_cshuffle_epilogue.cpp into:
    • test_cshuffle_epilogue_fp16.cpp - FP16/BF16 data type tests
    • test_cshuffle_epilogue_fp8.cpp - FP8 data type tests
    • test_cshuffle_epilogue_fp8_gfx950.cpp - gfx950-specific FP8 tests
    • test_cshuffle_epilogue_scale.cpp - Scale tests as a separate target
  • Created test_cshuffle_epilogue_common.hpp to share common test infrastructure
  • Enables more targeted testing and better organization by data type

2. Improved verification with robust checks

  • Implemented set-based comparison to ensure all expected values are present
  • Added distribution uniformity checks to verify proper data shuffling
  • Strengthened verification to require kBlockSize unique values
  • Validated all rows independently
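
The verification ideas listed above can be illustrated with a small host-side sketch. This is not the PR's code: it assumes a test in which each of `block_size` threads writes its own integer id, so a correctly shuffled output contains exactly the ids `0 .. block_size - 1`, each the same number of times. The names `ShuffleCheck`, `check_shuffled_output`, and `every_row_passes` are hypothetical, and treating each row as needing the full value set is only one possible reading of "validate all rows independently".

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical result type for this sketch (not part of CK-Tile).
struct ShuffleCheck
{
    bool all_expected_present; // set-based comparison passed
    bool uniform;              // every value occurs equally often
};

// Assumes each of `block_size` threads wrote its own id (0 .. block_size - 1).
ShuffleCheck check_shuffled_output(const std::vector<int>& output, int block_size)
{
    // The set of values a correct shuffle must produce.
    std::set<int> expected;
    for(int i = 0; i < block_size; ++i)
        expected.insert(i);

    // Histogram of the values actually observed.
    std::map<int, std::size_t> counts;
    for(int v : output)
        ++counts[v];

    std::set<int> seen;
    for(const auto& kv : counts)
        seen.insert(kv.first);

    // Uniform means no value is duplicated more often than any other.
    bool uniform = !counts.empty();
    for(const auto& kv : counts)
        uniform = uniform && (kv.second == counts.begin()->second);

    return {seen == expected, uniform};
}

// Applies the same check to every row of a row-major output, so a defect
// confined to a single row cannot be masked by the other rows.
bool every_row_passes(const std::vector<int>& output,
                      std::size_t rows,
                      std::size_t cols,
                      int values_per_row)
{
    if(rows == 0 || cols == 0 || output.size() != rows * cols)
        return false;
    for(std::size_t r = 0; r < rows; ++r)
    {
        const auto first = output.begin() + static_cast<std::ptrdiff_t>(r * cols);
        const std::vector<int> row(first, first + static_cast<std::ptrdiff_t>(cols));
        const ShuffleCheck c = check_shuffled_output(row, values_per_row);
        if(!c.all_expected_present || !c.uniform)
            return false;
    }
    return true;
}
```

Comparing sets rather than positions keeps such a check independent of the exact shuffle permutation, while the histogram catches outputs where one thread's value overwrote another's.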

3. Expanded test coverage with parameterized configurations

  • Added comprehensive test cases covering various warp layouts and MFMA types
  • Introduced parameterized testing for different reduction operations, output types, and scaling modes
  • Decoupled MFMA type from output type for more flexible test configurations
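
One way to picture a parameterized configuration in which the MFMA type and the output type vary independently is a compile-time config struct plus a type list. The sketch below is illustrative only and does not reproduce the PR's test fixtures: `Fp16Tag`, `Fp8Tag`, `EpilogueTestConfig`, `ConfigList`, `for_each_config`, and `CountElements` are hypothetical names, the tag types stand in for real element types, and the wavefront size of 64 is an assumption about the target GPUs.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical tag types standing in for real element types.
struct Fp16Tag {};
struct Fp8Tag {};

// Assumption for this sketch: 64 threads per wavefront.
constexpr int kAssumedWaveSize = 64;

// MfmaT and OutT are separate parameters, so an FP8 MFMA can be paired
// with an FP16 output (or any other combination) without a new fixture.
template <typename MfmaT, typename OutT, int MWave, int NWave, int MPerXdl, int NPerXdl>
struct EpilogueTestConfig
{
    using MfmaType   = MfmaT;
    using OutputType = OutT;

    static constexpr int kBlockSize = MWave * NWave * kAssumedWaveSize;
    static constexpr int kMPerBlock = MWave * MPerXdl;
    static constexpr int kNPerBlock = NWave * NPerXdl;
};

// A compile-time list of configurations to run.
template <typename... Configs>
struct ConfigList
{
};

// Invokes f once per configuration, in order, and returns the functor
// so that any state it accumulated can be inspected.
template <typename F, typename... Configs>
F for_each_config(ConfigList<Configs...>, F f)
{
    int expand[] = {0, (f(Configs{}), 0)...};
    (void)expand;
    return f;
}

// Example visitor: sums the output tile sizes of all configurations.
struct CountElements
{
    std::size_t total = 0;

    template <typename Config>
    void operator()(Config)
    {
        total += static_cast<std::size_t>(Config::kMPerBlock) * Config::kNPerBlock;
    }
};

using Fp16OutFp16    = EpilogueTestConfig<Fp16Tag, Fp16Tag, 2, 2, 32, 32>;
using Fp8OutFp16     = EpilogueTestConfig<Fp8Tag, Fp16Tag, 1, 4, 16, 16>;
using ExampleConfigs = ConfigList<Fp16OutFp16, Fp8OutFp16>;
```

A test body written as a visitor like `CountElements` would then run once per entry in `ExampleConfigs`, so adding a warp layout or MFMA shape becomes a one-line change to the list.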

4. Code quality improvements

  • Fixed memory leaks in test setup/teardown
  • Improved code hygiene and separation of concerns
  • Extracted common utilities to shared headers
  • Better code organization and maintainability

5. CMake support for OCP FP8 compilation flag

  • Updated test/ck_tile/epilogue/CMakeLists.txt to add the -DCK_TILE_USE_OCP_FP8 compile option only when CK_USE_OCP_FP8 is enabled
  • Follows the established pattern used in other test directories
  • Added conditional gfx950 target build

Test plan

  • Built all test executables successfully (test_ck_tile_cshuffle_epilogue_fp16, test_ck_tile_cshuffle_epilogue_fp8, test_ck_tile_cshuffle_epilogue_fp8_gfx950, test_ck_tile_cshuffle_epilogue_scale)
  • Verified CMake configuration follows codebase conventions
  • Ready for CI validation

…and improved verification

This commit refactors and expands the CShuffleEpilogue test suite with the following improvements:

- Shard tests by data type (FP16, FP8) and add gfx950-specific FP8 tests
- Extract scale tests into a separate target for better organization
- Implement robust verification using set-based comparison to ensure all expected values are present
- Add distribution uniformity checks to verify proper data shuffling
- Strengthen verification to require kBlockSize unique values and validate all rows independently
- Add support for parameterized test configurations covering various warp layouts and MFMA types
- Improve code organization by separating concerns and extracting common utilities
- Fix memory leaks and improve code hygiene
- Add CMake support for OCP FP8 compilation flag

The refactored test suite provides better coverage, more rigorous verification, and improved
maintainability for the CShuffleEpilogue functionality.

Co-Authored-By: Claude <noreply@anthropic.com>
tenpercent force-pushed the cshuffle-epilogue-tests branch from ca46fb3 to 891efbb on February 3, 2026 21:58
ammallya commented Feb 3, 2026

Imported to ROCm/rocm-libraries

ammallya closed this Feb 3, 2026