Ck tile/gemm blockscale abq opt #3614

kensclin · 2026-01-20T02:26:19Z

Proposed changes

Improve ABQuant performance in the CK Tile blockscale GEMM.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which helps the maintainers understand the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

* Rename member variable to better reflect its actual meaning.
* Add transfer checks for conv fwd xdl.
* Validate tensor layouts & vector size conv fwd v3.
* Add combined transfer concepts.
* Add transfer concepts for conv fwd factories.
* Fix clang format
* Add helper instruction to get max mem vector instruction width.
* Apply review comments.
* Rename thread cluster access(->arrange) order concept
* Fix merge artifacts.
* Add generic access order limits into block transfer concept.

…inx (#3602)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.31.2 to 1.31.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.31.2...v1.31.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

This change improves the clang-format CI check to be faster and not depend on git being available in the build environment.

Changes:
- Use `find` instead of `git ls-files` (no git dependency)
- Check all C++ files: `*.h`, `*.hpp`, `*.cpp`, `*.h.in`, `*.hpp.in`, `*.cpp.in`, `*.cl`
- Exclude build/ and include/rapidjson directories
- Use parallel processing with 8 cores (`-P 8`) for ~8x speedup
- Show only errors with unified diff format (`-u`)
- Clear error messages: "ERROR: <file> needs formatting"
- Preserve original logic: run clang-format only when RUN_CPPCHECK=false, or run both clang-format and cppcheck when RUN_CPPCHECK=true

Performance:
- Sequential processing: ~93 seconds for 5,899 files
- Parallel with 8 cores: ~12 seconds for 5,899 files
- Per-file processing time: ~15ms

This reduces CI time while maintaining code formatting standards.
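The file-selection rules listed above can be sketched in Python as a rough illustration. The CI check itself is described as using `find` with parallel workers, so this is not the script from the commit. The function name `find_format_candidates` is hypothetical, and treating `build` and `include/rapidjson` as paths relative to the repository root is an assumption. The last lines only recompute the quoted timings.

```python
import fnmatch
import os

# Patterns and exclusions are taken from the description above.
PATTERNS = ["*.h", "*.hpp", "*.cpp", "*.h.in", "*.hpp.in", "*.cpp.in", "*.cl"]
# Assumption: exclusions are relative to the repository root.
EXCLUDED_DIRS = ["build", os.path.join("include", "rapidjson")]


def find_format_candidates(root):
    """Return sorted paths (relative to root) that the check would cover."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel_dir, d)) not in EXCLUDED_DIRS
        ]
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in PATTERNS):
                matches.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(matches)


# Recompute the figures quoted in the description.
sequential_s, parallel_s, file_count = 93, 12, 5899
speedup = sequential_s / parallel_s                # a little under 8x
per_file_ms = sequential_s / file_count * 1000     # between 15 and 16 ms
```

The measured speedup (93 s down to 12 s) is slightly below the worker count of 8, which is consistent with the "~8x" figure given in the description.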

* add new tile size for async

  Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix lse error

  Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>

---------

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [CK TILE] remove dependency on std chrono
* Apply suggestions from code review

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [BUILDER] Add grouped conv fwd ck tile profiler
* [CK TILE] Fix grouped conv kernels splitk and double lds
* Updates
* Fixes
* Move to ckProfiler
* Fixes
* fix
* fix
* Change instances to empty list by default
* fix
* fix
* Update grouped_convolution_signatures.hpp
* Update grouped_convolution_forward_tile_algs.hpp
* [CK TILE] Add grouped convolution forward tests (#3556)
* [CK TILE] Add grouped convolution forward tests
* fix jenkins
* fixes
* comments fixes
* unit test
* unit test fix
* Move instances outside builder
* fix includes
* clang format fix
* readme fix
* fix includes
* fixes

Summary:
- added new device impl of Batched GEMM Reduce for WMMA
- added instance library
- added WMMA impl to the Batched GEMM Reduce tests

kensclin added 2 commits January 19, 2026 09:37

GEMM Blockscale ABQuant Optimization

84ed724

IGLP adjust

05da835

kensclin requested review from Snektron, ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent, vidyasagar-amd and vpietila-amd as code owners January 20, 2026 02:26

aosewski and others added 8 commits January 20, 2026 14:53

[CK TILE] remove dependency on std chrono (#3599)

88e7029


WMMA support for batched_gemm_reduce (#3332)

9617df6


Fixed compile error

8a29753

kensclin requested review from a team and ddembeckAMD as code owners January 20, 2026 14:53

kensclin closed this Jan 20, 2026

kensclin deleted the ck_tile/gemm_blockscale_abq_opt branch January 20, 2026 16:01


8 participants
