Metal backend: Add Metal int4 quantization support to Parakeet #17235
Pull request overview
This PR implements 4-bit weight quantization support for the Parakeet TDT model on the Metal backend using torchao's MPS API. The changes enable Metal-specific quantization while maintaining existing CUDA quantization workflows.
Changes:
- Added the `fpa4w` (floating-point activation, 4-bit weight) quantization option for the Metal backend
- Implemented validation to ensure Metal-specific quantization is only used with the Metal backend
- Updated CI workflows to test Metal int4 quantization with the parakeet-tdt model
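The backend check in the second bullet can be sketched as follows. This is a minimal illustration, not the PR's code: `METAL_ONLY_QLINEAR_CONFIGS` and `validate_qlinear_config` are hypothetical names, and the sketch assumes the backend and quantization config arrive as plain strings.

```python
from typing import Optional

# Hypothetical: set of quantization configs that only the Metal backend supports.
METAL_ONLY_QLINEAR_CONFIGS = {"fpa4w"}


def validate_qlinear_config(backend: str, qlinear: Optional[str]) -> None:
    """Raise ValueError if a Metal-only quantization config is paired with another backend."""
    if qlinear in METAL_ONLY_QLINEAR_CONFIGS and backend != "metal":
        raise ValueError(
            f"quantization config {qlinear!r} requires the Metal backend, got {backend!r}"
        )


# Usage: passes silently for valid pairs, raises for a mismatch.
validate_qlinear_config("metal", "fpa4w")
validate_qlinear_config("cuda", None)
```

Failing early at export time keeps an unsupported combination from surfacing later as a less obvious lowering or runtime error.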
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| third-party/ao | Updated the torchao submodule to a version with Metal quantization support |
| examples/models/parakeet/quantize.py | Added Metal int4 quantization implementation using UIntxWeightOnlyConfig |
| examples/models/parakeet/export_parakeet_tdt.py | Added fpa4w option and validation for Metal backend requirement |
| examples/models/parakeet/README.md | Updated documentation with fpa4w config and Metal quantization example |
| .github/workflows/metal.yml | Added int4 quantization testing for parakeet-tdt model |
| .ci/scripts/export_model_artifact.sh | Added quantized-int4-metal option with backend validation |
@manuelcandales In the README.md, do you want to add running `EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh` as a prerequisite step for int4 Metal quantization?
Yeah, that's true.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
```python
config = UIntxWeightOnlyConfig(
    group_size=qlinear_group_size,
    bitwidth=4,
    # ... (remaining arguments not shown in this review excerpt)
)
```
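To illustrate what `group_size` and `bitwidth=4` control, here is a pure-Python sketch of group-wise asymmetric min/max quantization to unsigned 4-bit codes. It is not torchao's `UIntxWeightOnlyConfig` implementation; the function names are hypothetical, and the choice of a floating-point offset equal to the group minimum is an assumption made for clarity.

```python
from typing import List, Tuple


def quantize_group_uint4(weights: List[float]) -> Tuple[List[int], float, float]:
    """Quantize one group of weights to codes in 0..15 with a per-group scale and offset."""
    qmax = 2**4 - 1  # 15 distinct steps for bitwidth=4
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant group, where max == min would give a zero scale.
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = w_min  # assumption: float offset that code 0 maps back to
    codes = [min(qmax, max(0, round((w - zero) / scale))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale + zero for c in codes]


def quantize_rowwise(row: List[float], group_size: int):
    """Split a weight row into contiguous groups, each with its own scale and offset."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be divisible by group_size")
    return [
        quantize_group_uint4(row[i:i + group_size])
        for i in range(0, len(row), group_size)
    ]


row = [0.1, -0.5, 0.3, 0.9, 2.0, 1.5, -1.0, 0.0]
groups = quantize_rowwise(row, group_size=4)
```

Smaller groups track local weight ranges more closely at the cost of storing more scales and offsets. The `hqq` option suggested below picks these per-group parameters by optimization rather than taking them straight from the min/max as this sketch does.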
Update the pin past pytorch/ao#3829 and set `uintx_choose_qparams_algorithm="hqq"`.
Could be done in a follow-up PR too.
Yes, that's my plan: to do it in a follow-up PR.
Here: #17258
This PR adds support for 4-bit weight quantization on the Metal backend for the Parakeet TDT model.
Parakeet Export Script (export_parakeet_tdt.py, quantize.py)
Documentation (README.md)
CI Integration (export_model_artifact.sh, metal.yml)
Dependencies