Metal backend: Add Metal int4 quantization support to Parakeet #17235

manuelcandales · 2026-02-05T04:18:29Z

This PR adds support for 4-bit weight quantization on the Metal backend for the Parakeet TDT model.

Parakeet Export Script (export_parakeet_tdt.py, quantize.py)

Added fpa4w (floating point activation, 4-bit weight) quantization option for encoder and decoder linear layers
Implemented Metal-specific quantization path using torchao's MPS API (UIntxWeightOnlyConfig)
Added validation to ensure fpa4w is only used with Metal backend
Filters out incompatible layers (weights not divisible by 8) during quantization
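
The backend validation and layer filtering described above can be sketched roughly as follows. This is an illustrative sketch, not the code from export_parakeet_tdt.py or quantize.py: `LinearSpec`, `validate_quant_option`, `is_fpa4w_compatible`, and `split_layers` are hypothetical names, the backend string `"metal"` is assumed, and the divisibility rule is assumed to apply to both weight dimensions (the PR only says "weights not divisible by 8"). No torchao call is used here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LinearSpec:
    """Hypothetical stand-in for a linear layer: a name plus its weight shape."""
    name: str
    out_features: int
    in_features: int


def validate_quant_option(qlinear: Optional[str], backend: str) -> None:
    """Reject fpa4w unless the export targets the Metal backend (assumed name: "metal")."""
    if qlinear == "fpa4w" and backend != "metal":
        raise ValueError(
            f"fpa4w quantization requires the metal backend, got {backend!r}"
        )


def is_fpa4w_compatible(layer: LinearSpec, multiple: int = 8) -> bool:
    """Assumed rule: both weight dimensions must be divisible by `multiple`."""
    return layer.out_features % multiple == 0 and layer.in_features % multiple == 0


def split_layers(
    layers: List[LinearSpec], multiple: int = 8
) -> Tuple[List[LinearSpec], List[LinearSpec]]:
    """Return (layers to quantize, layers to leave in floating point)."""
    kept = [layer for layer in layers if is_fpa4w_compatible(layer, multiple)]
    skipped = [layer for layer in layers if not is_fpa4w_compatible(layer, multiple)]
    return kept, skipped
```

In a real export path, a predicate like `is_fpa4w_compatible` would presumably be passed as the filter function to the quantization call so that incompatible layers are left unquantized instead of raising an error.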

Documentation (README.md)

Added fpa4w to quantization config table with Metal backend designation
Added example showing Metal 4-bit quantization usage
Reorganized examples to separate CUDA and Metal quantization workflows

CI Integration (export_model_artifact.sh, metal.yml)

Added quantized-int4-metal option to export script with proper backend validation
Updated Metal CI workflow to test int4 quantization specifically with parakeet-tdt model

Dependencies

Bumped torchao pin for latest Metal quantization support


github-actions · 2026-02-05T18:36:17Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


Copilot

Pull request overview

This PR implements 4-bit weight quantization support for the Parakeet TDT model on the Metal backend using torchao's MPS API. The changes enable Metal-specific quantization while maintaining existing CUDA quantization workflows.

Changes:

Added fpa4w (floating point activation, 4-bit weight) quantization option for Metal backend
Implemented validation to ensure Metal-specific quantization is only used with Metal backend
Updated CI workflows to test Metal int4 quantization with parakeet-tdt model

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| third-party/ao | Updated torchao submodule to version with Metal quantization support |
| examples/models/parakeet/quantize.py | Added Metal int4 quantization implementation using UIntxWeightOnlyConfig |
| examples/models/parakeet/export_parakeet_tdt.py | Added fpa4w option and validation for Metal backend requirement |
| examples/models/parakeet/README.md | Updated documentation with fpa4w config and Metal quantization example |
| .github/workflows/metal.yml | Added int4 quantization testing for parakeet-tdt model |
| .ci/scripts/export_model_artifact.sh | Added quantized-int4-metal option with backend validation |


mergennachin · 2026-02-05T18:56:43Z

@manuelcandales In the README.md, do you want to add running `EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh` as a prerequisite step for int4 Metal quantization?

manuelcandales · 2026-02-05T18:59:38Z

> @manuelcandales In the README.md, do you want to add running `EXECUTORCH_BUILD_KERNELS_TORCHAO=1 TORCHAO_BUILD_EXPERIMENTAL_MPS=1 ./install_executorch.sh` as a prerequisite step for int4 Metal quantization?

yeah, that's true


Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


mergennachin · 2026-02-05T21:24:57Z

examples/models/parakeet/quantize.py

+
+        config = UIntxWeightOnlyConfig(
+            group_size=qlinear_group_size,
+            bitwidth=4,


Update the pin past pytorch/ao#3829, and set `uintx_choose_qparams_algorithm="hqq"`.

Could be done in a follow-up PR too

yes, that's my plan, to do in follow-up PR

here #17258

manuelcandales added 30 commits January 30, 2026 19:25

Update 39db621
Update 0ed7c5c
Update b4310cc
Update 94c823c
Update 31b6f45
Update c68cc6b
Update bd7192f
Update bcc8bda
Update f166c50
Update 0834659
Update ed4dcee
Update a058197
Update 7146282
Update d3501af
Update fe5be37
Update a0e3469
Update fcfa832
Update 2e50286
Update 0145613
Update 2e3254a
Update c5a3c1a
Update 457428b
Update fec15bc
Update 40ec415
Update c16dc59
Update 8ee7d60
Update 9966d37
Update 646b4b3
Update 3483dbf
Update 310b1b6

mergennachin approved these changes Feb 5, 2026


Base automatically changed from gh/manuelcandales/159/head to main February 5, 2026 18:12

manuelcandales requested review from cccclai and shoumikhin as code owners February 5, 2026 18:12

Update 0f2cddd

Copilot AI review requested due to automatic review settings February 5, 2026 18:35

Update 8ff273f

Copilot AI reviewed Feb 5, 2026


mergennachin approved these changes Feb 5, 2026


manuelcandales added 3 commits February 5, 2026 14:09

Update 4316164
Update 401af46
Update 957ba1f

Copilot AI review requested due to automatic review settings February 5, 2026 19:45

manuelcandales mentioned this pull request Feb 5, 2026

Metal backend: fix linear_filter in test_modules #17253

Merged

manuelcandales changed the base branch from main to gh/manuelcandales/166/head February 5, 2026 19:45

Copilot AI reviewed Feb 5, 2026


manuelcandales added 2 commits February 5, 2026 15:05

Update 87f1529
Update 9ea88a9

manuelcandales had a problem deploying to upload-benchmark-results February 5, 2026 20:05 — with GitHub Actions Failure

manuelcandales added 2 commits February 5, 2026 15:15

Update cf89a2b
Update 56f91d6

Base automatically changed from gh/manuelcandales/166/head to main February 5, 2026 20:28

Update 4962722

mergennachin reviewed Feb 5, 2026


manuelcandales temporarily deployed to upload-benchmark-results February 5, 2026 21:50 — with GitHub Actions Inactive

manuelcandales merged commit a8a5f6d into main Feb 5, 2026
332 of 340 checks passed

manuelcandales deleted the gh/manuelcandales/163/head branch February 5, 2026 22:31
