Parakeet TDT 0.6B v3 (MLX, Encoder INT4)

NVIDIA Parakeet TDT v3 with encoder-only INT4 quantization: the lite variant for memory-constrained devices (8 GB Macs). 61% smaller than BF16 with only +0.4pp WER on real-world speech.

Performance

Metric             BF16       INT4       Change
WER (LibriSpeech)  0.82%      0.82%      none
WER (TED-LIUM)     15.1%      15.5%      +0.4pp
RTFx               73x        98x        +34%
Peak memory        3,002 MB   1,003 MB   -67%
Weight size        1,254 MB   489 MB     -61%

Benchmarked on Apple M3 Max (64 GB), macOS Sequoia 15.7.3, MLX 0.30.4.

The small WER increase on TED-LIUM (+0.4pp) is concentrated in talks with challenging acoustics; LibriSpeech (clean read speech) shows no degradation.
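The deltas in the table above follow directly from the raw figures; a quick arithmetic check (using only numbers reported on this card):

```python
# Figures taken from the performance table above.
bf16_weights_mb, int4_weights_mb = 1254, 489
bf16_mem_mb, int4_mem_mb = 3002, 1003
bf16_rtfx, int4_rtfx = 73, 98

weight_reduction = 1 - int4_weights_mb / bf16_weights_mb  # fraction of weight bytes saved
mem_reduction = 1 - int4_mem_mb / bf16_mem_mb             # fraction of peak memory saved
rtfx_gain = int4_rtfx / bf16_rtfx - 1                     # relative throughput gain

print(f"weights: -{weight_reduction:.0%}")  # -61%
print(f"memory:  -{mem_reduction:.0%}")     # -67%
print(f"RTFx:    +{rtfx_gain:.0%}")         # +34%
```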

Recommendation: Use INT8 for general use. Use INT4 when memory is the constraint (8 GB unified memory Macs).

Quantization Details

Only the Conformer encoder (~85% of parameters) is quantized to INT4 (group_size=64). The decoder and joint network remain in BF16, preserving precision for token generation.
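To illustrate what group-wise 4-bit quantization does to a weight row, here is a simplified NumPy sketch of affine quantization with group_size=64. The function names and rounding details are illustrative only; MLX's actual kernels pack the 4-bit values and store scales/biases per group, but the core idea is the same: each group of 64 weights is approximated as scale * q + bias with q in [0, 15].

```python
import numpy as np

def quantize_group(w, bits=4):
    # Affine-quantize one group: w ~= scale * q + bias, with q in [0, 2**bits - 1].
    qmax = 2**bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo  # bias is the group minimum

def dequantize_group(q, scale, bias):
    return q.astype(np.float32) * scale + bias

rng = np.random.default_rng(0)
row = rng.standard_normal(128).astype(np.float32)  # one weight row = two groups of 64

groups = row.reshape(-1, 64)
recon = np.concatenate([dequantize_group(*quantize_group(g)) for g in groups])

# Per-group affine quantization bounds the error by scale / 2 for each group.
max_err = np.abs(row - recon).max()
print(f"max abs reconstruction error: {max_err:.4f}")
```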

Usage

from parakeet_mlx import from_pretrained
import mlx.core as mx

# Load the INT4 checkpoint; unquantized layers (decoder, joint) run in BF16.
model = from_pretrained("sonic-speech/parakeet-tdt-0.6b-v3-int4", dtype=mx.bfloat16)
result = model.transcribe("audio.wav")
print(result.text)

Install: pip install parakeet-mlx

Origin

Quantized from sonic-speech/parakeet-tdt-0.6b-v3 using mlx.nn.quantize (encoder-only, 4-bit, group_size=64).

Part of the Sonic Speech model collection.
