Supertonic-2 CoreML

This repository provides CoreML exports of Supertonic 2 for macOS and iOS. It focuses on on-device inference and ships several variants, from full precision down to 8-bit quantization; nothing below 8 bits is included.

GitHub repo (code + demo app): https://github.com/Nooder/supertonic-2-coreml

Code & demo

The GitHub repo contains:

Swift demo app (CoreML pipeline + UI): supertonic2-coreml-ios-test/
CoreML tooling + tests: scripts/
Docs: docs/

What is included

models/: CoreML model packages by variant (>=8-bit only)
resources/: voice styles, embeddings, and text normalization assets
manifest.json: list of artifacts with checksums and sizes
SHA256SUMS: sha256 checksums for all files
tests/: smoke tests for CoreML model loading
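
To confirm that a download is intact, the entries in SHA256SUMS can be checked against the files on disk. The sketch below is one way to do that on macOS or iOS. It assumes that SHA256SUMS uses the common `sha256sum` layout (`<hex digest>  <relative path>` per line), which this README does not confirm. The function names `sha256Hex`, `parseChecksums`, and `verifyChecksums` are illustrative and are not part of the repository.

```swift
import CryptoKit
import Foundation

/// Hex-encoded SHA-256 digest of `data`.
func sha256Hex(_ data: Data) -> String {
    SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

/// Parses lines shaped like "<hex digest>  <relative path>".
/// ASSUMPTION: SHA256SUMS follows the usual `sha256sum` output layout.
func parseChecksums(_ text: String) -> [(digest: String, path: String)] {
    text.split(whereSeparator: \.isNewline).compactMap { line -> (digest: String, path: String)? in
        let parts = line.split(separator: " ", maxSplits: 1, omittingEmptySubsequences: true)
        guard parts.count == 2 else { return nil }
        var path = parts[1].trimmingCharacters(in: .whitespaces)
        if path.hasPrefix("*") { path.removeFirst() }  // binary-mode marker
        return (String(parts[0]).lowercased(), path)
    }
}

/// Returns the relative paths whose file is missing or whose digest differs.
func verifyChecksums(sumsFile: URL, root: URL) throws -> [String] {
    let text = try String(contentsOf: sumsFile, encoding: .utf8)
    var failures: [String] = []
    for entry in parseChecksums(text) {
        let fileURL = root.appendingPathComponent(entry.path)
        // Memory-map where possible, since model weight files can be large.
        guard let data = try? Data(contentsOf: fileURL, options: .mappedIfSafe),
              sha256Hex(data) == entry.digest else {
            failures.append(entry.path)
            continue
        }
    }
    return failures
}
```

An empty result from `verifyChecksums` means every listed file was found and matched its digest.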

Quickstart (iOS / macOS)

Pick a variant from models/ (see the quant matrix in docs/quant-matrix.md).
Bundle the corresponding CoreML packages and resources/ into your app.
Use the Swift demo app in the GitHub repo supertonic-2-coreml as the reference implementation.

Required files (checklist)

Bundle the following into your app:

CoreML packages for your chosen variant:
- duration_predictor_mlprogram.mlpackage
- text_encoder_mlprogram.mlpackage
- vector_estimator_mlprogram.mlpackage
- vocoder_mlprogram.mlpackage
resources/voice_styles/
resources/embeddings/
resources/onnx/unicode_indexer.json
resources/onnx/tts.json

Minimal iOS integration

// Example usage (see demo app for full UI + playback)
let service = try TTSService(computeUnits: .all)
let result = try service.synthesize(
    text: "Hello from CoreML!",
    language: .en,
    voiceName: "F1",
    steps: 20,
    speed: 1.0,
    silenceSeconds: 0.3
)
print("WAV file:", result.url)

To select a specific variant, update the CoreML folder name in TTSService (the demo defaults to coreml_int8).

Example: iOS 18 `int8_both`

This variant uses int8 weights for multiple stages on iOS 18.

Bundle these files in your app:

Resources/
  coreml_ios18_int8_both/
    duration_predictor_mlprogram.mlpackage
    text_encoder_mlprogram.mlpackage
    vector_estimator_mlprogram.mlpackage
    vocoder_mlprogram.mlpackage
  voice_styles/
  embeddings/
  onnx/
    unicode_indexer.json
    tts.json

In the Swift demo app, update the CoreML folder name to point at coreml_ios18_int8_both (the app defaults to coreml_int8).
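
A missing or misnamed file only shows up at load time, so it can help to check the bundle layout up front. The sketch below mirrors the directory tree above. The helper names `requiredPaths` and `missingPaths` are hypothetical, and treating `Bundle.main.resourceURL` as the root of that tree is an assumption about how the resources are added to the app target.

```swift
import Foundation

/// Relative paths expected under the resources root for one variant folder.
func requiredPaths(variantFolder: String) -> [String] {
    let stages = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]
    let packages = stages.map { "\(variantFolder)/\($0)_mlprogram.mlpackage" }
    return packages + [
        "voice_styles",
        "embeddings",
        "onnx/unicode_indexer.json",
        "onnx/tts.json",
    ]
}

/// Returns the required paths that do not exist under `root`.
func missingPaths(under root: URL, variantFolder: String) -> [String] {
    requiredPaths(variantFolder: variantFolder).filter { path in
        !FileManager.default.fileExists(atPath: root.appendingPathComponent(path).path)
    }
}

// Usage: report anything missing for the iOS 18 int8_both variant.
// ASSUMPTION: the tree above sits directly under the bundle's resource URL.
let root = Bundle.main.resourceURL ?? URL(fileURLWithPath: ".")
let missing = missingPaths(under: root, variantFolder: "coreml_ios18_int8_both")
print(missing.isEmpty ? "All required files found" : "Missing: \(missing)")
```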

Choosing a variant

Use the folder naming to select the right artifact:

coreml_int8: faster than the full-precision coreml baseline, with lower fidelity
coreml_compressed: smaller memory footprint (linear8)
coreml_ios18_*: built for the iOS 18 CoreML runtime (>=8-bit only)

4-bit variants are intentionally excluded because their output quality is too low.

Variant matrix (quick view)

Variant folder	Quantization (by name)	Intended target	Notes
`coreml`	full precision (mixed)	general	baseline quality
`coreml_int8`	int8 (all stages)	general	faster, lower fidelity
`coreml_compressed`	linear8	general	smaller memory
`coreml_ios18`	full precision (mlprogram)	iOS 18+	best quality on iOS 18
`coreml_ios18_int8_vocoder_only`	int8 (vocoder only)	iOS 18+	balanced
`coreml_ios18_int8_both`	int8 (multiple stages)	iOS 18+	fastest, more loss
`coreml_compressed_ios18`	linear8	iOS 18+	smallest memory

For deeper guidance, see docs/compatibility-matrix.md and docs/quant-matrix.md.
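
The matrix can also be encoded as a small lookup so that an app picks a folder name from one place. This is a sketch of one possible mapping, not logic taken from the demo app. `VariantPriority` and `variantFolder(for:iOS18OrLater:)` are hypothetical names, and sending `.balanced` to the baseline `coreml` folder on pre-iOS 18 targets is an assumption, since the matrix lists no balanced general-purpose variant.

```swift
/// What the app cares about most when choosing a variant (hypothetical type).
enum VariantPriority {
    case quality, balanced, speed, memory
}

/// Maps a priority and OS capability to a variant folder name from the matrix.
func variantFolder(for priority: VariantPriority, iOS18OrLater: Bool) -> String {
    switch (priority, iOS18OrLater) {
    case (.quality, true):   return "coreml_ios18"
    case (.balanced, true):  return "coreml_ios18_int8_vocoder_only"
    case (.speed, true):     return "coreml_ios18_int8_both"
    case (.memory, true):    return "coreml_compressed_ios18"
    // ASSUMPTION: no balanced general variant is listed, so fall back to the baseline.
    case (.quality, false), (.balanced, false): return "coreml"
    case (.speed, false):    return "coreml_int8"
    case (.memory, false):   return "coreml_compressed"
    }
}

// Usage: choose at runtime based on the OS version.
var onIOS18 = false
if #available(iOS 18, macOS 15, *) { onIOS18 = true }
print(variantFolder(for: .balanced, iOS18OrLater: onIOS18))
```

The returned string is the folder name to set in TTSService and to bundle with the app.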

Steps vs. quality (quick guide)

Steps	Speed	Quality
10	fastest	lowest
20	balanced	good
30	slowest	best

Troubleshooting

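Missing resource error: Ensure the resources/ folders (voice_styles/, embeddings/, onnx/) are bundled and that their names match the checklist exactly.
Model not found: Confirm that the CoreML folder name set in TTSService matches the folder you bundled (e.g., coreml_ios18_int8_both).
Fails to load on device: Check that your app's iOS deployment target matches the variant; coreml_ios18_* and coreml_compressed_ios18 target iOS 18+.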

Tests

The tests/test_coreml_models.py script runs a simple smoke test that loads all stages (duration predictor, text encoder, vector estimator, vocoder) with dummy inputs.
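
A similar load check can be run from Swift on a Mac or a device. The sketch below only compiles and loads each stage; unlike the Python script, it does not run a prediction with dummy inputs. `loadAllStages` is a hypothetical helper, and compiling the .mlpackage at runtime is an assumption about how the packages are shipped (Xcode can also precompile models at build time).

```swift
import CoreML
import Foundation

/// The four pipeline stages, matching the package names in the checklist.
let stageNames = ["duration_predictor", "text_encoder", "vector_estimator", "vocoder"]

/// Compiles and loads every stage's .mlpackage found in `variantDir`.
/// Throws on the first stage that fails to compile or load.
func loadAllStages(from variantDir: URL,
                   computeUnits: MLComputeUnits = .all) throws -> [String: MLModel] {
    let config = MLModelConfiguration()
    config.computeUnits = computeUnits
    var models: [String: MLModel] = [:]
    for stage in stageNames {
        let packageURL = variantDir.appendingPathComponent("\(stage)_mlprogram.mlpackage")
        // Produces a compiled .mlmodelc in a temporary location.
        let compiledURL = try MLModel.compileModel(at: packageURL)
        models[stage] = try MLModel(contentsOf: compiledURL, configuration: config)
    }
    return models
}
```

After a successful load, `model.modelDescription.inputDescriptionsByName` lists each stage's expected inputs, which is a useful starting point for building dummy tensors.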

Attribution and license

This CoreML export is derived from Supertone/supertonic-2. Model weights are licensed under OpenRAIL-M (see LICENSE). Sample code is MIT-licensed (see NOTICE and UPSTREAM.md).
