# jsbeaudry/oswald-large-v3-turbo-m2

## 🗣️ Haitian Creole (Kreyòl Ayisyen) ASR Model

jsbeaudry/oswald-large-v3-turbo-m2 is a fine-tuned Whisper large-v3-turbo model optimized for Haitian Creole (`ht`) automatic speech recognition. The model was trained using Unsloth for efficient, low-VRAM fine-tuning.
## Model Details

- Base model: openai/whisper-large-v3-turbo
- Task: Automatic Speech Recognition (ASR)
- Language: Haitian Creole (`ht`)
- Framework: 🤗 Transformers
- Fine-tuning method: Parameter-efficient fine-tuning (PEFT-style)
- Trainable parameters: ~13.1M (≈1.59% of total parameters)
- Output format: Text transcription
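The exact PEFT configuration is not published with this card. As a rough, hypothetical illustration of how a parameter-efficient setup reports a trainable fraction in this range, a LoRA-style adapter can be attached with 🤗 PEFT and the trainable/total counts printed; the rank and target modules below are assumptions, not the published recipe.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load the base model this card fine-tunes
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

# Hypothetical adapter settings; the card only says "PEFT-style"
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
peft_model = get_peft_model(base, lora_config)

# Reports trainable vs. total parameter counts (the card cites ~13.1M, about 1.59%)
peft_model.print_trainable_parameters()
```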
## 🧪 Training Summary
| Setting | Value |
|---|---|
| Training samples | 9,246 |
| Epochs | 1 |
| Total steps | 289 |
| Batch size (per device) | 4 |
| Gradient accumulation | 8 |
| Effective batch size | 32 |
| GPUs | 1 |
| Precision | FP16 |
| Optimizer | Unsloth default |
| Checkpointing | Gradient checkpointing enabled |
⚠️ `use_cache` was disabled automatically because gradient checkpointing was enabled.
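For orientation, the settings in the table map onto standard 🤗 `Seq2SeqTrainingArguments`. The sketch below only mirrors the table; the actual run used Unsloth, and any value not listed above (output directory, logging cadence, etc.) is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="oswald-large-v3-turbo-m2",  # assumed output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,          # effective batch size: 4 * 8 * 1 GPU = 32
    num_train_epochs=1,
    fp16=True,
    gradient_checkpointing=True,            # this is what disables use_cache
    logging_steps=5,
)
```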
## Evaluation Results
Metric: Word Error Rate (WER)
Final validation WER: ~25.96%
### WER Progression (selected steps)
| Step | Validation WER (%) |
|---|---|
| 5 | 81.04 |
| 25 | 48.60 |
| 50 | 37.18 |
| 100 | 31.53 |
| 150 | 28.43 |
| 200 | 26.98 |
| 250 | 26.53 |
| 280 | 25.96 |
The model shows strong convergence within a single epoch, with rapid early gains and stable late-stage improvements.
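For reference, WER figures like these are commonly computed with the 🤗 `evaluate` library (jiwer backend). A minimal sketch with placeholder transcripts; in practice the references come from the validation set and the predictions from the model's decoded outputs.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder reference/hypothesis pairs for illustration only
references = ["mwen renmen pale kreyol ayisyen"]
predictions = ["mwen renmen pale kreyol"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.2f}%")
```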
## Usage

### Colab (Fast Whisper setup)
https://colab.research.google.com/drive/1_D5KbmhDzRhYHk5xKwP8wEw__IRONWR8?usp=sharing
### Transformers (Python)

```bash
pip install transformers
```

```python
from transformers import pipeline
from IPython.display import Audio, display

# Load the fine-tuned Haitian Creole ASR model on the first GPU (device=0)
asr = pipeline(
    "automatic-speech-recognition",
    model="jsbeaudry/oswald-large-v3-turbo-m2",
    device=0,
)

audio_file = "/content/audio_file.wav"
display(Audio(audio_file, rate=24000))  # optional: listen to the input audio

result = asr(audio_file)
print(result["text"])
```
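For longer recordings, or to force Haitian Creole decoding explicitly, the same pipeline accepts chunking and generation options. These are standard `transformers` pipeline arguments, not settings published with this model, so adjust as needed.

```python
# Optional: chunk long audio and force the decoding language/task
result = asr(
    audio_file,
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"language": "ht", "task": "transcribe"},
)
print(result["text"])
```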
### Convert for faster-whisper (Python)

```bash
pip install ctranslate2 huggingface_hub faster-whisper
```

```python
from ctranslate2.converters import TransformersConverter

# Convert the Transformers checkpoint to CTranslate2 format with int8 quantization
converter = TransformersConverter(
    "jsbeaudry/oswald-large-v3-turbo-m2",
    low_cpu_mem_usage=True,
)
converter.convert("oswald-large-v3-turbo-m2-ct2", quantization="int8")
```
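Optionally, the converter can also copy the tokenizer and preprocessor files into the output directory so faster-whisper can load that folder directly as a local path. This assumes the repository ships `tokenizer.json` and `preprocessor_config.json`, which is not confirmed here.

```python
from ctranslate2.converters import TransformersConverter

# Copy tokenizer/preprocessor files alongside the converted weights (assumes
# these files exist in the source repository)
converter = TransformersConverter(
    "jsbeaudry/oswald-large-v3-turbo-m2",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
    low_cpu_mem_usage=True,
)
# force=True overwrites the output directory if it already exists
converter.convert("oswald-large-v3-turbo-m2-ct2", quantization="int8", force=True)
```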
### Use with faster-whisper (Python)
```python
from faster_whisper import WhisperModel

# Load the model, explicitly declaring it as a "large-v3" type and pointing
# download_root at the locally converted CTranslate2 directory. This ensures
# faster_whisper sets n_mels=128, as expected for large-v3 models.
model = WhisperModel(
    "large-v3",  # model type, so the correct n_mels (128) is used
    device="cpu",
    compute_type="int8",
    download_root="/content/oswald-large-v3-turbo-m2-ct2",  # local CT2 model
)

# Transcribe audio in Haitian Creole with beam search and VAD filtering
segments, info = model.transcribe(
    "/content/000f7c0f-1661-43d4-819d-91b6a8964b1c.wav",
    language="ht",  # Haitian Creole
    beam_size=5,
    vad_filter=True,  # recommended for better accuracy
    vad_parameters=dict(min_silence_duration_ms=500),
)

print(f"Detected language '{info.language}' with probability {info.language_probability}")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```