# jsbeaudry/oswald-large-v3-turbo-m2

## 🗣️ Haitian Creole (Kreyòl Ayisyen) ASR Model

jsbeaudry/oswald-large-v3-turbo-m2 is a fine-tuned Whisper large-v3-turbo model optimized for Haitian Creole (`ht`) automatic speech recognition. The model was trained using Unsloth for efficient, low-VRAM fine-tuning.
## Model Details

- Base model: openai/whisper-large-v3-turbo
- Task: Automatic Speech Recognition (ASR)
- Language: Haitian Creole (`ht`)
- Framework: 🤗 Transformers
- Fine-tuning method: Parameter-efficient fine-tuning (PEFT-style)
- Trainable parameters: ~13.1M (≈1.59% of total parameters)
- Output format: Text transcription
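The exact PEFT configuration is not published with this card. As a rough, hypothetical illustration of how a parameter-efficient setup reports a trainable fraction in this range, a LoRA-style adapter can be attached with 🤗 PEFT and the trainable/total counts printed; the rank and target modules below are assumptions, not the published recipe.

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load the base model this card fine-tunes
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3-turbo")

# Hypothetical adapter settings; the card only says "PEFT-style"
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
peft_model = get_peft_model(base, lora_config)

# Reports trainable vs. total parameter counts (the card cites ~13.1M, about 1.59%)
peft_model.print_trainable_parameters()
```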
## 🧪 Training Summary
| Setting | Value |
|---|---|
| Training samples | 9,246 |
| Epochs | 1 |
| Total steps | 289 |
| Batch size (per device) | 4 |
| Gradient accumulation | 8 |
| Effective batch size | 32 |
| GPUs | 1 |
| Precision | FP16 |
| Optimizer | Unsloth default |
| Checkpointing | Gradient checkpointing enabled |
⚠️ `use_cache` was disabled automatically because gradient checkpointing was enabled.
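For orientation, the settings in the table map onto standard 🤗 `Seq2SeqTrainingArguments`. The sketch below only mirrors the table; the actual run used Unsloth, and any value not listed above (output directory, logging cadence, etc.) is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="oswald-large-v3-turbo-m2",  # assumed output directory
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,          # effective batch size: 4 * 8 * 1 GPU = 32
    num_train_epochs=1,
    fp16=True,
    gradient_checkpointing=True,            # this is what disables use_cache
    logging_steps=5,
)
```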
## Evaluation Results
Metric: Word Error Rate (WER)
Final validation WER: ~25.96%
### WER Progression (selected steps)
| Step | Validation WER (%) |
|---|---|
| 5 | 81.04 |
| 25 | 48.60 |
| 50 | 37.18 |
| 100 | 31.53 |
| 150 | 28.43 |
| 200 | 26.98 |
| 250 | 26.53 |
| 280 | 25.96 |
The model shows strong convergence within a single epoch, with rapid early gains and stable late-stage improvements.
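For reference, WER figures like these are commonly computed with the 🤗 `evaluate` library (jiwer backend). A minimal sketch with placeholder transcripts; in practice the references come from the validation set and the predictions from the model's decoded outputs.

```python
import evaluate

wer_metric = evaluate.load("wer")

# Placeholder reference/hypothesis pairs for illustration only
references = ["mwen renmen pale kreyol ayisyen"]
predictions = ["mwen renmen pale kreyol"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.2f}%")
```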
## Usage

### Colab (Fast Whisper setup)
https://colab.research.google.com/drive/1_D5KbmhDzRhYHk5xKwP8wEw__IRONWR8?usp=sharing
### Transformers (Python)

```bash
pip install transformers
```

```python
from transformers import pipeline
from IPython.display import Audio, display

# Load the fine-tuned Haitian Creole ASR model on the first GPU (device=0)
asr = pipeline(
    "automatic-speech-recognition",
    model="jsbeaudry/oswald-large-v3-turbo-m2",
    device=0,
)

audio_file = "/content/audio_file.wav"
display(Audio(audio_file, rate=24000))  # optional: listen to the input audio

result = asr(audio_file)
print(result["text"])
```
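For longer recordings, or to force Haitian Creole decoding explicitly, the same pipeline accepts chunking and generation options. These are standard `transformers` pipeline arguments, not settings published with this model, so adjust as needed.

```python
# Optional: chunk long audio and force the decoding language/task
result = asr(
    audio_file,
    chunk_length_s=30,
    return_timestamps=True,
    generate_kwargs={"language": "ht", "task": "transcribe"},
)
print(result["text"])
```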
### Convert for faster-whisper (Python)

```bash
pip install ctranslate2 huggingface_hub faster-whisper
```

```python
from ctranslate2.converters import TransformersConverter

# Convert the Transformers checkpoint to CTranslate2 format with int8 quantization
converter = TransformersConverter(
    "jsbeaudry/oswald-large-v3-turbo-m2",
    low_cpu_mem_usage=True,
)
converter.convert("oswald-large-v3-turbo-m2-ct2", quantization="int8")
```
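Optionally, the converter can also copy the tokenizer and preprocessor files into the output directory so faster-whisper can load that folder directly as a local path. This assumes the repository ships `tokenizer.json` and `preprocessor_config.json`, which is not confirmed here.

```python
from ctranslate2.converters import TransformersConverter

# Copy tokenizer/preprocessor files alongside the converted weights (assumes
# these files exist in the source repository)
converter = TransformersConverter(
    "jsbeaudry/oswald-large-v3-turbo-m2",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
    low_cpu_mem_usage=True,
)
# force=True overwrites the output directory if it already exists
converter.convert("oswald-large-v3-turbo-m2-ct2", quantization="int8", force=True)
```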
### Use with faster-whisper (Python)
```python
from faster_whisper import WhisperModel

# Load the model, explicitly declaring it as a "large-v3" type and pointing
# download_root at the locally converted CTranslate2 directory. This ensures
# faster_whisper sets n_mels=128, as expected for large-v3 models.
model = WhisperModel(
    "large-v3",  # model type, so the correct n_mels (128) is used
    device="cpu",
    compute_type="int8",
    download_root="/content/oswald-large-v3-turbo-m2-ct2",  # local CT2 model
)

# Transcribe audio in Haitian Creole with beam search and VAD filtering
segments, info = model.transcribe(
    "/content/000f7c0f-1661-43d4-819d-91b6a8964b1c.wav",
    language="ht",  # Haitian Creole
    beam_size=5,
    vad_filter=True,  # recommended for better accuracy
    vad_parameters=dict(min_silence_duration_ms=500),
)

print(f"Detected language '{info.language}' with probability {info.language_probability}")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```