
๐Ÿ—ฃ๏ธ Haitian Creole (Kreyรฒl Ayisyen) ASR Model

`jsbeaudry/oswald-large-v3-turbo-m2` is a Whisper large-v3-turbo model fine-tuned for Haitian Creole (`ht`) automatic speech recognition.
The model was trained with Unsloth for efficient, low-VRAM fine-tuning.


๐Ÿ” Model Details

- **Base model:** `openai/whisper-large-v3-turbo`
- **Task:** Automatic Speech Recognition (ASR)
- **Language:** Haitian Creole (`ht`)
- **Framework:** 🤗 Transformers
- **Fine-tuning method:** Parameter-efficient fine-tuning (PEFT-style)
- **Trainable parameters:** ~13.1M (≈1.59% of total parameters)
- **Output format:** Text transcription

## 🧪 Training Summary

| Setting | Value |
| --- | --- |
| Training samples | 9,246 |
| Epochs | 1 |
| Total steps | 289 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 32 |
| GPUs | 1 |
| Precision | FP16 |
| Optimizer | Unsloth default |
| Checkpointing | Gradient checkpointing enabled |

โš ๏ธ use_cache was disabled automatically due to gradient checkpointing.


## 📊 Evaluation Results

**Metric:** Word Error Rate (WER)
**Final validation WER:** ~25.96%

### WER Progression (selected steps)

| Step | Validation WER (%) |
| --- | --- |
| 5 | 81.04 |
| 25 | 48.60 |
| 50 | 37.18 |
| 100 | 31.53 |
| 150 | 28.43 |
| 200 | 26.98 |
| 250 | 26.53 |
| 280 | 25.96 |

The model shows strong convergence within a single epoch, with rapid early gains and stable late-stage improvements.
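For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The reported numbers were produced during training, not with this snippet, but a minimal pure-Python illustration of the metric looks like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-array dynamic programming over word-level edit distance.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            prev, d[j] = d[j], min(
                d[j] + 1,         # deletion
                d[j - 1] + 1,     # insertion
                prev + (r != h),  # substitution (free if words match)
            )
    return d[len(hyp)] / len(ref)

# 1 substitution over 3 reference words ≈ 0.33
print(wer("mwen renmen kreyòl", "mwen renmen kreyol"))
```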


## 🚀 Usage

### Colab (Fast Whisper setup)

https://colab.research.google.com/drive/1_D5KbmhDzRhYHk5xKwP8wEw__IRONWR8?usp=sharing

### Transformers (Python)

```python
!pip install transformers
from transformers import pipeline
from IPython.display import Audio, display

# Load the fine-tuned ASR pipeline on the first GPU (use device=-1 for CPU).
asr = pipeline(
    "automatic-speech-recognition",
    model="jsbeaudry/oswald-large-v3-turbo-m2",
    device=0,
)

# Play back the input audio in the notebook.
audio_file = "/content/audio_file.wav"
display(Audio(audio_file))

# Transcribe and print the text.
result = asr(audio_file)
print(result["text"])
```

### Convert for faster-whisper (Python)

```python
!pip install ctranslate2 huggingface_hub faster-whisper
from ctranslate2.converters import TransformersConverter

# Convert the Transformers checkpoint to CTranslate2 format with int8 quantization.
converter = TransformersConverter(
    "jsbeaudry/oswald-large-v3-turbo-m2",
    low_cpu_mem_usage=True,
)
converter.convert("oswald-large-v3-turbo-m2-ct2", quantization="int8")
```

### Use with faster-whisper (Python)

```python
from faster_whisper import WhisperModel

# Load the locally converted CTranslate2 model by passing its directory directly.
# (Passing a stock model name such as "large-v3" would download the original
# OpenAI weights instead of this fine-tune.)
model = WhisperModel(
    "oswald-large-v3-turbo-m2-ct2",
    device="cpu",
    compute_type="int8",
)

# Transcribe audio
segments, info = model.transcribe(
    "/content/000f7c0f-1661-43d4-819d-91b6a8964b1c.wav",
    language="ht",  # Haitian Creole
    beam_size=5,
    vad_filter=True,  # voice-activity detection often improves accuracy
    vad_parameters=dict(min_silence_duration_ms=500),
)

print(f"Detected language '{info.language}' with probability {info.language_probability:.2f}")

for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
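The segment loop above can also be adapted to produce subtitles. A minimal sketch, using plain `(start, end, text)` tuples in place of faster-whisper's `Segment` objects:

```python
def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) items as an SRT subtitle string."""
    def timestamp(seconds: float) -> str:
        ms = round(seconds * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{timestamp(start)} --> {timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

# With real faster-whisper output:
#   srt = to_srt((s.start, s.end, s.text) for s in segments)
print(to_srt([(0.0, 2.5, "Bonjou, kijan ou ye?")]))
```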