How to use speech-seq2seq/wav2vec2-2-bert-large with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="speech-seq2seq/wav2vec2-2-bert-large")

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSpeechSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("speech-seq2seq/wav2vec2-2-bert-large")
model = AutoModelForSpeechSeq2Seq.from_pretrained("speech-seq2seq/wav2vec2-2-bert-large")
```
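The pipeline accepts a path to an audio file or a raw sample array plus its sampling rate. Wav2Vec2-style feature extractors typically expect 16 kHz mono input (an assumption here; confirm against this checkpoint's preprocessor config), so audio recorded at another rate needs resampling first. A minimal dependency-free sketch using linear interpolation (in practice, prefer torchaudio or librosa, which apply proper anti-aliasing filters):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal (list of floats) via linear interpolation.
    Simple sketch only; production code should use a filtered resampler."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Example: upsample one second of 8 kHz audio to 16 kHz
audio_8k = [0.0] * 8000
audio_16k = resample_linear(audio_8k, 8000, 16000)
print(len(audio_16k))  # 16000
```

The resampled array can then be passed to the pipeline as `pipe({"raw": audio_16k, "sampling_rate": 16000})` (dict input is a standard pipeline form; again, verify the expected rate for this specific model).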
This model was trained from scratch on the librispeech_asr dataset. It achieves the following results on the evaluation set:
More information needed
The following training results were recorded (the original hyperparameter list is not available):
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 6.7599 | 0.28 | 500 | 6.8755 | 1.2551 |
| 6.5943 | 0.56 | 1000 | 6.7702 | 1.5878 |
| 6.3146 | 0.84 | 1500 | 6.6981 | 1.6627 |
| 6.6112 | 1.12 | 2000 | 6.6760 | 1.9853 |
| 6.6894 | 1.4 | 2500 | 6.6323 | 1.9376 |
| 6.5525 | 1.68 | 3000 | 6.6185 | 1.9383 |
| 6.571 | 1.96 | 3500 | 6.6126 | 1.9580 |
| 6.3363 | 2.24 | 4000 | 6.7869 | 1.9818 |
| 6.5832 | 2.52 | 4500 | 6.9096 | 2.0025 |
| 6.3523 | 2.8 | 5000 | 6.9670 | 1.9878 |
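For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. Values above 1.0, as throughout the table for this from-scratch model, mean the decoder produces more insertions, substitutions, and deletions combined than there are reference words. A minimal sketch of the metric (the `jiwer` or `evaluate` packages are normally used in practice):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat on the mat"))  # 1.0: 3 insertions / 3 reference words
```

This also shows why WER has no upper bound of 1.0: a hypothesis much longer than the reference drives the insertion count, and thus the rate, arbitrarily high.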