babylm-base7.5m-roberta

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the results):

  • Loss: 5.5719
  • Accuracy: 0.1683
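
For reference, the model can be loaded for masked-token prediction with the standard transformers pipeline API. This is a minimal sketch, assuming the repo id is `babylm-base7.5m-roberta` and the tokenizer uses the RoBERTa-style `<mask>` token; neither is confirmed by this card:

```python
# Minimal fill-mask sketch; the repo id and the <mask> token format are
# assumptions, not confirmed by this model card.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="babylm-base7.5m-roberta")

# Print the top-3 predicted tokens for the masked position.
for pred in fill_mask("The child picked up the <mask>.")[:3]:
    print(pred["token_str"], round(pred["score"], 4))
```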

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 185
  • training_steps: 18500
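
These settings map onto the transformers TrainingArguments as sketched below. This is an illustrative reconstruction, not the card author's actual training script; `output_dir` is a placeholder:

```python
# Sketch of the equivalent TrainingArguments for the settings listed above;
# output_dir is a placeholder, not a value taken from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="babylm-base7.5m-roberta",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",        # AdamW; betas=(0.9, 0.999), eps=1e-08 are the defaults
    lr_scheduler_type="linear",
    warmup_steps=185,
    max_steps=18500,
)
```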

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:--------:|
| 6.9447        | 0.1064 | 200   | 6.4872          | 0.1017   |
| 6.1819        | 0.2128 | 400   | 6.1702          | 0.1302   |
| 6.0357        | 0.3191 | 600   | 6.0568          | 0.1367   |
| 5.9344        | 0.4255 | 800   | 5.9913          | 0.1378   |
| 5.8993        | 0.5319 | 1000  | 5.9498          | 0.1402   |
| 5.8548        | 0.6383 | 1200  | 5.9075          | 0.1464   |
| 5.8189        | 0.7447 | 1400  | 5.8862          | 0.1485   |
| 5.7954        | 0.8511 | 1600  | 5.8521          | 0.1503   |
| 5.7616        | 0.9574 | 1800  | 5.8393          | 0.1523   |
| 5.7739        | 1.0638 | 2000  | 5.8244          | 0.1517   |
| 5.6677        | 2.1277 | 4000  | 5.7452          | 0.1568   |
| 5.6097        | 3.1915 | 6000  | 5.6973          | 0.1600   |
| 5.5465        | 4.2553 | 8000  | 5.6681          | 0.1613   |
| 5.5417        | 5.3191 | 10000 | 5.6313          | 0.1653   |
| 5.4604        | 6.3830 | 12000 | 5.6162          | 0.1664   |
| 5.4695        | 7.4468 | 14000 | 5.5993          | 0.1668   |
| 5.4586        | 8.5106 | 16000 | 5.5811          | 0.1674   |
| 5.4549        | 9.5745 | 18000 | 5.5745          | 0.1679   |
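
If the reported validation loss is the mean token-level cross-entropy in nats (the usual convention for this Trainer setup, though not stated explicitly here), the final value corresponds to a perplexity of exp(loss); a quick check:

```python
# Assumes the reported validation loss is mean cross-entropy in nats,
# so perplexity = exp(loss); this convention is not confirmed by the card.
import math

final_validation_loss = 5.5745  # last row of the table above
print(f"perplexity = {math.exp(final_validation_loss):.1f}")  # about 263.6
```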

Framework versions

  • Transformers 4.50.3
  • PyTorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.4