babylm-base1m-roberta

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.5989
  • Accuracy: 0.1577
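
For context, if the reported loss is the standard per-token masked-language-modeling cross-entropy (in nats), it corresponds to a perplexity of roughly 270, as a quick sanity check shows:

```python
import math

# Perplexity from the reported evaluation loss, assuming it is the mean
# per-token cross-entropy (in nats) over masked positions.
eval_loss = 5.5989
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")  # ~ 270.1
```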

Model description

More information needed. Per the Hub metadata, the model has roughly 98.6M parameters, stored as F32 safetensors.

Intended uses & limitations

More information needed
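
A minimal usage sketch for masked-token prediction (the repository id below is assumed from this card's title; replace it with the model's actual Hub path):

```python
from transformers import pipeline

# Hypothetical repo id, assumed from the card title; replace with the
# model's actual Hugging Face Hub path.
fill_mask = pipeline("fill-mask", model="babylm-base1m-roberta")

# RoBERTa-style tokenizers use <mask> as the mask token.
print(fill_mask("The children <mask> in the park."))
```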

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 198
  • training_steps: 19800
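
A minimal sketch of how these hyperparameters map onto transformers.TrainingArguments, assuming the standard Trainer API was used (model and dataset setup omitted; output_dir is an assumption):

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="babylm-base1m-roberta",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",          # AdamW, PyTorch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=198,
    max_steps=19800,
)
```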

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
|:-------------:|:------:|:-----:|:---------------:|:--------:|
| 7.0294        | 0.1066 | 200   | 6.5917          | 0.0954   |
| 6.1531        | 0.2132 | 400   | 6.2447          | 0.1254   |
| 5.9932        | 0.3198 | 600   | 6.1193          | 0.1324   |
| 5.8652        | 0.4264 | 800   | 6.0640          | 0.1358   |
| 5.7992        | 0.5330 | 1000  | 6.0454          | 0.1388   |
| 5.7234        | 0.6397 | 1200  | 5.9918          | 0.1406   |
| 5.6744        | 0.7463 | 1400  | 5.9741          | 0.1415   |
| 5.6754        | 0.8529 | 1600  | 5.9548          | 0.1433   |
| 5.648         | 0.9595 | 1800  | 5.9283          | 0.1452   |
| 5.5538        | 1.0661 | 2000  | 5.9202          | 0.1413   |
| 5.3874        | 2.1322 | 4000  | 5.8212          | 0.1512   |
| 5.2445        | 3.1983 | 6000  | 5.7604          | 0.1514   |
| 5.0922        | 4.2644 | 8000  | 5.6927          | 0.1559   |
| 4.9931        | 5.3305 | 10000 | 5.6602          | 0.1555   |
| 4.9755        | 6.3966 | 12000 | 5.6355          | 0.1567   |
| 4.9476        | 7.4627 | 14000 | 5.6257          | 0.1565   |
| 4.8731        | 8.5288 | 16000 | 5.6165          | 0.1563   |
| 4.8685        | 9.5949 | 18000 | 5.6023          | 0.1574   |

Framework versions

  • Transformers 4.50.3
  • Pytorch 2.7.1+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.4