babylm-base7.5m-roberta

This model was trained on an unknown dataset; the card does not name a base checkpoint. The last logged evaluation (step 18000) reports a validation loss of 5.5745 and an accuracy of 0.1679.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 185
training_steps: 18500
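
The scheduler settings above (linear schedule, 185 warmup steps, 18500 training steps, peak learning rate 5e-05) can be sketched in plain Python. This is a minimal illustration, assuming the usual linear-warmup-then-linear-decay-to-zero behavior of a "linear" schedule; the function name `linear_lr` is hypothetical and is not taken from the training code.

```python
def linear_lr(step, base_lr=5e-5, warmup_steps=185, total_steps=18500):
    """Learning rate at `step`, assuming linear warmup then linear decay to zero.

    Defaults are copied from the hyperparameter list; the function itself is
    an illustrative sketch, not the code used to train this model.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at the start, end of warmup, midpoint, and final step.
for step in (0, 185, 9250, 18500):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 185, which is 1% of the 18500 training steps, and reaches zero at the final step.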

Training Loss	Epoch	Step	Validation Loss	Accuracy
6.9447	0.1064	200	6.4872	0.1017
6.1819	0.2128	400	6.1702	0.1302
6.0357	0.3191	600	6.0568	0.1367
5.9344	0.4255	800	5.9913	0.1378
5.8993	0.5319	1000	5.9498	0.1402
5.8548	0.6383	1200	5.9075	0.1464
5.8189	0.7447	1400	5.8862	0.1485
5.7954	0.8511	1600	5.8521	0.1503
5.7616	0.9574	1800	5.8393	0.1523
5.7739	1.0638	2000	5.8244	0.1517
5.6677	2.1277	4000	5.7452	0.1568
5.6097	3.1915	6000	5.6973	0.1600
5.5465	4.2553	8000	5.6681	0.1613
5.5417	5.3191	10000	5.6313	0.1653
5.4604	6.3830	12000	5.6162	0.1664
5.4695	7.4468	14000	5.5993	0.1668
5.4586	8.5106	16000	5.5811	0.1674
5.4549	9.5745	18000	5.5745	0.1679
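
Two quantities can be derived from the table as a rough sanity check: the perplexity implied by the validation loss, and the number of optimizer steps per epoch. The sketch below assumes the loss is a mean cross-entropy in nats (so perplexity is its exponential) and that the train batch size of 16 is the effective batch size, that is, a single device with no gradient accumulation. Neither assumption is confirmed by the card.

```python
import math

# (epoch, step, validation_loss, accuracy) copied from the first and last table rows.
first_row = (0.1064, 200, 6.4872, 0.1017)
last_row = (9.5745, 18000, 5.5745, 0.1679)


def perplexity(loss_nats):
    """Perplexity, ASSUMING the loss is a mean cross-entropy in nats."""
    return math.exp(loss_nats)


def steps_per_epoch(epoch, step):
    """Optimizer steps in one epoch, inferred from a logged (epoch, step) pair."""
    return step / epoch


ppl_first = perplexity(first_row[2])
ppl_last = perplexity(last_row[2])
spe = steps_per_epoch(last_row[0], last_row[1])

# ASSUMPTION: effective batch size equals train_batch_size (16).
sequences_per_epoch = spe * 16
total_epochs = 18500 / spe

print(f"perplexity: {ppl_first:.1f} -> {ppl_last:.1f}")
print(f"steps per epoch: {spe:.0f}")
print(f"sequences per epoch (assumed batch 16): {sequences_per_epoch:.0f}")
print(f"epochs covered by 18500 steps: {total_epochs:.2f}")
```

The table logs every 200 steps up to step 2000 and every 2000 steps afterwards, so the final evaluation at step 18500 is not shown there.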

Model size: 98.6M params
Tensor type: F32
Format: Safetensors
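
As a rough guide to the checkpoint's footprint, the parameter count and tensor type above translate into a weight size as follows. This is a back-of-the-envelope sketch: 98.6M is a rounded figure, file metadata is ignored, and the 2-byte entries are hypothetical alternatives included only for comparison.

```python
PARAMS = 98.6e6  # rounded parameter count from the card

# Bytes per parameter; only F32 is reported for this model.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params, dtype):
    """Approximate size of the raw weights in bytes for a given tensor type."""
    return n_params * BYTES_PER_PARAM[dtype]


f32_bytes = weight_bytes(PARAMS, "F32")
print(f"F32 weights: {f32_bytes / 1e6:.1f} MB ({f32_bytes / 2**20:.1f} MiB)")
print(f"F16 weights (hypothetical): {weight_bytes(PARAMS, 'F16') / 1e6:.1f} MB")
```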