babylm-base1m-roberta

This model is a fine-tuned version of on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 198
training_steps: 19800
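With lr_scheduler_type: linear, the learning rate ramps up from 0 over the 198 warmup steps and then decays linearly to 0 at step 19800. The sketch below assumes the run used the Hugging Face Trainer's linear schedule (the same formula as `get_linear_schedule_with_warmup` in Transformers); `linear_lr` is a hypothetical helper written for illustration, not a library function.

```python
def linear_lr(step: int, base_lr: float = 5e-5,
              warmup_steps: int = 198, total_steps: int = 19800) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr (end of warmup) down to 0 (total_steps).
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for s in (0, 99, 198, 9999, 19800):
    print(f"step {s:>5}: lr = {linear_lr(s):.3e}")
```

Under these settings the rate peaks at 5e-05 at step 198 and is halved at step 9999, midway through the 19602-step decay phase.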

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
7.0294	0.1066	200	6.5917	0.0954
6.1531	0.2132	400	6.2447	0.1254
5.9932	0.3198	600	6.1193	0.1324
5.8652	0.4264	800	6.0640	0.1358
5.7992	0.5330	1000	6.0454	0.1388
5.7234	0.6397	1200	5.9918	0.1406
5.6744	0.7463	1400	5.9741	0.1415
5.6754	0.8529	1600	5.9548	0.1433
5.648	0.9595	1800	5.9283	0.1452
5.5538	1.0661	2000	5.9202	0.1413
5.3874	2.1322	4000	5.8212	0.1512
5.2445	3.1983	6000	5.7604	0.1514
5.0922	4.2644	8000	5.6927	0.1559
4.9931	5.3305	10000	5.6602	0.1555
4.9755	6.3966	12000	5.6355	0.1567
4.9476	7.4627	14000	5.6257	0.1565
4.8731	8.5288	16000	5.6165	0.1563
4.8685	9.5949	18000	5.6023	0.1574
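The Epoch column shows how many optimizer steps make up one pass over the training data, and the Validation Loss can be converted to a perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, as the Hugging Face Trainer reports it. If the model was trained with a masked-language-modeling objective (suggested by the RoBERTa name but not stated in this card), the result is a perplexity over masked tokens only. The helper names and the effective batch size of 16 (one device, no gradient accumulation) are assumptions.

```python
import math

def steps_per_epoch(step: int, epoch: float) -> float:
    """Optimizer steps in one pass over the training set, implied by one table row."""
    return step / epoch

def perplexity(mean_loss: float) -> float:
    """exp() of a mean per-token cross-entropy loss (assumed to be in nats)."""
    return math.exp(mean_loss)

# (Step, Epoch, Validation Loss) copied from a few rows of the training results table.
rows = [(200, 0.1066, 6.5917), (2000, 1.0661, 5.9202), (18000, 9.5949, 5.6023)]

for step, epoch, val_loss in rows:
    print(f"step {step:>5}: ~{steps_per_epoch(step, epoch):.0f} steps/epoch, "
          f"perplexity ~{perplexity(val_loss):.0f}")

# Assumption: effective batch = train_batch_size (single device, no accumulation).
spe = steps_per_epoch(18000, 9.5949)
print(f"~{spe * 16:.0f} training sequences per epoch, "
      f"~{19800 / spe:.2f} epochs in 19800 steps")
```

Every row implies roughly 1,876 steps per epoch, so the 19,800 training steps cover about 10.55 epochs. The last logged validation loss of 5.6023 corresponds to a perplexity of about 271.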

Format: Safetensors
Model size: 98.6M params
Tensor type: F32
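Because the parameters are stored as F32 (4 bytes each), the parameter count gives a quick estimate of the checkpoint size and of the memory needed to continue training with the AdamW settings listed above. This sketch assumes PyTorch-style AdamW state (one gradient plus two moment buffers per parameter, all F32) and ignores the safetensors header, activations, and framework overhead; the helper names are hypothetical.

```python
def weight_bytes(num_params: float, bytes_per_param: int = 4) -> float:
    """Raw storage for the parameters alone (F32 = 4 bytes per element)."""
    return num_params * bytes_per_param

def adamw_training_bytes(num_params: float, bytes_per_param: int = 4) -> float:
    """Weights + gradients + AdamW's two moment buffers (exp_avg, exp_avg_sq),
    all in F32. Activations and framework overhead are not included."""
    return 4 * weight_bytes(num_params, bytes_per_param)

n = 98.6e6  # "98.6M params" from the card (already rounded)
print(f"weights:        {weight_bytes(n) / 1e6:.1f} MB")
print(f"AdamW training: {adamw_training_bytes(n) / 1e9:.2f} GB (excluding activations)")
```

This gives roughly 394 MB for the weights alone and about 1.6 GB for weights plus gradients and optimizer state.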