babylm-base7f5m-gpt2

This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its results on the evaluation set are reported in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 185
training_steps: 18500
mixed_precision_training: Native AMP
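
To make the schedule settings above concrete, here is a minimal sketch of how a linear schedule with warmup maps a training step to a learning rate. It assumes the common convention of a linear ramp from zero to the peak rate over the warmup steps, followed by a linear decay to zero at the final step; the function name `linear_lr` is hypothetical and is not taken from the training code.

```python
# Hyperparameters copied from the list above.
PEAK_LR = 5e-05
WARMUP_STEPS = 185
TOTAL_STEPS = 18500


def linear_lr(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 185, 2000, 10000, 18500):
        print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the warmup covers 1% of training (185 of 18500 steps), so the rate is already decaying at the first logged evaluation step.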

Training Loss	Epoch	Step	Validation Loss	Accuracy
5.3654	0.0952	200	4.7347	0.3484
4.5626	0.1905	400	4.3043	0.3641
4.3217	0.2857	600	4.1250	0.3711
4.1567	0.3810	800	4.0333	0.3738
4.1352	0.4762	1000	3.9608	0.3791
4.0176	0.5714	1200	3.9202	0.3804
3.9957	0.6667	1400	3.8617	0.3837
3.9308	0.7619	1600	3.8101	0.3895
3.8860	0.8571	1800	3.7603	0.3946
3.7964	0.9524	2000	3.7185	0.3992
3.3605	1.9048	4000	3.4152	0.4256
3.0557	2.8571	6000	3.2015	0.4532
2.8541	3.8095	8000	3.0833	0.4663
2.7548	4.7619	10000	3.0094	0.4744
2.6384	5.7143	12000	2.9641	0.4786
2.6129	6.6667	14000	2.9363	0.4819
2.5034	7.6190	16000	2.9195	0.4830
2.4850	8.5714	18000	2.9107	0.4841
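
The validation losses above can be read as perplexities, which are often easier to compare across language models. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state; the helper name `perplexity` and the selection of rows are illustrative only.

```python
import math

# (step, validation loss) pairs copied from the table above.
VAL_LOSS = {2000: 3.7185, 10000: 3.0094, 18000: 2.9107}


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    for step, loss in VAL_LOSS.items():
        print(f"step {step:>6}: loss = {loss:.4f}, perplexity = {perplexity(loss):.2f}")
```

If that assumption holds, the last logged checkpoint (step 18000, validation loss 2.9107) corresponds to a perplexity of about 18.4.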

Model size: 98.4M params (Safetensors)
Tensor type: F32
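
As a rough guide to the storage the weights need, the parameter count and tensor type listed above can be multiplied out. This is a back-of-the-envelope sketch: it counts weights only, ignores file metadata and any runtime overhead, and the F16 entry is a hypothetical comparison that is not part of this card.

```python
# Parameter count and tensor type copied from the card.
N_PARAMS = 98.4e6

# Bytes per parameter. F32 is from the card; F16 is a hypothetical comparison.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_megabytes(n_params: float, dtype: str = "F32") -> float:
    """Approximate size of the weights in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: about {weight_megabytes(N_PARAMS, dtype):.1f} MB")
```

At 4 bytes per F32 parameter, 98.4M parameters come to roughly 394 MB of weights.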