Built with Axolotl

Axolotl config (axolotl version 0.12.2):

base_model: google/gemma-3-270m-it

# Automatically upload checkpoint and final model to HF
hub_model_id: abdullahmeda/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

load_in_8bit: false
load_in_4bit: false
strict: false

# gemma3 doesn't seem to play nice with ddp
ddp_find_unused_parameters: true

chat_template: gemma3
eot_tokens:
  - <end_of_turn>

datasets:
  - path: kaggle-map/pointwise-reranker
    type: chat_template
    split: train

test_datasets:
  - path: kaggle-map/pointwise-reranker
    type: chat_template
    split: val

dataset_processes: 32
dataset_prepared_path: last_run_prepared
output_dir: ./outputs/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

sequence_len: 1024
sample_packing: true
eval_sample_packing: false

deepspeed: deepspeed_configs/zero1.json

wandb_project: map-math-misconceptions
wandb_entity:
wandb_watch:
wandb_name: pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 16
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 5e-6

bf16: true
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 10
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 3
saves_per_epoch: 3
weight_decay: 0.01

save_first_step: true  # validates that checkpoint saving works with this config
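
The dataset entries are consumed through Axolotl's `chat_template` loader with the gemma3 template, so each row is rendered into `<start_of_turn>`/`<end_of_turn>` chat turns before packing to `sequence_len: 1024`. Below is a minimal sketch of what that rendering step looks like; the message contents and the `messages` field layout are illustrative assumptions, not the actual schema of kaggle-map/pointwise-reranker.

```python
# Sketch: render one hypothetical chat-format row with the gemma3 chat template,
# roughly as Axolotl's `type: chat_template` loader does during preprocessing.
# The prompt/response text is illustrative, not the real dataset schema.
from transformers import AutoTokenizer

# google/gemma-3-270m-it is gated on the Hub; this assumes you have access.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

example_row = {
    "messages": [
        {"role": "user",
         "content": "Question: ...\nCandidate misconception: ...\nIs this candidate relevant?"},
        {"role": "assistant", "content": "Yes"},
    ]
}

# apply_chat_template wraps each message in <start_of_turn>...<end_of_turn>;
# the config's eot_tokens entry marks <end_of_turn> as the end-of-turn token.
rendered = tokenizer.apply_chat_template(example_row["messages"], tokenize=False)
print(rendered)
```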

pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

This model is a fine-tuned version of google/gemma-3-270m-it on the kaggle-map/pointwise-reranker dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9502
  • Max memory active: 55.94 GiB
  • Max memory allocated: 55.94 GiB
  • Device memory reserved: 70.82 GiB

Model description

More information needed

Intended uses & limitations

More information needed
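
As a rough illustration of the intended use (pointwise reranking, i.e. scoring one query–candidate pair per prompt), the sketch below loads the fine-tuned checkpoint from the Hub and generates a short judgment. The prompt wording is an assumption for illustration; for real use it should match the format of the kaggle-map/pointwise-reranker training data.

```python
# Hedged sketch of pointwise-reranking inference with the fine-tuned checkpoint.
# The prompt format below is an assumption; align it with the training data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdullahmeda/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "Question: ...\nCandidate misconception: ...\n"
                "Is this candidate relevant? Answer Yes or No."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```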

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 512
  • total_eval_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 92
  • training_steps: 920
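
The derived values above follow directly from the config: the effective train batch size is micro_batch_size × gradient_accumulation_steps × num_devices, and the 92 warmup steps are warmup_ratio × training_steps. A quick sanity check (plain arithmetic, values taken from the config):

```python
# Recompute the derived hyperparameters reported above from the config values.
micro_batch_size = 16
gradient_accumulation_steps = 16
num_devices = 2
training_steps = 920
warmup_ratio = 0.1

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time
warmup_steps = int(warmup_ratio * training_steps)

print(total_train_batch_size)  # 512
print(total_eval_batch_size)   # 32
print(warmup_steps)            # 92
```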

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---------------|--------|------|-----------------|------------------|---------------------|--------------------|
| No log        | 0      | 0    | 7.4394          | 43.55            | 43.55               | 43.79              |
| 0.1851        | 0.3358 | 62   | 4.8209          | 55.94            | 55.94               | 70.82              |
| 0.1584        | 0.6716 | 124  | 4.7051          | 55.94            | 55.94               | 70.82              |
| 0.1391        | 1.0054 | 186  | 4.4124          | 55.94            | 55.94               | 70.82              |
| 0.1123        | 1.3412 | 248  | 4.4912          | 55.94            | 55.94               | 70.82              |
| 0.1013        | 1.6770 | 310  | 4.2072          | 55.94            | 55.94               | 70.82              |
| 0.1002        | 2.0108 | 372  | 4.1081          | 55.94            | 55.94               | 70.82              |
| 0.0897        | 2.3466 | 434  | 3.9883          | 55.94            | 55.94               | 70.82              |
| 0.0829        | 2.6825 | 496  | 3.9694          | 55.94            | 55.94               | 70.82              |
| 0.0819        | 3.0162 | 558  | 3.9431          | 55.94            | 55.94               | 70.82              |
| 0.0713        | 3.3521 | 620  | 3.9944          | 55.94            | 55.94               | 70.82              |
| 0.0696        | 3.6879 | 682  | 3.9014          | 55.94            | 55.94               | 70.82              |
| 0.0713        | 4.0217 | 744  | 3.9357          | 55.94            | 55.94               | 70.82              |
| 0.0643        | 4.3575 | 806  | 3.9005          | 55.94            | 55.94               | 70.82              |
| 0.0652        | 4.6933 | 868  | 3.9502          | 55.94            | 55.94               | 70.82              |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4