Built with Axolotl

Axolotl config (axolotl version 0.12.2):

base_model: google/gemma-3-270m-it

# Automatically upload checkpoint and final model to HF
hub_model_id: abdullahmeda/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

load_in_8bit: false
load_in_4bit: false
strict: false

# gemma3 doesn't seem to play nice with ddp
ddp_find_unused_parameters: true

chat_template: gemma3
eot_tokens:
  - <end_of_turn>

datasets:
  - path: kaggle-map/pointwise-reranker
    type: chat_template
    split: train

test_datasets:
  - path: kaggle-map/pointwise-reranker
    type: chat_template
    split: val

dataset_processes: 32
dataset_prepared_path: last_run_prepared
output_dir: ./outputs/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

sequence_len: 1024
sample_packing: true
eval_sample_packing: false

deepspeed: deepspeed_configs/zero1.json

wandb_project: map-math-misconceptions
wandb_entity:
wandb_watch:
wandb_name: pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 16
num_epochs: 5
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 5e-6

bf16: true
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
resume_from_checkpoint:
logging_steps: 10
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 3
saves_per_epoch: 3
weight_decay: 0.01

save_first_step: true  # validates that checkpoint saving works with this config
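
The dataset entries are consumed through Axolotl's `chat_template` loader with the gemma3 template, so each row is rendered into `<start_of_turn>`/`<end_of_turn>` chat turns before packing to `sequence_len: 1024`. Below is a minimal sketch of what that rendering step looks like; the message contents and the `messages` field layout are illustrative assumptions, not the actual schema of kaggle-map/pointwise-reranker.

```python
# Sketch: render one hypothetical chat-format row with the gemma3 chat template,
# roughly as Axolotl's `type: chat_template` loader does during preprocessing.
# The prompt/response text is illustrative, not the real dataset schema.
from transformers import AutoTokenizer

# google/gemma-3-270m-it is gated on the Hub; this assumes you have access.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

example_row = {
    "messages": [
        {"role": "user",
         "content": "Question: ...\nCandidate misconception: ...\nIs this candidate relevant?"},
        {"role": "assistant", "content": "Yes"},
    ]
}

# apply_chat_template wraps each message in <start_of_turn>...<end_of_turn>;
# the config's eot_tokens entry marks <end_of_turn> as the end-of-turn token.
rendered = tokenizer.apply_chat_template(example_row["messages"], tokenize=False)
print(rendered)
```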

pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs

This model is a fine-tuned version of google/gemma-3-270m-it on the kaggle-map/pointwise-reranker dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9502
  • Max memory active: 55.94 GiB
  • Max memory allocated: 55.94 GiB
  • Device memory reserved: 70.82 GiB

Model description

More information needed

Intended uses & limitations

More information needed
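
As a rough illustration of the intended use (pointwise reranking, i.e. scoring one query–candidate pair per prompt), the sketch below loads the fine-tuned checkpoint from the Hub and generates a short judgment. The prompt wording is an assumption for illustration; for real use it should match the format of the kaggle-map/pointwise-reranker training data.

```python
# Hedged sketch of pointwise-reranking inference with the fine-tuned checkpoint.
# The prompt format below is an assumption; align it with the training data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abdullahmeda/pointwise-rerank-gemma-3-270m-it-ds0-oen48ghs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "Question: ...\nCandidate misconception: ...\n"
                "Is this candidate relevant? Answer Yes or No."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```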

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 512
  • total_eval_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 92
  • training_steps: 920
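
The derived values above follow directly from the config: the effective train batch size is micro_batch_size × gradient_accumulation_steps × num_devices, and the 92 warmup steps are warmup_ratio × training_steps. A quick sanity check (plain arithmetic, values taken from the config):

```python
# Recompute the derived hyperparameters reported above from the config values.
micro_batch_size = 16
gradient_accumulation_steps = 16
num_devices = 2
training_steps = 920
warmup_ratio = 0.1

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time
warmup_steps = int(warmup_ratio * training_steps)

print(total_train_batch_size)  # 512
print(total_eval_batch_size)   # 32
print(warmup_steps)            # 92
```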

Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---------------|--------|------|-----------------|------------------|---------------------|--------------------|
| No log        | 0      | 0    | 7.4394          | 43.55            | 43.55               | 43.79              |
| 0.1851        | 0.3358 | 62   | 4.8209          | 55.94            | 55.94               | 70.82              |
| 0.1584        | 0.6716 | 124  | 4.7051          | 55.94            | 55.94               | 70.82              |
| 0.1391        | 1.0054 | 186  | 4.4124          | 55.94            | 55.94               | 70.82              |
| 0.1123        | 1.3412 | 248  | 4.4912          | 55.94            | 55.94               | 70.82              |
| 0.1013        | 1.6770 | 310  | 4.2072          | 55.94            | 55.94               | 70.82              |
| 0.1002        | 2.0108 | 372  | 4.1081          | 55.94            | 55.94               | 70.82              |
| 0.0897        | 2.3466 | 434  | 3.9883          | 55.94            | 55.94               | 70.82              |
| 0.0829        | 2.6825 | 496  | 3.9694          | 55.94            | 55.94               | 70.82              |
| 0.0819        | 3.0162 | 558  | 3.9431          | 55.94            | 55.94               | 70.82              |
| 0.0713        | 3.3521 | 620  | 3.9944          | 55.94            | 55.94               | 70.82              |
| 0.0696        | 3.6879 | 682  | 3.9014          | 55.94            | 55.94               | 70.82              |
| 0.0713        | 4.0217 | 744  | 3.9357          | 55.94            | 55.94               | 70.82              |
| 0.0643        | 4.3575 | 806  | 3.9005          | 55.94            | 55.94               | 70.82              |
| 0.0652        | 4.6933 | 868  | 3.9502          | 55.94            | 55.94               | 70.82              |

Framework versions

  • Transformers 4.55.2
  • Pytorch 2.6.0+cu124
  • Datasets 4.0.0
  • Tokenizers 0.21.4