# CSIRO Image2Biomass Prediction 🌿

A competitive solution for the CSIRO Image2Biomass Prediction Kaggle competition.

πŸ† Competition Overview

Predict pasture biomass from images to help farmers make smarter grazing decisions.

**Targets (5 regression outputs):**

| Target | Weight | Description |
|---|---|---|
| Dry_Total_g | 0.50 | Total dry biomass (grams) — most important |
| GDM_g | 0.20 | Green dry matter (grams) |
| Dry_Green_g | 0.10 | Dry green biomass |
| Dry_Dead_g | 0.10 | Dry dead biomass |
| Dry_Clover_g | 0.10 | Dry clover biomass |

**Metric:** Globally weighted R² across all (image, target) pairs.
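
A minimal NumPy sketch of one plausible implementation of this metric (the official Kaggle scorer may differ in details such as how the mean is computed):

```python
import numpy as np

# Per-target weights from the table above.
TARGET_WEIGHTS = np.array([0.50, 0.20, 0.10, 0.10, 0.10])

def weighted_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Globally weighted R² over all (image, target) pairs.

    y_true, y_pred: arrays of shape (n_images, 5), columns ordered as
    [Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g].
    """
    w = np.broadcast_to(TARGET_WEIGHTS, y_true.shape).ravel()
    yt, yp = y_true.ravel(), y_pred.ravel()
    y_bar = np.average(yt, weights=w)        # weighted global mean
    ss_res = np.sum(w * (yt - yp) ** 2)      # weighted residual sum of squares
    ss_tot = np.sum(w * (yt - y_bar) ** 2)   # weighted total sum of squares
    return 1.0 - ss_res / ss_tot
```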

## 🚀 Solution Architecture

### Key Design Decisions

1. **DINOv2 backbone** — a self-supervised ViT pretrained on 142M images, providing strong feature generalization for out-of-distribution agricultural imagery
2. **Log-transformed targets** — biomass values are right-skewed, so the model trains on `log1p(y)` and predictions are inverted with `expm1`
3. **Weighted SmoothL1 loss** — robust to field measurement noise (recommended by the DINOvTree paper), weighted by competition target importance
4. **Consistency regularization** — enforces the structural constraint Total ≈ Green + Dead + Clover (see the loss sketch after this list)
5. **Label Distribution Smoothing (LDS)** — addresses imbalanced target distributions (from "Delving into Deep Imbalanced Regression", ICML 2021)
6. **D4 augmentations** — the full dihedral group (flips + rotations) suits top-down pasture images
7. **Multi-backbone ensemble** — DINOv2 + ConvNeXt for diverse predictions
8. **Test-Time Augmentation (TTA)** — 4× TTA with geometric transforms
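
Decisions 2–4 interact in the loss. The sketch below is an illustration, not the exact `train.py` code: it combines the target-weighted SmoothL1 in `log1p` space with the Total ≈ Green + Dead + Clover consistency penalty, and the target column order is an assumption:

```python
import torch
import torch.nn as nn

class WeightedSmoothL1WithConsistency(nn.Module):
    """Target-weighted SmoothL1 on log1p-space predictions, plus a soft
    penalty pushing Total toward Green + Dead + Clover in the original scale."""

    def __init__(self, consistency_weight: float = 0.1):
        super().__init__()
        # Column order assumed: [Dry_Total_g, GDM_g, Dry_Green_g, Dry_Dead_g, Dry_Clover_g]
        self.register_buffer("w", torch.tensor([0.50, 0.20, 0.10, 0.10, 0.10]))
        self.smooth_l1 = nn.SmoothL1Loss(reduction="none")
        self.consistency_weight = consistency_weight

    def forward(self, pred_log: torch.Tensor, target_log: torch.Tensor) -> torch.Tensor:
        # Weighted SmoothL1 over the five log-space targets, shape (B, 5).
        loss = (self.smooth_l1(pred_log, target_log) * self.w).sum(dim=1).mean()

        # Consistency term: map back to grams with expm1, then penalize
        # deviation from Total ≈ Green + Dead + Clover.
        pred = torch.expm1(pred_log)
        total = pred[:, 0]
        parts = pred[:, 2] + pred[:, 3] + pred[:, 4]  # Green + Dead + Clover
        return loss + self.consistency_weight * (total - parts).abs().mean()
```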

### Model Architecture

```
Input Image (224×224)
    → DINOv2-Base backbone (768-dim features)
    → LayerNorm → Dropout(0.3)
    → Linear(768, 512) → GELU → Dropout(0.15)
    → Linear(512, 256) → GELU → Dropout(0.09)
    → Linear(256, 5) → predictions
```
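
A sketch of this head in PyTorch with a timm backbone (an illustration; the actual `train.py` model may differ in details):

```python
import timm
import torch
import torch.nn as nn

class BiomassRegressor(nn.Module):
    """DINOv2-Base backbone with the MLP regression head diagrammed above."""

    def __init__(self, backbone: str = "vit_base_patch14_dinov2.lvd142m"):
        super().__init__()
        # num_classes=0 makes timm return pooled features; img_size=224
        # overrides the checkpoint's native resolution (timm resizes the
        # position embeddings to match).
        self.backbone = timm.create_model(
            backbone, pretrained=True, num_classes=0, img_size=224
        )
        dim = self.backbone.num_features  # 768 for DINOv2-Base
        self.head = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Dropout(0.3),
            nn.Linear(dim, 512), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(512, 256), nn.GELU(), nn.Dropout(0.09),
            nn.Linear(256, 5),  # five biomass targets
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))
```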

### Backbones (ranked by expected performance)

| Backbone | Params | Feature Dim | Input Size | Notes |
|---|---|---|---|---|
| vit_base_patch14_dinov2.lvd142m | 86M | 768 | 224×224 | Best generalization |
| vit_large_patch14_dinov2.lvd142m | 304M | 1024 | 224×224 | Higher quality, needs more VRAM |
| convnext_large.fb_in22k_ft_in1k | 198M | 1536 | 224×224 | Strong CNN baseline |
| efficientnet_b4.ra2_in1k | 19M | 1792 | 320×320 | Lightweight, fast |
| swin_large_patch4_window7_224 | 197M | 1536 | 224×224 | Hierarchical ViT |

πŸ“ Project Structure

```
├── train.py                     # Full training pipeline with CLI
├── inference.py                 # Inference with ensemble + TTA
├── train_ensemble.py            # Multi-backbone ensemble training
├── kaggle_train_notebook.py     # Self-contained Kaggle training notebook
├── kaggle_inference_notebook.py # Self-contained Kaggle inference notebook
└── README.md                    # This file
```

πŸ› οΈ Setup

```bash
pip install torch torchvision timm albumentations pandas numpy scikit-learn scipy pillow
```

## 📋 Quick Start

### 1. Single Backbone Training

```bash
python train.py \
    --data_dir /path/to/competition/data \
    --output_dir ./output \
    --backbone dinov2_base \
    --epochs 30 \
    --batch_size 32 \
    --backbone_lr 3e-5 \
    --head_lr 1e-3 \
    --n_folds 5 \
    --aug_strength medium \
    --use_lds \
    --grad_checkpointing
```

### 2. Multi-Backbone Ensemble

```bash
python train_ensemble.py \
    --data_dir /path/to/competition/data \
    --output_dir ./ensemble_output \
    --backbones dinov2_base convnext_large \
    --epochs 30 \
    --n_folds 5
```

### 3. Inference

```bash
python inference.py \
    --data_dir /path/to/competition/data \
    --model_dir ./output \
    --output submission.csv \
    --n_tta 4
```
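
The 4× TTA averages predictions over geometric views. A minimal sketch of the idea (the actual `inference.py` may use a different set of transforms):

```python
import torch

@torch.no_grad()
def d4_tta_predict(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Average regression outputs over four dihedral views of a batch.

    Because the targets are scalars (not spatial maps), no inverse
    transform is needed before averaging.
    """
    views = [
        images,                             # identity
        torch.flip(images, dims=[-1]),      # horizontal flip
        torch.flip(images, dims=[-2]),      # vertical flip
        torch.flip(images, dims=[-2, -1]),  # 180° rotation
    ]
    return torch.stack([model(v) for v in views]).mean(dim=0)
```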

### 4. Kaggle Submission

1. **Train:** Run `kaggle_train_notebook.py` as a Kaggle GPU notebook
2. **Save models:** Download the output and upload it as a Kaggle dataset
3. **Submit:** Run `kaggle_inference_notebook.py` with the models as an input dataset

βš™οΈ Training Configuration

### Recommended Settings

| Setting | Value | Rationale |
|---|---|---|
| Backbone LR | 3e-5 | Differential LR, far below the head LR |
| Head LR | 1e-3 | Fast head convergence |
| Weight Decay | 1e-2 | Standard for AdamW |
| Warmup Ratio | 0.05 | 5% of training for LR warmup |
| Scheduler | Cosine | With warm restarts |
| Batch Size | 32 | Effective 64 with grad_accum=2 |
| Augmentations | Medium | D4 + color jitter + CoarseDropout |
| Log Transform | Yes | Normalizes skewed targets |
| LDS | Yes | Handles imbalanced distributions |
| Consistency Weight | 0.1 | Total ≈ Green + Dead + Clover |
| Early Stopping | 8 epochs | Based on validation R² |
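
A sketch of the differential-LR setup from the table, assuming the model exposes `backbone` and `head` submodules as in the architecture sketch above:

```python
import torch

def build_optimizer(model: torch.nn.Module):
    # A lower LR keeps pretrained backbone features stable while the
    # freshly initialized head converges quickly.
    optimizer = torch.optim.AdamW(
        [
            {"params": model.backbone.parameters(), "lr": 3e-5},
            {"params": model.head.parameters(), "lr": 1e-3},
        ],
        weight_decay=1e-2,
    )
    # Cosine schedule with warm restarts; the 5% warmup would typically
    # be layered on top with a separate warmup scheduler.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
    return optimizer, scheduler
```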

### Hyperparameter Sweep

Key hyperparameters to tune:

- `backbone_lr`: [1e-5, 3e-5, 5e-5]
- `head_lr`: [5e-4, 1e-3, 2e-3]
- `dropout`: [0.2, 0.3, 0.4]
- `hidden_dim`: [256, 512, 1024]
- `consistency_weight`: [0.0, 0.05, 0.1, 0.2]
- `aug_strength`: [light, medium, heavy]
- `img_size`: [224, 384, 448]

## 📚 References

- Yang et al., "Delving into Deep Imbalanced Regression", ICML 2021 (source of Label Distribution Smoothing)
- Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision", 2023 (backbone pretraining)

πŸ… Expected Performance

Based on the literature and out-of-fold (OOF) validation:

| Configuration | Expected CV R² |
|---|---|
| DINOv2-Base (single) | 0.55–0.70 |
| ConvNeXt-Large (single) | 0.50–0.65 |
| DINOv2-Base + ConvNeXt-Large ensemble | 0.60–0.75 |
| DINOv2-Large + TTA | 0.60–0.75 |
| Full ensemble (3 backbones + TTA + LDS) | 0.65–0.80 |

> **Note:** Actual scores depend on data quality, image resolution, and distribution shift between train and test.

## 💡 Tips for Improvement

1. **Higher resolution** — try 384×384 or 448×448 (more detail for biomass estimation)
2. **Deeper heads** — try separate heads per target for specialization
3. **NDVI features** — if NDVI data is available, concatenate it with the image features
4. **Pseudo-labeling** — train on test-set pseudo-labels for domain adaptation
5. **Multi-scale features** — use timm feature extraction at multiple scales
6. **Stacking** — train a second-level model on OOF predictions from diverse backbones (see the sketch after this list)
7. **Target engineering** — predict ratios (Green/Total, Dead/Total) as auxiliary targets
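
For tip 6, a minimal stacking sketch; `oof_preds` and `y_true` are hypothetical placeholders for arrays you would build from your own training runs:

```python
import numpy as np
from sklearn.linear_model import Ridge

# oof_preds: (n_samples, n_backbones * 5) out-of-fold predictions,
# y_true:    (n_samples, 5) ground-truth targets.
rng = np.random.default_rng(0)
oof_preds = rng.random((1000, 10))  # placeholder for two backbones' OOF preds
y_true = rng.random((1000, 5))      # placeholder targets

# Ridge handles multi-output regression natively and learns how to
# blend the backbones per target.
stacker = Ridge(alpha=1.0).fit(oof_preds, y_true)
test_blend = stacker.predict(oof_preds)  # in practice: test-set predictions
```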