KITScenes LongTail Challenge — Solution

Challenge: KIT-MRT/KITScenes-LongTail-Challenge
Dataset: KIT-MRT/KITScenes-LongTail
Paper: arXiv:2603.23607

Approach: Few-Shot CoT + Kinematic Bicycle Model

This solution builds on the best-performing open-source method reported in the paper (MMS 4.24 with Gemma 3 12B) and combines three components:

  1. VLM reasoning: analyzes each driving scenario from its past trajectory and driving instruction
  2. Kinematic Bicycle Model (Kong et al. 2015): converts the VLM's action commands into 25 future waypoints
  3. Few-Shot Chain-of-Thought: 3 training examples guide the VLM toward structured output
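The few-shot component can be illustrated with a minimal sketch of how such a prompt might be assembled (examples first, query last) and how XML fields could be read back from the model's reply. Everything here is hypothetical: the prompt wording, the helper names build_prompt and parse_fields, and the four tag names (accel_0_3s, steer_0_3s, accel_3_5s, steer_3_5s). The README does not list the real prompt or the names of its 9 XML fields.

```python
import re
from dataclasses import dataclass

# Hypothetical tag names; the real output has 9 fields whose names are not listed here.
ACTION_TAGS = ("accel_0_3s", "steer_0_3s", "accel_3_5s", "steer_3_5s")


@dataclass
class Example:
    """One few-shot example: the inputs plus the expected XML answer."""
    past_xy: list[tuple[float, float]]  # past waypoints in the ego frame (m)
    instruction: str
    answer_xml: str


def format_trajectory(past_xy):
    # Compact "x,y" pairs keep the prompt short.
    return " ".join(f"{x:.1f},{y:.1f}" for x, y in past_xy)


def build_prompt(examples, past_xy, instruction):
    """Assemble a few-shot chain-of-thought prompt (examples first, query last)."""
    parts = ["You are a driving planner. Reason step by step, then answer in XML."]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\nPast trajectory: {format_trajectory(ex.past_xy)}\n"
            f"Instruction: {ex.instruction}\nAnswer:\n{ex.answer_xml}"
        )
    parts.append(
        f"Query\nPast trajectory: {format_trajectory(past_xy)}\n"
        f"Instruction: {instruction}\nAnswer:"
    )
    return "\n\n".join(parts)


def parse_fields(text, tags=ACTION_TAGS):
    """Extract <tag>value</tag> pairs from the reply; missing tags map to None."""
    out = {}
    for tag in tags:
        m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, flags=re.DOTALL)
        out[tag] = m.group(1) if m else None
    return out
```

Returning None for missing tags, rather than raising, leaves room for a downstream fallback (for example, defaulting to "steering straight") when the VLM omits a field.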

Supported VLM Backends

| Backend | Command | Notes |
|---|---|---|
| Ollama (local) | --backend ollama --ollama-model qwen3.5-vl:9b | No rate limits, free; best for batch runs |
| HF Inference | --backend hf | Llama-4-Scout-17B via novita/groq |
| Gemini | --backend gemini | Needs GOOGLE_API_KEY; free tier is rate-limited |
| Fallback | --no-vlm | Instruction-keyword heuristic; no API needed |
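The --no-vlm fallback is only described as an instruction-keyword heuristic. A minimal sketch of that idea might map keywords to per-phase steering commands as below. The keyword rules, the acceleration labels ("decelerate", "keep speed"), the 20 km/h threshold and the function name heuristic_actions are all hypothetical; only the steering command names match the calibration table in this README.

```python
# Hypothetical rules for the --no-vlm fallback: (keyword, (steer 0-3 s, steer 3-5 s)).
# Checked in order; the first keyword found in the instruction wins.
RULES = [
    ("u-turn", ("turning left", "turning left")),
    ("turn left", ("turning left", "steering straight")),
    ("turn right", ("turning right", "steering straight")),
    ("left lane", ("turning slightly left", "turning slightly right")),
    ("overtake", ("turning slightly left", "steering straight")),
    ("right lane", ("turning slightly right", "turning slightly left")),
]
SHARP = {"turning left", "turning right"}


def heuristic_actions(instruction, speed_kmh):
    """Map a driving instruction to per-phase steering/acceleration commands."""
    text = instruction.lower()
    steer = ("steering straight", "steering straight")  # default: keep lane
    for keyword, commands in RULES:
        if keyword in text:
            steer = commands
            break
    # Hypothetical accel labels: slow down for sharp turns above walking pace.
    sharp = any(cmd in SHARP for cmd in steer)
    accel = "decelerate" if sharp and speed_kmh > 20 else "keep speed"
    return {"accel": (accel, accel), "steer": steer}
```

A lane change is modeled here as a slight turn followed by the opposite slight turn, so the heading ends roughly where it started.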

Quick Start (Local with Ollama)

# 1. Clone
git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
cd kitscenes-longtail-solution

# 2. Install deps
pip install numpy huggingface_hub pyarrow

# 3. Start the Ollama server in another terminal, then pull the model here
ollama serve                 # run this in the other terminal
ollama pull qwen3.5-vl:9b    # or gemma4:e4b

# 4. Run generation
python scripts/generate_production.py \
  --backend ollama \
  --ollama-model qwen3.5-vl:9b \
  --metadata data/test_metadata.json \
  --output submissions/submission_ollama.jsonl \
  --upload-repo ValeraZSD/kitscenes-submissions \
  --upload-filename submission_ollama_v1.jsonl

# 5. Validate
python scripts/validate.py submissions/submission_ollama.jsonl
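The README does not spell out which checks the validator performs. As a rough illustration of a per-line check for the output format described under Pipeline Architecture (scenario_id, a 25×2 future_trajectory, and reasoning), something like the following could work. This is a hypothetical sketch, not scripts/validate.py; the helper names and the exact field types (string ID, list of [x, y] lists) are assumptions.

```python
import json
import math

N_WAYPOINTS = 25  # 5 s at 5 Hz, per the pipeline description


def _is_xy(pt):
    """True if pt is an [x, y] pair of finite numbers (bools rejected)."""
    return (
        isinstance(pt, list)
        and len(pt) == 2
        and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) and math.isfinite(v)
            for v in pt
        )
    )


def check_record(line):
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(rec, dict):
        return ["record is not a JSON object"]
    errors = []
    if not isinstance(rec.get("scenario_id"), str):
        errors.append("missing or non-string scenario_id")
    traj = rec.get("future_trajectory")
    if not isinstance(traj, list) or len(traj) != N_WAYPOINTS:
        errors.append(f"future_trajectory must have {N_WAYPOINTS} waypoints")
    elif not all(_is_xy(pt) for pt in traj):
        errors.append("every waypoint must be a finite [x, y] pair")
    if "reasoning" not in rec:
        errors.append("missing reasoning")
    return errors


def check_lines(lines):
    """Check all non-blank lines; map 1-based line numbers to their problems."""
    problems, seen = {}, set()
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        errors = check_record(line)
        try:
            sid = json.loads(line).get("scenario_id")
        except (json.JSONDecodeError, AttributeError):
            sid = None
        if isinstance(sid, str):
            if sid in seen:
                errors.append(f"duplicate scenario_id {sid!r}")
            seen.add(sid)
        if errors:
            problems[n] = errors
    return problems
```

Because iterating an open file yields its lines, an open handle to submissions/submission_ollama.jsonl can be passed straight to check_lines; an empty dict means no problems were found.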

Pipeline Architecture

Input: past_trajectory (21 pts @ 5Hz) + driving_instruction
         ↓
Few-Shot CoT Prompt (3 training examples + query)
  → VLM (Ollama / HF Inference / Gemini)
  → XML output with 9 structured fields
         ↓
Parse XML → Normalize to the action vocabulary (5 accel × 5 steer commands)
         ↓
Kinematic Bicycle Model (Kong et al. 2015)
  Phase 1: 0–3s (15 steps)   Phase 2: 3–5s (10 steps)
  → 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)
         ↓
Output: scenario_id + future_trajectory (25×2) + reasoning (English, 9 fields)
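The bicycle-model stage can be sketched with the standard kinematic bicycle equations from Kong et al. (2015), stepped at the 5 Hz waypoint rate over the two phases (15 + 10 steps). Several choices here are assumptions, not taken from generate_production.py: the axle-to-center-of-gravity distances L_F and L_R, forward-Euler integration, clamping speed at zero, and estimating the initial speed from the last two past waypoints.

```python
import math

DT = 0.2             # 5 Hz, matching the waypoint spacing
L_F, L_R = 1.4, 1.6  # hypothetical front/rear axle distances to the CoG (m)


def rollout(v0, phases, dt=DT, lf=L_F, lr=L_R):
    """Integrate the kinematic bicycle model with forward Euler.

    v0:     initial speed (m/s)
    phases: list of (n_steps, accel_mps2, steer_rad), e.g. [(15, a1, d1), (10, a2, d2)]
    Returns (x, y) waypoints in the ego frame (+x forward, +y left).
    """
    x = y = psi = 0.0
    v = v0
    pts = []
    for n_steps, accel, delta in phases:
        # Slip angle at the center of gravity for front steering angle delta.
        beta = math.atan(lr / (lf + lr) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(psi + beta) * dt
            y += v * math.sin(psi + beta) * dt
            psi += v / lr * math.sin(beta) * dt
            v = max(0.0, v + accel * dt)  # braking stops the car; no reversing
            pts.append((x, y))
    return pts


def speed_from_past(past_xy, dt=DT):
    """Estimate the current speed (m/s) from the last two past waypoints."""
    (x0, y0), (x1, y1) = past_xy[-2], past_xy[-1]
    return math.hypot(x1 - x0, y1 - y0) / dt
```

For example, rollout(10.0, [(15, 0.0, 0.0), (10, 0.0, 0.0)]) yields 25 points along +x ending at (50.0, 0.0), while a positive steering angle bends the path toward +y (left).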

Steering Angle Calibration

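The steering angles in the paper's Table 6 (±30° at low speed) describe instantaneous inputs. Held constant over a whole phase in the bicycle model (3 s for phase 1, 2 s for phase 2), such angles would produce far more curvature than real drivers do, so we calibrated them against expert trajectories: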

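| Command | Paper (Table 6) | Calibrated, ≤60 km/h | Calibrated, >60 km/h |
|---|---|---|---|
| turning left | 30° | 6° | 0.3° |
| turning slightly left | 10° | 1° | 0.1° |
| steering straight | 0° | 0° | 0° |
| turning slightly right | -10° | -1° | -0.1° |
| turning right | -30° | -6° | -0.3° |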
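The calibration table can be encoded as a speed-dependent lookup. This sketch switches regime at exactly 60 km/h, as the column headers suggest. The function name steering_angle and the sign convention (positive = left, consistent with +y = left in the ego frame) are assumptions, and configs/action_vocabulary.json may organize the same values differently.

```python
import math

# Calibrated sustained steering angles from the table (degrees): (<=60 km/h, >60 km/h).
STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}
HIGHWAY_KMH = 60.0  # boundary between the two calibrated regimes


def steering_angle(command, speed_kmh):
    """Return the calibrated steering angle in radians (positive = left)."""
    low, high = STEER_DEG[command.strip().lower()]
    return math.radians(low if speed_kmh <= HIGHWAY_KMH else high)
```

A hard switch is the simplest reading of the table; interpolating between the two columns near 60 km/h would avoid a jump in behavior, but the README does not indicate that the solution does so.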

Files

| File | Description |
|---|---|
| scripts/generate_production.py | Main pipeline: VLM + bicycle model + validation + upload |
| scripts/validate.py | Standalone submission validator (mirrors the challenge code) |
| data/test_metadata.json | Pre-extracted test metadata (400 scenarios, 190 KB, no images) |
| configs/action_vocabulary.json | Action → parameter mapping with calibrated values |
| submissions/ | Generated submission JSONL files |
| requirements.txt | Python dependencies |

Test Data Distribution

| Instruction | Count |
|---|---|
| drive straight on | 196 |
| turn right | 43 |
| use left lane | 34 |
| use right lane | 30 |
| overtake truck | 25 |
| turn left | 20 |
| u-turn | 8 |

| Scenario type | Count |
|---|---|
| intersection | 125 |
| overtake/lane change | 102 |
| specifically selected | 68 |
| construction zone | 36 |
| heavy rain | 27 |
| snow & wintry mix | 23 |
| nighttime | 19 |

Speed: min = 0, mean = 52, max = 130 km/h; 65% urban (≤60 km/h), 35% highway (>60 km/h)
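Statistics like these could be recomputed from the test metadata with a short summary function. The record keys instruction, scenario_type and speed_kmh, the helper name summarize, and the assumption that data/test_metadata.json loads as a list of such records are hypothetical; they would need adjusting to the file's real schema.

```python
from collections import Counter
from statistics import mean

URBAN_MAX_KMH = 60.0  # speeds at or below this count as "urban"


def summarize(records):
    """Count instructions/scenario types and split speeds into urban vs. highway.

    records: list of dicts with hypothetical keys
             "instruction", "scenario_type", "speed_kmh".
    """
    if not records:
        raise ValueError("no records to summarize")
    speeds = [float(r["speed_kmh"]) for r in records]
    urban = sum(s <= URBAN_MAX_KMH for s in speeds)
    return {
        "instructions": Counter(r["instruction"] for r in records).most_common(),
        "scenario_types": Counter(r["scenario_type"] for r in records).most_common(),
        "speed_min": min(speeds),
        "speed_mean": mean(speeds),
        "speed_max": max(speeds),
        "urban_share": urban / len(speeds),
        "highway_share": 1 - urban / len(speeds),
    }
```

Counter.most_common() returns (value, count) pairs sorted by count, which matches the descending order of the tables above.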
