KITScenes LongTail Challenge — Solution

Challenge: KIT-MRT/KITScenes-LongTail-Challenge
Dataset: KIT-MRT/KITScenes-LongTail
Paper: arXiv:2603.23607

Approach: Few-Shot CoT + Kinematic Bicycle Model

Based on the best open-source method from the paper (MMS 4.24 with Gemma 3 12B), this solution uses:

VLM Reasoning — analyzes driving scenarios using past trajectory + instruction
Kinematic Bicycle Model (Kong et al. 2015) — converts action commands → 25 precise waypoints
Few-Shot Chain-of-Thought — 3 training examples guide the VLM's structured output

Supported VLM Backends

Backend	Command	Notes
Ollama (local)	`--backend ollama --ollama-model qwen3.5-vl:9b`	No rate limits, free, best for batch
HF Inference	`--backend hf`	Llama-4-Scout-17B via novita/groq
Gemini	`--backend gemini`	Needs `GOOGLE_API_KEY`, free tier rate-limited
Fallback	`--no-vlm`	Instruction-keyword heuristic, no API needed
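
The `--no-vlm` fallback is described only as an instruction-keyword heuristic. As a rough illustration of what such a heuristic could look like, the sketch below maps an instruction string to one of the steering commands listed in this README. The keyword table and the function name `heuristic_steer` are hypothetical and are not taken from `scripts/generate_production.py`.

```python
# Hypothetical keyword table: instruction substring -> steering command.
# The command names appear in this README; the mapping itself is an
# assumption made for illustration only.
KEYWORD_TO_STEER = [
    ("u-turn", "turning left"),
    ("turn left", "turning left"),
    ("turn right", "turning right"),
    ("left lane", "turning slightly left"),
    ("right lane", "turning slightly right"),
    ("overtake", "turning slightly left"),
]


def heuristic_steer(instruction: str) -> str:
    """Pick a steering command from the first keyword found in the instruction."""
    text = instruction.lower()
    for keyword, command in KEYWORD_TO_STEER:
        if keyword in text:
            return command
    # No keyword matched (e.g. "drive straight on"): keep the wheel centered.
    return "steering straight"
```

A real fallback would also need to choose an acceleration command, which this sketch leaves out.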

Quick Start (Local with Ollama)

# 1. Clone
git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
cd kitscenes-longtail-solution

# 2. Install deps
pip install numpy huggingface_hub pyarrow

# 3. Start the Ollama server (in another terminal), then pull the model
ollama serve                # leave running in its own terminal
ollama pull qwen3.5-vl:9b   # or gemma4:e4b

# 4. Run generation
python scripts/generate_production.py \
  --backend ollama \
  --ollama-model qwen3.5-vl:9b \
  --metadata data/test_metadata.json \
  --output submissions/submission_ollama.jsonl \
  --upload-repo ValeraZSD/kitscenes-submissions \
  --upload-filename submission_ollama_v1.jsonl

# 5. Validate
python scripts/validate.py submissions/submission_ollama.jsonl
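
For a quick sanity check before running the validator, the sketch below tests one JSONL line against the output shape described in this README (`scenario_id`, a 25×2 `future_trajectory`, and `reasoning`). It is a simplified stand-in under those assumptions, not the logic of `scripts/validate.py`. The function name `check_record` is hypothetical.

```python
import json
import math

N_WAYPOINTS = 25  # from this README: 25 future waypoints per scenario


def check_record(line: str) -> list:
    """Return a list of problems found in one JSONL line (empty if none)."""
    problems = []
    record = json.loads(line)
    for key in ("scenario_id", "future_trajectory", "reasoning"):
        if key not in record:
            problems.append(f"missing field: {key}")
    trajectory = record.get("future_trajectory", [])
    if len(trajectory) != N_WAYPOINTS:
        problems.append(f"expected {N_WAYPOINTS} waypoints, got {len(trajectory)}")
    for i, point in enumerate(trajectory):
        ok = (
            isinstance(point, (list, tuple))
            and len(point) == 2
            and all(isinstance(c, (int, float)) and math.isfinite(c) for c in point)
        )
        if not ok:
            problems.append(f"waypoint {i} is not a finite (x, y) pair")
    return problems
```

Calling `check_record` on each line of `submissions/submission_ollama.jsonl` and printing any non-empty result gives a fast first pass. The challenge validator remains the authority on the accepted format.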

Pipeline Architecture

Input: past_trajectory (21 pts @ 5Hz) + driving_instruction
         ↓
Few-Shot CoT Prompt (3 training examples + query)
  → VLM (Ollama / HF Inference / Gemini)
  → XML output with 9 structured fields
         ↓
Parse XML → Normalize to 5 accel × 5 steer commands
         ↓
Kinematic Bicycle Model (Kong et al. 2015)
  Phase 1: 0–3s (15 steps)   Phase 2: 3–5s (10 steps)
  → 25 waypoints in ego-centric coordinates (+x=fwd, +y=left)
         ↓
Output: scenario_id + future_trajectory (25×2) + reasoning (English, 9 fields)
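
The waypoint-generation step can be sketched with the standard kinematic bicycle equations from Kong et al. (2015), integrated with a forward Euler step at 5 Hz. This is a minimal sketch, not the repository's implementation: the function name `rollout`, the axle distances `lf` and `lr`, and the `(n_steps, accel, steer_deg)` phase format are assumptions chosen for illustration.

```python
import math

DT = 0.2  # seconds per step at 5 Hz


def rollout(v0, phases, lf=1.4, lr=1.4):
    """Integrate the kinematic bicycle model in ego-centric coordinates.

    v0:     initial speed in m/s
    phases: list of (n_steps, accel_mps2, steer_deg), e.g. 15 + 10 steps
    lf, lr: distances from the center of mass to the axles (assumed values)
    Returns a list of (x, y) waypoints, +x forward and +y left.
    """
    x = y = psi = 0.0
    v = v0
    waypoints = []
    for n_steps, accel, steer_deg in phases:
        delta = math.radians(steer_deg)
        # Slip angle at the center of mass for a front-steered vehicle.
        beta = math.atan(lr / (lf + lr) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(psi + beta) * DT
            y += v * math.sin(psi + beta) * DT
            psi += v / lr * math.sin(beta) * DT
            v = max(0.0, v + accel * DT)  # the vehicle does not reverse
            waypoints.append((x, y))
    return waypoints
```

For example, `rollout(10.0, [(15, 0.0, 6.0), (10, 0.0, 0.0)])` holds a 6° left steer for Phase 1 and then straightens for Phase 2, yielding 25 waypoints that drift toward +y.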

Steering Angle Calibration

The steering angles in Table 6 of the paper (±30° at low speed) are defined for instantaneous inputs. The bicycle model instead holds each command for a sustained phase of up to 3 s, so the angles were calibrated against expert trajectories:

Command	Paper (≤60 km/h)	Calibrated	>60 km/h
turning left	30°	6°	0.3°
turning slightly left	10°	1°	0.1°
steering straight	0°	0°	0°
turning slightly right	-10°	-1°	-0.1°
turning right	-30°	-6°	-0.3°
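
A lookup for these values might look like the sketch below. It assumes that the "Calibrated" column applies at speeds up to 60 km/h and that the ">60 km/h" column holds the calibrated high-speed values. That reading of the table is an assumption, as are the names `CALIBRATED_STEER_DEG` and `steering_angle_deg`. The actual mapping lives in `configs/action_vocabulary.json`.

```python
# command -> (angle in degrees at <=60 km/h, angle in degrees at >60 km/h)
# Values copied from the table above; positive angles steer left.
CALIBRATED_STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}

SPEED_THRESHOLD_KMH = 60.0


def steering_angle_deg(command: str, speed_kmh: float) -> float:
    """Return the calibrated steering angle for a command at a given speed."""
    low_speed, high_speed = CALIBRATED_STEER_DEG[command]
    return low_speed if speed_kmh <= SPEED_THRESHOLD_KMH else high_speed
```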

Files

File	Description
`scripts/generate_production.py`	Main pipeline — VLM + bicycle model + validation + upload
`scripts/validate.py`	Standalone submission validator (mirrors challenge code)
`data/test_metadata.json`	Pre-extracted test metadata (400 scenarios, 190KB, no images)
`configs/action_vocabulary.json`	Action → parameter mapping with calibrated values
`submissions/`	Generated submission JSONL files
`requirements.txt`	Python dependencies

Test Data Distribution

Instruction	Count	Scenario Type	Count
drive straight on	196	intersection	125
turn right	43	overtake/lane change	102
use left lane	34	specifically selected	68
use right lane	30	construction zone	36
overtake truck	25	heavy rain	27
turn left	20	snow & wintry mix	23
u-turn	8	nighttime	19

Speed: min=0, mean=52, max=130 km/h — 65% urban (≤60), 35% highway (>60)
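
The urban/highway split depends on the ego speed at prediction time. One way to estimate it from the 5 Hz past trajectory is sketched below. It assumes the trajectory is a sequence of `(x, y)` positions in meters ordered oldest to newest, and the function names are hypothetical. How the repository actually derives speed is not stated in this README.

```python
import math

DT = 0.2  # seconds between samples at 5 Hz
URBAN_LIMIT_KMH = 60.0


def current_speed_kmh(past_trajectory) -> float:
    """Estimate speed from the displacement between the last two samples."""
    (x0, y0), (x1, y1) = past_trajectory[-2], past_trajectory[-1]
    return math.hypot(x1 - x0, y1 - y0) / DT * 3.6  # m/s -> km/h


def speed_regime(speed_kmh: float) -> str:
    """Classify a speed as 'urban' (<=60 km/h) or 'highway' (>60 km/h)."""
    return "urban" if speed_kmh <= URBAN_LIMIT_KMH else "highway"
```

Averaging over several of the most recent samples instead of just the last two would make the estimate less sensitive to localization noise.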

Paper

LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset (arXiv:2603.23607, published Mar 24)