LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
Challenge: KIT-MRT/KITScenes-LongTail-Challenge
Dataset: KIT-MRT/KITScenes-LongTail
Paper: arXiv:2603.23607
Based on the best open-source method from the paper (MMS 4.24 with Gemma 3 12B), this solution uses:
| Backend | Command | Notes |
|---|---|---|
| Ollama (local) | `--backend ollama --ollama-model qwen3.5-vl:9b` | No rate limits, free, best for batch |
| HF Inference | `--backend hf` | Llama-4-Scout-17B via novita/groq |
| Gemini | `--backend gemini` | Needs `GOOGLE_API_KEY`, free tier rate-limited |
| Fallback | `--no-vlm` | Instruction-keyword heuristic, no API needed |
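
The `--no-vlm` fallback is described only as an instruction-keyword heuristic. A minimal sketch of what such a heuristic might look like is shown here; the keyword rules, the function name `steer_command_from_instruction`, and the choice to overtake on the left are all assumptions for illustration, not the rules used in `scripts/generate_production.py`.

```python
# Hypothetical keyword -> steering-command rules (first match wins).
# The actual heuristic in scripts/generate_production.py may differ.
KEYWORD_RULES = [
    ("u-turn", "turning left"),
    ("turn left", "turning left"),
    ("turn right", "turning right"),
    ("left lane", "turning slightly left"),
    ("right lane", "turning slightly right"),
    ("overtake", "turning slightly left"),  # assumes overtaking on the left
    ("straight", "steering straight"),
]


def steer_command_from_instruction(instruction: str) -> str:
    """Pick a steering command from keywords in the driving instruction."""
    text = instruction.lower()
    for keyword, command in KEYWORD_RULES:
        if keyword in text:
            return command
    return "steering straight"  # default when no keyword matches


for instruction in ("drive straight on", "use left lane", "overtake truck"):
    print(instruction, "->", steer_command_from_instruction(instruction))
```

The rule list is ordered so that more specific phrases are checked before generic ones; anything unrecognized falls back to driving straight.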

    # 1. Clone
    git clone https://huggingface.co/ValeraZSD/kitscenes-longtail-solution
    cd kitscenes-longtail-solution

    # 2. Install deps
    pip install numpy huggingface_hub pyarrow

    # 3. Start Ollama model (in another terminal)
    ollama pull qwen3.5-vl:9b  # or gemma4:e4b
    ollama serve

    # 4. Run generation
    python scripts/generate_production.py \
        --backend ollama \
        --ollama-model qwen3.5-vl:9b \
        --metadata data/test_metadata.json \
        --output submissions/submission_ollama.jsonl \
        --upload-repo ValeraZSD/kitscenes-submissions \
        --upload-filename submission_ollama_v1.jsonl

    # 5. Validate
    python scripts/validate.py submissions/submission_ollama.jsonl
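
For orientation, the kind of check the validation step performs can be sketched as follows. This is not the code in `scripts/validate.py`; it only assumes each JSONL line carries the three output fields this README names (`scenario_id`, `future_trajectory` with 25 `[x, y]` pairs, and `reasoning`). The function name `validate_line` is hypothetical.

```python
import json
import math


def validate_line(line: str) -> list[str]:
    """Return a list of problems found in one JSONL submission line."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(row, dict):
        return ["row is not a JSON object"]

    problems = []
    if row.get("scenario_id") in (None, ""):
        problems.append("scenario_id is missing")

    traj = row.get("future_trajectory")
    shape_ok = (
        isinstance(traj, list)
        and len(traj) == 25
        and all(isinstance(p, list) and len(p) == 2 for p in traj)
    )
    if not shape_ok:
        problems.append("future_trajectory must be 25 [x, y] pairs")
    elif not all(
        isinstance(c, (int, float)) and math.isfinite(c) for p in traj for c in p
    ):
        problems.append("future_trajectory must contain finite numbers")

    if not row.get("reasoning"):
        problems.append("reasoning is missing or empty")
    return problems


good = {
    "scenario_id": "example",
    "future_trajectory": [[0.5 * i, 0.0] for i in range(1, 26)],
    "reasoning": "drive straight on",
}
print(validate_line(json.dumps(good)))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a file in one pass.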
Pipeline overview:

    Input: past_trajectory (21 pts @ 5 Hz) + driving_instruction
      ↓
    Few-Shot CoT Prompt (3 training examples + query)
      → VLM (Ollama / HF Inference / Gemini)
      → XML output with 9 structured fields
      ↓
    Parse XML → Normalize to 5 accel × 5 steer commands
      ↓
    Kinematic Bicycle Model (Kong et al. 2015)
      Phase 1: 0–3 s (15 steps)    Phase 2: 3–5 s (10 steps)
      → 25 waypoints in ego-centric coordinates (+x = forward, +y = left)
      ↓
    Output: scenario_id + future_trajectory (25×2) + reasoning (English, 9 fields)
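
The two-phase rollout can be sketched with a forward-Euler integration of the kinematic bicycle model. The axle distances `L_F` and `L_R`, the example accelerations, and the function name `rollout` are assumptions made for this illustration; the values used in the repository are not stated here.

```python
import math

DT = 0.2   # 5 Hz sampling, matching the pipeline
L_F = 1.4  # assumed distance from center of gravity to front axle [m]
L_R = 1.4  # assumed distance from center of gravity to rear axle [m]


def rollout(v0, phases):
    """Integrate the kinematic bicycle model with forward Euler.

    v0: initial speed [m/s].
    phases: list of (accel [m/s^2], steer [deg], n_steps) tuples.
    Returns ego-centric waypoints (+x forward, +y left), one per step.
    """
    x = y = psi = 0.0
    v = v0
    waypoints = []
    for accel, steer_deg, n_steps in phases:
        delta = math.radians(steer_deg)
        # Slip angle at the center of gravity for a front-steered vehicle.
        beta = math.atan(L_R / (L_F + L_R) * math.tan(delta))
        for _ in range(n_steps):
            x += v * math.cos(psi + beta) * DT
            y += v * math.sin(psi + beta) * DT
            psi += v / L_R * math.sin(beta) * DT
            v = max(0.0, v + accel * DT)  # no reversing
            waypoints.append((x, y))
    return waypoints


# Phase 1: 15 steps turning left at 6 degrees; phase 2: 10 steps straight.
trajectory = rollout(10.0, [(0.0, 6.0, 15), (0.0, 0.0, 10)])
print(len(trajectory), trajectory[-1])
```

A positive steering angle increases the heading counterclockwise, so left turns move the vehicle toward +y, consistent with the coordinate convention above.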
The steering values in Table 6 of the paper (±30° at low speed) are meant for instantaneous inputs. Held for a sustained 3 s phase in the bicycle model, they produce far too much turning, so we calibrated the angles against expert trajectories:
| Command | Paper (≤60 km/h) | Calibrated | >60 km/h |
|---|---|---|---|
| turning left | 30° | 6° | 0.3° |
| turning slightly left | 10° | 1° | 0.1° |
| steering straight | 0° | 0° | 0° |
| turning slightly right | -10° | -1° | -0.1° |
| turning right | -30° | -6° | -0.3° |
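
The sketch below shows why a sustained 30° input is too aggressive, and how the calibrated values from the table above could be looked up. It rests on two assumptions: the "Calibrated" column applies at ≤60 km/h and the last column above 60 km/h, and the vehicle has a wheelbase of 2.8 m. The function names are hypothetical.

```python
import math

# Calibrated steering angles in degrees, copied from the table above.
# Assumption: first value applies at <= 60 km/h, second value above 60 km/h.
CALIBRATED_STEER_DEG = {
    "turning left": (6.0, 0.3),
    "turning slightly left": (1.0, 0.1),
    "steering straight": (0.0, 0.0),
    "turning slightly right": (-1.0, -0.1),
    "turning right": (-6.0, -0.3),
}
SPEED_THRESHOLD_KMH = 60.0


def steering_angle_deg(command: str, speed_kmh: float) -> float:
    """Look up the calibrated steering angle for a command and speed."""
    low_speed, high_speed = CALIBRATED_STEER_DEG[command]
    return low_speed if speed_kmh <= SPEED_THRESHOLD_KMH else high_speed


def heading_change_deg(steer_deg, speed_mps, duration_s, wheelbase_m=2.8):
    """Heading change from holding a steering angle (rear-axle bicycle model).

    wheelbase_m is an assumed value, not taken from the repository.
    """
    yaw_rate = speed_mps / wheelbase_m * math.tan(math.radians(steer_deg))
    return math.degrees(yaw_rate * duration_s)


# Holding 30 degrees for 3 s at 10 m/s turns the car through almost a full
# circle, while the calibrated 6 degrees gives a heading change below 90 degrees.
print(round(heading_change_deg(30.0, 10.0, 3.0)))
print(round(heading_change_deg(6.0, 10.0, 3.0)))
print(steering_angle_deg("turning left", 40.0))
```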
| File | Description |
|---|---|
| `scripts/generate_production.py` | Main pipeline: VLM + bicycle model + validation + upload |
| `scripts/validate.py` | Standalone submission validator (mirrors challenge code) |
| `data/test_metadata.json` | Pre-extracted test metadata (400 scenarios, 190 KB, no images) |
| `configs/action_vocabulary.json` | Action → parameter mapping with calibrated values |
| `submissions/` | Generated submission JSONL files |
| `requirements.txt` | Python dependencies |
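
The "Parse XML → Normalize" step of the main pipeline might be approached as in the following sketch. The tag names (`<response>`, `<steering>`, `<acceleration>`) and the function names are hypothetical, since the nine XML fields are not listed here; only the five steering commands come from this README. Fuzzy matching with `difflib` is one possible normalization strategy, not necessarily the one the pipeline uses.

```python
import difflib
import xml.etree.ElementTree as ET

# The five steering commands named in this README.
STEER_COMMANDS = [
    "turning left",
    "turning slightly left",
    "steering straight",
    "turning slightly right",
    "turning right",
]


def parse_fields(vlm_output: str) -> dict[str, str]:
    """Extract child tags of a <response> element (hypothetical tag name)."""
    start = vlm_output.find("<response>")
    end = vlm_output.find("</response>")
    if start == -1 or end == -1:
        return {}
    try:
        root = ET.fromstring(vlm_output[start : end + len("</response>")])
    except ET.ParseError:
        return {}  # malformed XML: let the caller fall back to a default
    return {child.tag: (child.text or "").strip() for child in root}


def normalize_steer(raw: str) -> str:
    """Snap free-form model text to the closest known steering command."""
    matches = difflib.get_close_matches(raw.lower().strip(), STEER_COMMANDS, n=1)
    return matches[0] if matches else "steering straight"


output = (
    "Here is my answer.\n"
    "<response><steering>Turning slightly left.</steering>"
    "<acceleration>keep speed</acceleration></response>"
)
fields = parse_fields(output)
print(normalize_steer(fields.get("steering", "")))
```

Slicing out the `<response>` element before parsing tolerates chatty text around the XML, and the fallback to "steering straight" keeps the pipeline running when the model output cannot be matched.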
| Instruction | Count | Scenario Type | Count |
|---|---|---|---|
| drive straight on | 196 | intersection | 125 |
| turn right | 43 | overtake/lane change | 102 |
| use left lane | 34 | specifically selected | 68 |
| use right lane | 30 | construction zone | 36 |
| overtake truck | 25 | heavy rain | 27 |
| turn left | 20 | snow & wintry mix | 23 |
| u-turn | 8 | nighttime | 19 |
Speed: min 0, mean 52, max 130 km/h; 65% of scenarios are urban (≤60 km/h) and 35% are highway (>60 km/h).