Kevin Xie
Upload main processing scripts for this repo
577fb61

🚀 Leaderboard Generation Scripts

This directory contains scripts to automatically generate leaderboard JSON files from Excel data.

⚑ Quick Start

1. Install Dependencies

pip install -r scripts/requirements.txt

Required packages:

  • pandas>=2.0.0 - For reading Excel files and data manipulation
  • openpyxl>=3.1.0 - For reading .xlsx Excel files

2. Configure Excel Path

Open scripts/config.py and update the EXCEL_PATH variable:

EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"  # Place in project root
# OR
EXCEL_PATH = Path("/Users/yourname/Desktop/benchmark.xlsx")  # Use absolute path

3. Run the Script

From the project root directory:

python scripts/main.py

Or with Python 3 explicitly:

python3 scripts/main.py

That's it! The script will automatically:

  • ✅ Read the Excel file
  • ✅ Generate all three leaderboards (Zero-Shot, Few-Shot, CoT)
  • ✅ Update task information
  • ✅ Calculate and update rankings
  • ✅ Save everything to the leaderboards/ directory

⚙️ Configuration

All settings are in scripts/config.py:

Required Configuration

EXCEL_PATH - Path to your Excel file containing model and task data

# In project root (recommended)
EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"

# Absolute path
EXCEL_PATH = Path("/Users/yourname/Desktop/Clinical Benchmark and LLM.xlsx")

# In a subdirectory
EXCEL_PATH = BASE_DIR / "data" / "benchmark.xlsx"

Optional Configuration

INVALID_MODELS - Models to exclude from leaderboards

INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
    # Add model names that should not appear
]
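
As a rough illustration of how such an exclusion list might be applied, the sketch below filters a list of model names against INVALID_MODELS. The function name exclude_invalid is hypothetical and is not taken from the scripts; the comparison is an exact, case-sensitive match.

```python
# Hypothetical sketch: exclude_invalid is an illustrative helper,
# not a function from the actual scripts.
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
]


def exclude_invalid(model_names, invalid=INVALID_MODELS):
    """Return model_names without any entry that exactly matches an invalid name."""
    blocked = set(invalid)
    return [name for name in model_names if name not in blocked]


# "hulu-med-7b" is kept because the match is case-sensitive.
kept = exclude_invalid(["gpt-4o", "Hulu-Med-7B", "hulu-med-7b"])
print(kept)
```

A name that differs only in case or spacing is not excluded, so entries should be copied exactly as they appear in the Excel file.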

Output paths (these rarely need to change):

  • ZERO_SHOT_OUTPUT - Zero-Shot leaderboard path
  • FEW_SHOT_OUTPUT - Few-Shot leaderboard path
  • COT_OUTPUT - Chain-of-Thought leaderboard path
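
For orientation, the settings above could be laid out along these lines. This is only a sketch: the build_config function is hypothetical, and the way BASE_DIR is derived in the real config.py is an assumption. The output file names are the ones listed in the verification step of this README.

```python
from pathlib import Path


def build_config(base_dir: Path) -> dict:
    """Return path settings relative to base_dir (hypothetical layout)."""
    leaderboard_dir = base_dir / "leaderboards"
    return {
        "EXCEL_PATH": base_dir / "Clinical Benchmark and LLM.xlsx",
        "ZERO_SHOT_OUTPUT": leaderboard_dir / "Zero-Shot_leaderboard.json",
        "FEW_SHOT_OUTPUT": leaderboard_dir / "Few-Shot_leaderboard.json",
        "COT_OUTPUT": leaderboard_dir / "CoT_leaderboard.json",
    }


# Assumption: in a real config.py, the project root is often derived from the
# file location, e.g. BASE_DIR = Path(__file__).resolve().parent.parent
config = build_config(Path("/path/to/project"))
for key, value in config.items():
    print(f"{key} = {value}")
```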

📋 Complete Update Workflow

  1. Get your Excel file

    • Download/obtain the latest "Clinical Benchmark and LLM.xlsx"
    • Place it in the project root or note its location
  2. Update configuration

    # Edit scripts/config.py
    # Set EXCEL_PATH to your file location
    # Add any models to INVALID_MODELS if needed
    
  3. Run the generation script

    python scripts/main.py
    
  4. Verify the output

    • Check leaderboards/Zero-Shot_leaderboard.json
    • Check leaderboards/Few-Shot_leaderboard.json
    • Check leaderboards/CoT_leaderboard.json
    • Check task_information.json
  5. Test locally

    python app.py
    # Open browser to test the leaderboard interface
    
  6. Deploy

    • Commit and push to GitHub
    • Deploy to Hugging Face Spaces
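
The verification in step 4 can be partly automated. The sketch below checks that the three leaderboard files exist and parse as JSON. The check_outputs helper is hypothetical and not part of the scripts; task_information.json is left out because its exact location is not assumed here.

```python
import json
from pathlib import Path

# File names as listed in step 4 of the workflow.
EXPECTED_FILES = [
    "Zero-Shot_leaderboard.json",
    "Few-Shot_leaderboard.json",
    "CoT_leaderboard.json",
]


def check_outputs(leaderboard_dir):
    """Return a list of problems; an empty list means every file parsed."""
    problems = []
    for name in EXPECTED_FILES:
        path = Path(leaderboard_dir) / name
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {path}: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_outputs("leaderboards"):
        print(problem)
```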

📁 Files Overview

  • config.py - Central configuration file ⚠️ EDIT THIS FILE
  • main.py - Main script that orchestrates leaderboard generation
  • requirements.txt - Python dependencies for the scripts
  • README.md - This file
  • helpers/ - Helper modules for processing Excel data
    • excel_processor.py - Processes Excel files and creates leaderboards
    • reorganize_indices.py - Reorganizes model indices by size
    • CONSTANTS.py - Constants for data mapping (task names, domain mappings, etc.)
    • leaderboards.py - Placeholder for future leaderboard operations
    • __init__.py - Makes helpers a Python package

🀝 Sharing This Code

When sharing this code with others:

  1. They only need to update scripts/config.py with their Excel file path
  2. All other files will automatically use the configured paths
  3. No need to search through multiple files to update paths
  4. The script validates the Excel file exists before running

🐛 Troubleshooting

"Excel file not found" error

❌ ERROR: Excel file not found!

Solution:

  • Check that EXCEL_PATH in scripts/config.py points to a valid file
  • Verify the file exists at that location
  • Use absolute paths if relative paths don't work
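
When a relative path fails, it often helps to see what it resolves to. The following is a minimal sketch of such a check; describe_excel_path is a hypothetical helper, not the validation the script itself performs.

```python
from pathlib import Path


def describe_excel_path(excel_path):
    """Report whether excel_path points to an existing file, with the resolved path."""
    resolved = Path(excel_path).expanduser().resolve()
    if resolved.is_file():
        return f"OK: {resolved}"
    # Relative paths are resolved against the current working directory.
    return f"Not found: {resolved} (current directory: {Path.cwd()})"


print(describe_excel_path("Clinical Benchmark and LLM.xlsx"))
```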

"Missing models" or unexpected output

Solution:

  • Verify that model names in INVALID_MODELS match exactly (case-sensitive)
  • Check that the Excel file has the required sheets:
    • "Models (Simplified)" - contains model information
    • "B-CLF", "B-EXT", "B-GEN" - for Zero-Shot
    • "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot" - for Few-Shot
    • "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT" - for CoT
    • "Task-all" - for task information
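
The sheet list above can be checked before running the full script. In this sketch, missing_sheets and sheet_names_from_excel are hypothetical helpers; the second one assumes pandas and openpyxl are installed and uses pandas.ExcelFile to read the sheet names.

```python
# Sheet names as listed in this README.
REQUIRED_SHEETS = {
    "Models (Simplified)",
    "B-CLF", "B-EXT", "B-GEN",
    "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot",
    "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT",
    "Task-all",
}


def missing_sheets(available):
    """Return the required sheet names that are absent from available, sorted."""
    return sorted(REQUIRED_SHEETS - set(available))


def sheet_names_from_excel(excel_path):
    """Read sheet names from a workbook (requires pandas and openpyxl)."""
    import pandas as pd  # imported here so missing_sheets works without pandas

    with pd.ExcelFile(excel_path) as workbook:
        return workbook.sheet_names


# Example with a hand-written list instead of a real workbook:
print(missing_sheets(["Models (Simplified)", "B-CLF", "Task-all"]))
```

With a real workbook, the call would be missing_sheets(sheet_names_from_excel(EXCEL_PATH)).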

Import errors

ModuleNotFoundError: No module named 'pandas'

Solution:

  • Install the required packages: pip install -r scripts/requirements.txt
  • Make sure you're using the correct Python environment
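
To confirm which interpreter is running and whether the two required packages are visible to it, a short check like the one below can be used. The missing_packages helper is illustrative; it relies only on the standard library.

```python
import importlib.util
import sys


def missing_packages(names=("pandas", "openpyxl")):
    """Return the package names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Interpreter:", sys.executable)
print("Missing packages:", missing_packages())
```

If the interpreter path is not the one from your virtual environment, activate that environment and install the requirements again.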

Running from wrong directory

ModuleNotFoundError: No module named 'helpers'

Solution:

  • Always run from the project root: python scripts/main.py
  • Do not run it from inside the scripts directory: ❌ cd scripts && python main.py

💡 Excel File Requirements

Your Excel file must contain:

Required Sheets:

  1. Models (Simplified) - Model metadata

    • Columns: Name, Domain, License, Size (B)
  2. Task Sheets (for each leaderboard type):

    • Zero-Shot: B-CLF, B-EXT, B-GEN
    • Few-Shot: B-CLF-5shot, B-EXT-5shot, B-GEN-5shot
    • CoT: B-CLF-CoT, B-EXT-CoT, B-GEN-CoT
  3. Task-all - Task metadata

    • Columns: Task name, Language, Task Type, Clinical context, Data Access, etc.
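
The column requirement for the "Models (Simplified)" sheet can be checked in the same spirit. This is a sketch under assumptions: missing_model_columns is a hypothetical helper, and trimming surrounding whitespace from headers is a choice made here, not documented behavior of the script. The Task-all columns are not checked because the list above is not exhaustive.

```python
# Column names as listed for the "Models (Simplified)" sheet.
REQUIRED_MODEL_COLUMNS = ["Name", "Domain", "License", "Size (B)"]


def missing_model_columns(columns):
    """Return required column names that do not appear in columns."""
    # Assumption: headers are compared after trimming surrounding whitespace.
    present = {str(column).strip() for column in columns}
    return [name for name in REQUIRED_MODEL_COLUMNS if name not in present]


# With pandas, columns would typically come from
# pd.read_excel(EXCEL_PATH, sheet_name="Models (Simplified)").columns
print(missing_model_columns(["Name", "Domain ", "Size (B)"]))
```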

Model Name Handling:

The script automatically handles some model name variations:

  • gpt-35-turbo-0125 → gpt-35-turbo
  • gpt-4o-0806 → gpt-4o
  • gemini-2.0-flash-001 → gemini-2.0-flash
  • And more (see excel_processor.py for full list)
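
One simple way to implement this kind of name handling is a lookup table. The sketch below covers only the three variations listed here; NAME_ALIASES and normalize_model_name are hypothetical names, and the actual logic in excel_processor.py may work differently.

```python
# Only the variations documented in this README; the real list is longer.
NAME_ALIASES = {
    "gpt-35-turbo-0125": "gpt-35-turbo",
    "gpt-4o-0806": "gpt-4o",
    "gemini-2.0-flash-001": "gemini-2.0-flash",
}


def normalize_model_name(name):
    """Map a known name variation to its canonical form; leave other names unchanged."""
    cleaned = name.strip()
    return NAME_ALIASES.get(cleaned, cleaned)


print(normalize_model_name("gpt-4o-0806"))
print(normalize_model_name("Hulu-Med-7B"))
```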

🎯 What the Script Does

  1. Validates the Excel file exists
  2. Loads model information from "Models (Simplified)" sheet
  3. Processes each leaderboard type (Zero-Shot, Few-Shot, CoT):
    • Extracts performance data from task sheets
    • Calculates average performance
    • Generates JSON with model info and scores
  4. Reorganizes model indices by size (smallest to largest)
  5. Updates rankings based on average performance
  6. Creates task_information.json with metadata
  7. Saves all output files to the leaderboards/ directory
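
Steps 3 to 5 can be sketched roughly as follows. The input shape and the field names (size_b, scores, average, rank, index) are hypothetical and not taken from excel_processor.py or reorganize_indices.py. Models with equal averages keep their input order here, which may differ from what the real script does.

```python
def build_leaderboard(models):
    """Compute averages, rank by average, and order rows by model size.

    models: list of dicts with hypothetical keys 'name', 'size_b', 'scores'.
    """
    rows = []
    for model in models:
        scores = model["scores"]
        average = sum(scores) / len(scores) if scores else 0.0
        rows.append({"name": model["name"], "size_b": model["size_b"], "average": average})

    # Rank 1 goes to the highest average performance.
    by_average = sorted(rows, key=lambda row: row["average"], reverse=True)
    for rank, row in enumerate(by_average, start=1):
        row["rank"] = rank

    # Rows are ordered from smallest to largest model, then re-indexed.
    rows.sort(key=lambda row: row["size_b"])
    for index, row in enumerate(rows):
        row["index"] = index
    return rows


example = build_leaderboard([
    {"name": "model-a", "size_b": 70, "scores": [80, 90]},
    {"name": "model-b", "size_b": 7, "scores": [60, 70]},
    {"name": "model-c", "size_b": 13, "scores": [90, 100]},
])
for row in example:
    print(row["index"], row["name"], row["average"], row["rank"])
```

The point of the sketch is that row order and rank are independent: rows follow model size, while the rank follows average performance.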

📝 Notes

  • Within each leaderboard, models are ordered by size (smallest to largest)
  • Rankings (T column) are updated based on average performance
  • Invalid models are excluded before processing
  • All JSON files are formatted with 4-space indentation
  • The script uses UTF-8 encoding to support non-ASCII characters
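
The last two notes correspond to a write call along these lines. This is a sketch, not the script's actual code: save_json is a hypothetical helper, and passing ensure_ascii=False is an assumption about how non-ASCII characters are kept readable in the output.

```python
import json
from pathlib import Path


def save_json(data, output_path):
    """Write data as UTF-8 JSON with 4-space indentation."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        # Assumption: ensure_ascii=False keeps non-ASCII text as-is
        # instead of writing \uXXXX escapes.
        json.dump(data, handle, indent=4, ensure_ascii=False)
```

For example, save_json(rows, "leaderboards/CoT_leaderboard.json") would create the leaderboards/ directory if needed and write the file there.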