# 🚀 Leaderboard Generation Scripts

This directory contains scripts to automatically generate leaderboard JSON files from Excel data.

## ⚡ Quick Start

### 1. Install Dependencies

```bash
pip install -r scripts/requirements.txt
```

Required packages:
- `pandas>=2.0.0` - For reading Excel files and data manipulation
- `openpyxl>=3.1.0` - For reading .xlsx Excel files

### 2. Configure Excel Path

Open `scripts/config.py` and update the `EXCEL_PATH` variable:

```python
# Option A: place the file in the project root
EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"

# Option B: use an absolute path (set this instead of Option A, not in addition to it)
# EXCEL_PATH = Path("/Users/yourname/Desktop/benchmark.xlsx")
```

### 3. Run the Script

From the **project root** directory:

```bash
python scripts/main.py
```

Or with Python 3 explicitly:

```bash
python3 scripts/main.py
```

That's it! The script will automatically:
- ✅ Read the Excel file
- ✅ Generate all three leaderboards (Zero-Shot, Few-Shot, CoT)
- ✅ Update task information
- ✅ Calculate and update rankings
- ✅ Save everything to the `leaderboards/` directory

## ⚙️ Configuration

All settings are in `scripts/config.py`:

### Required Configuration

**`EXCEL_PATH`** - Path to your Excel file containing model and task data
```python
# Pick ONE of the following assignments.

# In the project root (recommended)
EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"

# Absolute path
# EXCEL_PATH = Path("/Users/yourname/Desktop/Clinical Benchmark and LLM.xlsx")

# In a subdirectory
# EXCEL_PATH = BASE_DIR / "data" / "benchmark.xlsx"
```

### Optional Configuration

**`INVALID_MODELS`** - Models to exclude from leaderboards
```python
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
    # Add model names that should not appear
]
```
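
As a minimal sketch of how such an exclusion list might be applied, assume each model row is a dictionary with a `Name` key (the column name listed under Excel File Requirements). `filter_models` is a hypothetical helper for illustration, not the function used in `excel_processor.py`.

```python
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
]


def filter_models(rows, invalid_models):
    """Return the rows whose "Name" is not in invalid_models (exact, case-sensitive match)."""
    excluded = set(invalid_models)
    return [row for row in rows if row["Name"] not in excluded]


# Hypothetical sample rows for illustration only.
rows = [
    {"Name": "gpt-4o"},
    {"Name": "Hulu-Med-7B"},
    {"Name": "hulu-med-7b"},  # different case, so it is NOT excluded
]
kept = filter_models(rows, INVALID_MODELS)
print([row["Name"] for row in kept])
```

Because the comparison is exact, a name that differs only in case or spacing stays in the leaderboard.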

**Output paths** (these usually do not need to change):
- `ZERO_SHOT_OUTPUT` - Zero-Shot leaderboard path
- `FEW_SHOT_OUTPUT` - Few-Shot leaderboard path  
- `COT_OUTPUT` - Chain-of-Thought leaderboard path

## 📋 Complete Update Workflow

1. **Get your Excel file**
   - Download/obtain the latest "Clinical Benchmark and LLM.xlsx"
   - Place it in the project root or note its location

2. **Update configuration**
   - Edit `scripts/config.py`
   - Set `EXCEL_PATH` to your file location
   - Add any models to `INVALID_MODELS` if needed

3. **Run the generation script**
   ```bash
   python scripts/main.py
   ```

4. **Verify the output**
   - Check `leaderboards/Zero-Shot_leaderboard.json`
   - Check `leaderboards/Few-Shot_leaderboard.json`
   - Check `leaderboards/CoT_leaderboard.json`
   - Check `task_information.json`

5. **Test locally**
   ```bash
   python app.py
   # Open browser to test the leaderboard interface
   ```

6. **Deploy**
   - Commit and push to GitHub
   - Deploy to Hugging Face Spaces

## 📁 Files Overview

- **`config.py`** - Central configuration file ⚠️ **EDIT THIS FILE**
- **`main.py`** - Main script that orchestrates leaderboard generation
- **`requirements.txt`** - Python dependencies for the scripts
- **`README.md`** - This file
- **`helpers/`** - Helper modules for processing Excel data
  - `excel_processor.py` - Processes Excel files and creates leaderboards
  - `reorganize_indices.py` - Reorganizes model indices by size
  - `CONSTANTS.py` - Constants for data mapping (task names, domain mappings, etc.)
  - `leaderboards.py` - Placeholder for future leaderboard operations
  - `__init__.py` - Makes helpers a Python package

## 🤝 Sharing This Code

When sharing this code with others:
1. They only need to update `scripts/config.py` with their Excel file path
2. All other files will automatically use the configured paths
3. No need to search through multiple files to update paths
4. The script validates the Excel file exists before running
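
The existence check in item 4 could look roughly like the sketch below. `validate_excel_path` is a hypothetical name and the error text is illustrative; the actual check in `main.py` may differ.

```python
import sys
from pathlib import Path


def validate_excel_path(excel_path):
    """Return True if excel_path is an existing file; otherwise report the problem and return False."""
    path = Path(excel_path)
    if not path.is_file():
        # Directories and missing paths are both rejected.
        print(f"ERROR: Excel file not found: {path}", file=sys.stderr)
        return False
    return True


# Hypothetical file name used only to demonstrate the failure case.
print(validate_excel_path("no_such_benchmark_file.xlsx"))
```

Checking the path up front gives a clear message before any sheet is read, rather than a traceback from deep inside the Excel loader.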

## 🐛 Troubleshooting

### "Excel file not found" error
```
❌ ERROR: Excel file not found!
```
**Solution**: 
- Check that `EXCEL_PATH` in `scripts/config.py` points to a valid file
- Verify the file exists at that location
- Use absolute paths if relative paths don't work

### "Missing models" or unexpected output
**Solution**:
- Verify that model names in `INVALID_MODELS` match exactly (case-sensitive)
- Check that the Excel file has the required sheets:
  - "Models (Simplified)" - contains model information
  - "B-CLF", "B-EXT", "B-GEN" - for Zero-Shot
  - "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot" - for Few-Shot
  - "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT" - for CoT
  - "Task-all" - for task information
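
To see quickly which sheets are missing, a small check such as the following sketch can help. The sheet names are the ones listed above; `find_missing_sheets` is a hypothetical helper, not part of the scripts.

```python
REQUIRED_SHEETS = [
    "Models (Simplified)",
    "B-CLF", "B-EXT", "B-GEN",
    "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot",
    "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT",
    "Task-all",
]


def find_missing_sheets(available_sheets):
    """Return the required sheet names absent from available_sheets, in the order listed above."""
    available = set(available_sheets)
    return [name for name in REQUIRED_SHEETS if name not in available]


# Example: a workbook that lacks the three CoT sheets.
example_sheets = [
    "Models (Simplified)",
    "B-CLF", "B-EXT", "B-GEN",
    "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot",
    "Task-all",
]
print(find_missing_sheets(example_sheets))
```

With pandas and openpyxl installed, the list of available sheets can be obtained from `pandas.ExcelFile(EXCEL_PATH).sheet_names` and passed to the helper.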

### Import errors
```
ModuleNotFoundError: No module named 'pandas'
```
**Solution**: 
- Install the required packages: `pip install -r scripts/requirements.txt`
- Make sure you're using the correct Python environment

### Running from wrong directory
```
ModuleNotFoundError: No module named 'helpers'
```
**Solution**: 
- Always run from the **project root**: `python scripts/main.py`
- Do not run it from inside the `scripts/` directory: ❌ `cd scripts && python main.py`

## 💡 Excel File Requirements

Your Excel file must contain:

### Required Sheets:
1. **Models (Simplified)** - Model metadata
   - Columns: Name, Domain, License, Size (B)
   
2. **Task Sheets** (for each leaderboard type):
   - Zero-Shot: B-CLF, B-EXT, B-GEN
   - Few-Shot: B-CLF-5shot, B-EXT-5shot, B-GEN-5shot
   - CoT: B-CLF-CoT, B-EXT-CoT, B-GEN-CoT
   
3. **Task-all** - Task metadata
   - Columns: Task name, Language, Task Type, Clinical context, Data Access, etc.

### Model Name Handling:
The script automatically handles some model name variations:
- `gpt-35-turbo-0125` → `gpt-35-turbo`
- `gpt-4o-0806` → `gpt-4o`
- `gemini-2.0-flash-001` → `gemini-2.0-flash`
- And more (see `helpers/excel_processor.py` for the full list)
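
One simple way to implement this kind of normalization is a lookup table, sketched below. It contains only the three mappings documented here; `MODEL_NAME_ALIASES` and `normalize_model_name` are hypothetical names, and the real logic in `excel_processor.py` may work differently.

```python
# Only the mappings documented in this README; the full list lives in excel_processor.py.
MODEL_NAME_ALIASES = {
    "gpt-35-turbo-0125": "gpt-35-turbo",
    "gpt-4o-0806": "gpt-4o",
    "gemini-2.0-flash-001": "gemini-2.0-flash",
}


def normalize_model_name(name):
    """Map a versioned model name to its canonical form; unknown names pass through unchanged."""
    return MODEL_NAME_ALIASES.get(name, name)


print(normalize_model_name("gpt-4o-0806"))
print(normalize_model_name("some-other-model"))  # hypothetical name with no alias
```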

## 🎯 What the Script Does

1. **Validates** the Excel file exists
2. **Loads** model information from the "Models (Simplified)" sheet
3. **Processes** each leaderboard type (Zero-Shot, Few-Shot, CoT):
   - Extracts performance data from task sheets
   - Calculates average performance
   - Generates JSON with model info and scores
4. **Reorganizes** model indices by size (smallest to largest)
5. **Updates** rankings based on average performance
6. **Creates** `task_information.json` with task metadata
7. **Saves** all output files to the `leaderboards/` directory
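
Steps 3 to 5 can be illustrated with a small, self-contained sketch. The row layout, the `Average` key, the `scores` list, and the tie-breaking rule are assumptions made for this example; only the `Name`, `Size (B)`, and `T` (rank) names come from this README.

```python
def build_leaderboard(models):
    """Average each model's task scores, order rows by size, and rank them by average."""
    rows = []
    for model in models:
        scores = model["scores"]
        average = sum(scores) / len(scores) if scores else 0.0
        rows.append({"Name": model["Name"], "Size (B)": model["Size (B)"], "Average": average})

    # Reorganize indices: smallest model first.
    rows.sort(key=lambda row: row["Size (B)"])

    # Rank 1 goes to the highest average; ties keep their size order (an assumption).
    by_average = sorted(rows, key=lambda row: row["Average"], reverse=True)
    for rank, row in enumerate(by_average, start=1):
        row["T"] = rank
    return rows


# Hypothetical models and scores, for illustration only.
models = [
    {"Name": "model-large", "Size (B)": 70, "scores": [80.0, 90.0]},
    {"Name": "model-small", "Size (B)": 7, "scores": [60.0, 70.0]},
    {"Name": "model-mid", "Size (B)": 27, "scores": [85.0, 95.0]},
]
leaderboard = build_leaderboard(models)
for row in leaderboard:
    print(row["Name"], row["Average"], row["T"])
```

The rows stay ordered by size while the `T` value carries the performance ranking, so the two orderings do not interfere with each other.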

## 📝 Notes

- The script preserves model order by size within each leaderboard
- Rankings (T column) are updated based on average performance
- Invalid models are excluded before processing
- All JSON files are formatted with 4-space indentation
- The script uses UTF-8 encoding to support non-ASCII characters
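
The last two notes correspond to a write step along the lines of this sketch. `save_leaderboard` is a hypothetical helper, and passing `ensure_ascii=False` is an assumption about how non-ASCII characters are kept readable in the output.

```python
import json
import tempfile
from pathlib import Path


def save_leaderboard(data, output_path):
    """Write data as UTF-8 JSON with 4-space indentation, creating parent folders if needed."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes.
        json.dump(data, handle, indent=4, ensure_ascii=False)


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "leaderboards" / "example.json"
    save_leaderboard({"Name": "modèle-test", "T": 1}, target)
    text = target.read_text(encoding="utf-8")
print(text)
```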