BRIDGE-OPEN-Leaderboard

BRIDGE-OPEN-Leaderboard / scripts /README.md

Kevin Xie

Upload main processing scripts for this repo

577fb61 about 2 months ago


	# 🚀 Leaderboard Generation Scripts

	This directory contains scripts to automatically generate leaderboard JSON files from Excel data.

	## ⚡ Quick Start

	### 1. Install Dependencies

	```bash
	pip install -r scripts/requirements.txt
	```

	Required packages:
	- `pandas>=2.0.0` - For reading Excel files and data manipulation
	- `openpyxl>=3.1.0` - For reading .xlsx Excel files

	### 2. Configure Excel Path

	Open `scripts/config.py` and update the `EXCEL_PATH` variable:

	```python
	EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx" # Place in project root
	# OR
	EXCEL_PATH = Path("/Users/yourname/Desktop/benchmark.xlsx") # Use absolute path
	```

	### 3. Run the Script

	From the project root directory:

	```bash
	python scripts/main.py
	```

	Or with Python 3 explicitly:

	```bash
	python3 scripts/main.py
	```

	That's it! The script will automatically:
	- ✅ Read the Excel file
	- ✅ Generate all three leaderboards (Zero-Shot, Few-Shot, CoT)
	- ✅ Update task information
	- ✅ Calculate and update rankings
	- ✅ Save everything to the `leaderboards/` directory

	## ⚙️ Configuration

	All settings are in `scripts/config.py`:

	### Required Configuration

	`EXCEL_PATH` - Path to your Excel file containing model and task data
	```python
	# In project root (recommended)
	EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"

	# Absolute path
	EXCEL_PATH = Path("/Users/yourname/Desktop/Clinical Benchmark and LLM.xlsx")

	# In a subdirectory
	EXCEL_PATH = BASE_DIR / "data" / "benchmark.xlsx"
	```

	### Optional Configuration

	`INVALID_MODELS` - Models to exclude from leaderboards
	```python
	INVALID_MODELS = [
	    "gemma-3-27b-pt",
	    "Hulu-Med-7B",
	    # Add model names that should not appear
	]
	```
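
As a rough illustration of how such an exclusion list might be applied, the sketch below filters model records by exact, case-sensitive name match. The record layout (a `name` key) and the helper `exclude_invalid` are assumptions made for this example, not the repository's actual code.

```python
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
]


def exclude_invalid(records, invalid=INVALID_MODELS):
    """Drop records whose name exactly matches an entry in `invalid`."""
    blocked = set(invalid)
    return [r for r in records if r["name"] not in blocked]


# Hypothetical records; only the "name" key matters for this sketch.
models = [
    {"name": "gemma-3-27b-pt"},
    {"name": "hulu-med-7b"},  # different case, so it is NOT excluded
    {"name": "gpt-4o"},
]
kept = exclude_invalid(models)
print([m["name"] for m in kept])
```

Because the comparison is exact, a name that differs only in letter case stays on the leaderboard, which is why the entries in `INVALID_MODELS` need to match the Excel names character for character.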

	Output paths (usually don't need to change):
	- `ZERO_SHOT_OUTPUT` - Zero-Shot leaderboard path
	- `FEW_SHOT_OUTPUT` - Few-Shot leaderboard path
	- `COT_OUTPUT` - Chain-of-Thought leaderboard path

	## 📋 Complete Update Workflow

	1. Get your Excel file
	   - Download/obtain the latest "Clinical Benchmark and LLM.xlsx"
	   - Place it in the project root or note its location

	2. Update configuration
	   ```bash
	   # Edit scripts/config.py
	   # Set EXCEL_PATH to your file location
	   # Add any models to INVALID_MODELS if needed
	   ```

	3. Run the generation script
	   ```bash
	   python scripts/main.py
	   ```

	4. Verify the output
	   - Check `leaderboards/Zero-Shot_leaderboard.json`
	   - Check `leaderboards/Few-Shot_leaderboard.json`
	   - Check `leaderboards/CoT_leaderboard.json`
	   - Check `task_information.json`

	5. Test locally
	   ```bash
	   python app.py
	   # Open browser to test the leaderboard interface
	   ```

	6. Deploy
	   - Commit and push to GitHub
	   - Deploy to Hugging Face Spaces
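
The verification in step 4 can also be scripted. The following is a minimal sketch, assuming the four files live at the paths listed in that step relative to the project root (the location of `task_information.json` in particular is an assumption). The helper `check_outputs` is hypothetical and only confirms that each file exists and parses as JSON; it does not validate the contents.

```python
import json
from pathlib import Path

# File names taken from the verification step; paths are assumed to be
# relative to the project root.
EXPECTED_OUTPUTS = [
    "leaderboards/Zero-Shot_leaderboard.json",
    "leaderboards/Few-Shot_leaderboard.json",
    "leaderboards/CoT_leaderboard.json",
    "task_information.json",
]


def check_outputs(root):
    """Return a list of problems; an empty list means every file parsed."""
    problems = []
    for rel in EXPECTED_OUTPUTS:
        path = Path(root) / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {rel}: {exc}")
    return problems


if __name__ == "__main__":
    for problem in check_outputs("."):
        print(problem)
```

Run from the project root, it prints one line per missing or unparsable file and nothing when all four are in place.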

	## 📁 Files Overview

	- `config.py` - Central configuration file ⚠️ EDIT THIS FILE
	- `main.py` - Main script that orchestrates leaderboard generation
	- `requirements.txt` - Python dependencies for the scripts
	- `README.md` - This file
	- `helpers/` - Helper modules for processing Excel data
	  - `excel_processor.py` - Processes Excel files and creates leaderboards
	  - `reorganize_indices.py` - Reorganizes model indices by size
	  - `CONSTANTS.py` - Constants for data mapping (task names, domain mappings, etc.)
	  - `leaderboards.py` - Placeholder for future leaderboard operations
	  - `__init__.py` - Makes helpers a Python package

	## 🤝 Sharing This Code

	When sharing this code with others:
	1. They only need to update `scripts/config.py` with their Excel file path
	2. All other files will automatically use the configured paths
	3. No need to search through multiple files to update paths
	4. The script validates the Excel file exists before running

	## 🐛 Troubleshooting

	### "Excel file not found" error
	```
	❌ ERROR: Excel file not found!
	```
	Solution:
	- Check that `EXCEL_PATH` in `scripts/config.py` points to a valid file
	- Verify the file exists at that location
	- Use absolute paths if relative paths don't work
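
For reference, an existence check of this kind could look roughly like the sketch below. The function name `require_excel_file` and the wording of the message are assumptions for illustration; the actual check in `main.py` may differ.

```python
from pathlib import Path


def require_excel_file(excel_path):
    """Return the resolved path, or raise FileNotFoundError with a hint."""
    path = Path(excel_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"Excel file not found: {path} "
            "(check EXCEL_PATH in scripts/config.py)"
        )
    return path.resolve()
```

Calling `require_excel_file(EXCEL_PATH)` before any processing starts makes a misconfigured path fail immediately, with the offending location in the message.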

	### "Missing models" or unexpected output
	Solution:
	- Verify that model names in `INVALID_MODELS` match exactly (case-sensitive)
	- Check that the Excel file has the required sheets:
	  - "Models (Simplified)" - contains model information
	  - "B-CLF", "B-EXT", "B-GEN" - for Zero-Shot
	  - "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot" - for Few-Shot
	  - "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT" - for CoT
	  - "Task-all" - for task information

	### Import errors
	```
	ModuleNotFoundError: No module named 'pandas'
	```
	Solution:
	- Install the required packages: `pip install -r scripts/requirements.txt`
	- Make sure you're using the correct Python environment

	### Running from wrong directory
	```
	ModuleNotFoundError: No module named 'helpers'
	```
	Solution:
	- Always run from the project root: `python scripts/main.py`
	- Not from inside the scripts directory: ❌ `cd scripts && python main.py`

	## 💡 Excel File Requirements

	Your Excel file must contain:

	### Required Sheets:
	1. Models (Simplified) - Model metadata
	   - Columns: Name, Domain, License, Size (B)

	2. Task Sheets (for each leaderboard type):
	   - Zero-Shot: B-CLF, B-EXT, B-GEN
	   - Few-Shot: B-CLF-5shot, B-EXT-5shot, B-GEN-5shot
	   - CoT: B-CLF-CoT, B-EXT-CoT, B-GEN-CoT

	3. Task-all - Task metadata
	   - Columns: Task name, Language, Task Type, Clinical context, Data Access, etc.
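
The sheet requirements above can be checked before running the full script. This is a small sketch, not part of the repository: `REQUIRED_SHEETS` and `missing_sheets` are hypothetical names, and the group labels are chosen only for readability. It takes a plain list of sheet names so it runs without an Excel file.

```python
# Sheet names copied from the requirements above; group labels are arbitrary.
REQUIRED_SHEETS = {
    "Models": ["Models (Simplified)"],
    "Zero-Shot": ["B-CLF", "B-EXT", "B-GEN"],
    "Few-Shot": ["B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot"],
    "CoT": ["B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT"],
    "Tasks": ["Task-all"],
}


def missing_sheets(sheet_names):
    """Map each group to the required sheets absent from `sheet_names`."""
    present = set(sheet_names)
    missing = {}
    for group, names in REQUIRED_SHEETS.items():
        absent = [n for n in names if n not in present]
        if absent:
            missing[group] = absent
    return missing


# With pandas installed, the real names could be read with
# pd.ExcelFile(EXCEL_PATH).sheet_names; a hand-written list is used here.
example = ["Models (Simplified)", "B-CLF", "B-EXT", "B-GEN", "Task-all"]
print(missing_sheets(example))
```

For the example list, the result reports the three Few-Shot and three CoT sheets as missing, which would explain empty Few-Shot and CoT leaderboards.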

	### Model Name Handling:
	The script automatically handles some model name variations:
	- `gpt-35-turbo-0125` → `gpt-35-turbo`
	- `gpt-4o-0806` → `gpt-4o`
	- `gemini-2.0-flash-001` → `gemini-2.0-flash`
	- And more (see `helpers/excel_processor.py` for the full list)
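
A lookup table is one simple way to implement this kind of name handling. The sketch below includes only the three mappings documented above; `MODEL_NAME_ALIASES` and `normalize_model_name` are hypothetical names, and the real processor may use a different mechanism.

```python
# Only the three documented mappings; the actual list is longer.
MODEL_NAME_ALIASES = {
    "gpt-35-turbo-0125": "gpt-35-turbo",
    "gpt-4o-0806": "gpt-4o",
    "gemini-2.0-flash-001": "gemini-2.0-flash",
}


def normalize_model_name(name):
    """Return the canonical name, leaving unknown names unchanged."""
    cleaned = name.strip()
    return MODEL_NAME_ALIASES.get(cleaned, cleaned)


print(normalize_model_name("gpt-4o-0806"))
print(normalize_model_name("some-other-model"))
```

Names without an entry pass through untouched, so a new model only needs a mapping when its sheet name differs from its name in "Models (Simplified)".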

	## 🎯 What the Script Does

	1. Validates the Excel file exists
	2. Loads model information from the "Models (Simplified)" sheet
	3. Processes each leaderboard type (Zero-Shot, Few-Shot, CoT):
	   - Extracts performance data from task sheets
	   - Calculates average performance
	   - Generates JSON with model info and scores
	4. Reorganizes model indices by size (smallest to largest)
	5. Updates rankings based on average performance
	6. Creates `task_information.json` with metadata
	7. Saves all output files to the `leaderboards/` directory
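
Steps 3 to 5 can be illustrated with a toy example. The sketch below is written under several assumptions: the field names (`size_b`, `scores`, `avg`, `rank`) are invented, higher averages are treated as better, and missing scores are skipped when averaging. None of this is confirmed by the repository's code.

```python
def average(scores):
    """Mean of the available scores; None entries are skipped."""
    values = [s for s in scores if s is not None]
    return sum(values) / len(values) if values else None


def build_leaderboard(models):
    """Order models by size and attach a rank based on average score."""
    rows = [
        {"name": m["name"], "size_b": m["size_b"], "avg": average(m["scores"])}
        for m in models
    ]
    # Rank 1 goes to the highest average; models without scores rank last.
    by_score = sorted(
        rows, key=lambda r: (r["avg"] is None, -(r["avg"] or 0.0))
    )
    for rank, row in enumerate(by_score, start=1):
        row["rank"] = rank
    # Final order is by model size, smallest first.
    return sorted(rows, key=lambda r: r["size_b"])


# Made-up models and scores, purely for demonstration.
models = [
    {"name": "model-large", "size_b": 70, "scores": [80.0, 90.0]},
    {"name": "model-small", "size_b": 7, "scores": [60.0, None, 70.0]},
    {"name": "model-mid", "size_b": 27, "scores": [88.0, 92.0]},
]
for row in build_leaderboard(models):
    print(row["name"], row["avg"], row["rank"])
```

The rows come out ordered by size (small, mid, large) while the ranks (3, 1, 2) follow the averages, which mirrors the separation between index order and ranking described in the steps above.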

	## 📝 Notes

	- The script preserves model order by size within each leaderboard
	- Rankings (T column) are updated based on average performance
	- Invalid models are excluded before processing
	- All JSON files are formatted with 4-space indentation
	- The script uses UTF-8 encoding to support non-ASCII characters
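
The last two notes correspond to standard options of Python's `json` module. A minimal sketch of such a writer follows; the helper name `write_json` is hypothetical, and the use of `ensure_ascii=False` is an assumption about how non-ASCII characters are kept readable.

```python
import json
from pathlib import Path


def write_json(data, path):
    """Write `data` as UTF-8 JSON with 4-space indentation."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(data, f, indent=4, ensure_ascii=False)
```

Without `ensure_ascii=False`, `json.dump` would still produce valid JSON but would escape non-ASCII characters as `\uXXXX` sequences.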