# Leaderboard Generation Scripts

This directory contains scripts to automatically generate leaderboard JSON files from Excel data.

## Quick Start

### 1. Install Dependencies

```bash
pip install -r scripts/requirements.txt
```

Required packages:

- `pandas>=2.0.0` - For reading Excel files and data manipulation
- `openpyxl>=3.1.0` - For reading .xlsx Excel files

### 2. Configure Excel Path

Open `scripts/config.py` and update the `EXCEL_PATH` variable:

```python
EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"  # Place in project root
# OR
EXCEL_PATH = Path("/Users/yourname/Desktop/benchmark.xlsx")  # Use absolute path
```

### 3. Run the Script

From the **project root** directory:

```bash
python scripts/main.py
```

Or with Python 3 explicitly:

```bash
python3 scripts/main.py
```

That's it! The script will automatically:

- Read the Excel file
- Generate all three leaderboards (Zero-Shot, Few-Shot, CoT)
- Update task information
- Calculate and update rankings
- Save everything to the `leaderboards/` directory

## Configuration

All settings are in `scripts/config.py`:

### Required Configuration

**`EXCEL_PATH`** - Path to your Excel file containing model and task data

```python
# In project root (recommended)
EXCEL_PATH = BASE_DIR / "Clinical Benchmark and LLM.xlsx"
# Absolute path
EXCEL_PATH = Path("/Users/yourname/Desktop/Clinical Benchmark and LLM.xlsx")
# In a subdirectory
EXCEL_PATH = BASE_DIR / "data" / "benchmark.xlsx"
```
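
The snippets above use a `BASE_DIR` constant without showing where it comes from. As a rough sketch only, and assuming `config.py` sits one directory below the project root, it could be derived with `pathlib` as below. The helpers `project_root` and `resolve_excel_path` are hypothetical names used for illustration; they are not confirmed to exist in the actual scripts.

```python
from pathlib import Path


def project_root(config_file):
    """Return the directory one level above the folder that holds config_file.

    Assumption: config_file is something like <root>/scripts/config.py.
    """
    return Path(config_file).resolve().parent.parent


def resolve_excel_path(setting, base_dir):
    """Turn a relative or absolute setting into a concrete path.

    Absolute paths are returned unchanged; relative ones are anchored at base_dir.
    """
    path = Path(setting).expanduser()
    return path if path.is_absolute() else Path(base_dir) / path


# Hypothetical usage inside scripts/config.py:
#   BASE_DIR = project_root(__file__)
#   EXCEL_PATH = resolve_excel_path("Clinical Benchmark and LLM.xlsx", BASE_DIR)
```

Anchoring relative paths at the project root rather than at the current working directory keeps the setting independent of where the command is launched from.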

### Optional Configuration

**`INVALID_MODELS`** - Models to exclude from leaderboards

```python
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
    # Add model names that should not appear
]
```
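
To illustrate how such an exclusion list might be applied, here is a minimal sketch that filters plain dictionaries. It assumes each row carries the model name under a `"Name"` key (the column listed for the "Models (Simplified)" sheet) and that matching is exact and case-sensitive. The function `drop_invalid` is a hypothetical name, and the real script may well filter a pandas DataFrame instead.

```python
INVALID_MODELS = [
    "gemma-3-27b-pt",
    "Hulu-Med-7B",
]


def drop_invalid(rows, invalid=INVALID_MODELS):
    """Keep only rows whose "Name" is not in the exclusion list.

    The comparison is exact, so "hulu-med-7b" would NOT be excluded
    by an entry spelled "Hulu-Med-7B".
    """
    blocked = set(invalid)
    return [row for row in rows if row["Name"] not in blocked]


if __name__ == "__main__":
    rows = [{"Name": "Hulu-Med-7B"}, {"Name": "gpt-4o"}]
    print(drop_invalid(rows))
```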

**Output paths** (these usually do not need to change):

- `ZERO_SHOT_OUTPUT` - Zero-Shot leaderboard path
- `FEW_SHOT_OUTPUT` - Few-Shot leaderboard path
- `COT_OUTPUT` - Chain-of-Thought leaderboard path

## Complete Update Workflow

1. **Get your Excel file**
   - Download or obtain the latest "Clinical Benchmark and LLM.xlsx"
   - Place it in the project root or note its location
2. **Update configuration**
   ```bash
   # Edit scripts/config.py
   # Set EXCEL_PATH to your file location
   # Add any models to INVALID_MODELS if needed
   ```
3. **Run the generation script**
   ```bash
   python scripts/main.py
   ```
4. **Verify the output**
   - Check `leaderboards/Zero-Shot_leaderboard.json`
   - Check `leaderboards/Few-Shot_leaderboard.json`
   - Check `leaderboards/CoT_leaderboard.json`
   - Check `task_information.json`
5. **Test locally**
   ```bash
   python app.py
   # Open browser to test the leaderboard interface
   ```
6. **Deploy**
   - Commit and push to GitHub
   - Deploy to Hugging Face Spaces
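
The "Verify the output" step can be partly automated. The sketch below only checks that each leaderboard file exists and parses as JSON; it does not validate the contents. The function name `check_outputs` is hypothetical, and the file names are taken from the list in step 4.

```python
import json
from pathlib import Path

EXPECTED_FILES = [
    "Zero-Shot_leaderboard.json",
    "Few-Shot_leaderboard.json",
    "CoT_leaderboard.json",
]


def check_outputs(leaderboard_dir, names=EXPECTED_FILES):
    """Return a dict mapping each problematic file name to a short reason.

    An empty dict means every expected file exists and is valid JSON.
    """
    problems = {}
    for name in names:
        path = Path(leaderboard_dir) / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except json.JSONDecodeError as exc:
            problems[name] = f"invalid JSON: {exc}"
    return problems


if __name__ == "__main__":
    print(check_outputs("leaderboards") or "All leaderboard files look fine.")
```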

## Files Overview

- **`config.py`** - Central configuration file ⚠️ **EDIT THIS FILE**
- **`main.py`** - Main script that orchestrates leaderboard generation
- **`requirements.txt`** - Python dependencies for the scripts
- **`README.md`** - This file
- **`helpers/`** - Helper modules for processing Excel data
  - `excel_processor.py` - Processes Excel files and creates leaderboards
  - `reorganize_indices.py` - Reorganizes model indices by size
  - `CONSTANTS.py` - Constants for data mapping (task names, domain mappings, etc.)
  - `leaderboards.py` - Placeholder for future leaderboard operations
  - `__init__.py` - Makes helpers a Python package

## Sharing This Code

When sharing this code with others:

1. They only need to update `scripts/config.py` with their Excel file path
2. All other files will automatically use the configured paths
3. No need to search through multiple files to update paths
4. The script validates the Excel file exists before running

## Troubleshooting

### "Excel file not found" error

```
ERROR: Excel file not found!
```

**Solution**:

- Check that `EXCEL_PATH` in `scripts/config.py` points to a valid file
- Verify the file exists at that location
- Use absolute paths if relative paths don't work
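
When debugging this error, it helps to see the fully resolved path next to the current working directory, since a relative path is interpreted against the latter. The following is an illustrative sketch, not the script's actual validation code; `require_excel` is a hypothetical helper.

```python
from pathlib import Path


def require_excel(path):
    """Return the resolved path, or raise FileNotFoundError with context."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Excel file not found: {resolved} "
            f"(current working directory: {Path.cwd()})"
        )
    return resolved


if __name__ == "__main__":
    try:
        print(require_excel("Clinical Benchmark and LLM.xlsx"))
    except FileNotFoundError as err:
        print(err)
```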

### "Missing models" or unexpected output

**Solution**:

- Verify that model names in `INVALID_MODELS` match exactly (case-sensitive)
- Check that the Excel file has the required sheets:
  - "Models (Simplified)" - contains model information
  - "B-CLF", "B-EXT", "B-GEN" - for Zero-Shot
  - "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot" - for Few-Shot
  - "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT" - for CoT
  - "Task-all" - for task information
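
A quick way to run the sheet check above is to compare the workbook's sheet names against the required list. The sketch below keeps the comparison itself free of dependencies; `missing_sheets` is a hypothetical helper, and the commented lines show how the available names could be obtained with pandas (`pd.ExcelFile(...).sheet_names`).

```python
REQUIRED_SHEETS = [
    "Models (Simplified)",
    "B-CLF", "B-EXT", "B-GEN",
    "B-CLF-5shot", "B-EXT-5shot", "B-GEN-5shot",
    "B-CLF-CoT", "B-EXT-CoT", "B-GEN-CoT",
    "Task-all",
]


def missing_sheets(available, required=REQUIRED_SHEETS):
    """Return the required sheet names that are absent, in the listed order.

    Names are compared exactly, so stray spaces or different casing count as missing.
    """
    present = set(available)
    return [name for name in required if name not in present]


# With pandas and openpyxl installed, the available names could come from:
#   import pandas as pd
#   available = pd.ExcelFile(EXCEL_PATH).sheet_names
#   print(missing_sheets(available))
```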

### Import errors

```
ModuleNotFoundError: No module named 'pandas'
```

**Solution**:

- Install the required packages: `pip install -r scripts/requirements.txt`
- Make sure you're using the correct Python environment

### Running from wrong directory

```
ModuleNotFoundError: No module named 'helpers'
```

**Solution**:

- Always run from the **project root**: `python scripts/main.py`
- Do not run from inside the scripts directory: `cd scripts && python main.py`

## Excel File Requirements

Your Excel file must contain:

### Required Sheets:

1. **Models (Simplified)** - Model metadata
   - Columns: Name, Domain, License, Size (B)
2. **Task Sheets** (for each leaderboard type):
   - Zero-Shot: B-CLF, B-EXT, B-GEN
   - Few-Shot: B-CLF-5shot, B-EXT-5shot, B-GEN-5shot
   - CoT: B-CLF-CoT, B-EXT-CoT, B-GEN-CoT
3. **Task-all** - Task metadata
   - Columns: Task name, Language, Task Type, Clinical context, Data Access, etc.

### Model Name Handling:

The script automatically handles some model name variations:

- `gpt-35-turbo-0125` → `gpt-35-turbo`
- `gpt-4o-0806` → `gpt-4o`
- `gemini-2.0-flash-001` → `gemini-2.0-flash`
- And more (see `excel_processor.py` for full list)
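
One plausible way to implement this kind of normalization is a lookup table, sketched below. It contains only the three mappings listed above; the full table lives in `excel_processor.py` and may be structured differently. The names `NAME_ALIASES` and `normalize_model_name`, and the whitespace trimming, are assumptions made for illustration.

```python
# Only the three variations documented above; the real list is longer.
NAME_ALIASES = {
    "gpt-35-turbo-0125": "gpt-35-turbo",
    "gpt-4o-0806": "gpt-4o",
    "gemini-2.0-flash-001": "gemini-2.0-flash",
}


def normalize_model_name(name):
    """Map a versioned model name to its canonical form.

    Names without a known alias are returned unchanged (after trimming
    surrounding whitespace, which is an assumption of this sketch).
    """
    cleaned = name.strip()
    return NAME_ALIASES.get(cleaned, cleaned)


if __name__ == "__main__":
    for raw in ["gpt-4o-0806", "gemini-2.0-flash-001", "some-other-model"]:
        print(raw, "->", normalize_model_name(raw))
```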

## What the Script Does

1. **Validates** the Excel file exists
2. **Loads** model information from "Models (Simplified)" sheet
3. **Processes** each leaderboard type (Zero-Shot, Few-Shot, CoT):
   - Extracts performance data from task sheets
   - Calculates average performance
   - Generates JSON with model info and scores
4. **Reorganizes** model indices by size (smallest to largest)
5. **Updates** rankings based on average performance
6. **Creates** task_information.json with metadata
7. **Saves** all output files to the `leaderboards/` directory
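
Steps 3 to 5 can be pictured with a small, dependency-free sketch: average each model's task scores, order the rows by size, then assign ranks by average. This is an illustration under stated assumptions, not the script's code. The input shape, the `"Average"` and `"index"` keys, the function name `build_leaderboard`, and the convention that rank 1 is the highest average are all assumptions; only the `T` ranking column and the size ordering come from this README.

```python
def build_leaderboard(models):
    """Build leaderboard rows from a list of model dicts.

    Each input dict is assumed to have "Name", "Size (B)", and "scores"
    (a mapping of task name to numeric score).
    """
    rows = []
    for model in models:
        scores = list(model["scores"].values())
        average = sum(scores) / len(scores) if scores else 0.0
        rows.append(
            {"Name": model["Name"], "Size (B)": model["Size (B)"], "Average": average}
        )

    # Reorganize indices by size, smallest to largest.
    rows.sort(key=lambda row: row["Size (B)"])
    for index, row in enumerate(rows):
        row["index"] = index

    # Rank by average performance; rank 1 is the best score (assumed convention).
    # sorted() is stable, so ties keep their size order.
    ranked = sorted(rows, key=lambda row: row["Average"], reverse=True)
    for rank, row in enumerate(ranked, start=1):
        row["T"] = rank

    return rows


if __name__ == "__main__":
    demo = [
        {"Name": "big", "Size (B)": 70, "scores": {"task-a": 80.0, "task-b": 80.0}},
        {"Name": "small", "Size (B)": 7, "scores": {"task-a": 50.0, "task-b": 70.0}},
    ]
    for row in build_leaderboard(demo):
        print(row)
```

Keeping the size ordering and the ranking as two separate passes means the display order and the `T` value can change independently.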

## Notes

- The script preserves model order by size within each leaderboard
- Rankings (T column) are updated based on average performance
- Invalid models are excluded before processing
- All JSON files are formatted with 4-space indentation
- The script uses UTF-8 encoding to support non-ASCII characters
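
The last two notes correspond to standard options of Python's `json` module. A minimal sketch of writing a file that way follows; `save_json` is a hypothetical helper, and passing `ensure_ascii=False` is an assumption about how non-ASCII characters are kept readable in the output.

```python
import json
from pathlib import Path


def save_json(data, path):
    """Write data as UTF-8 JSON with 4-space indentation."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create leaderboards/ if needed
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False writes characters such as "é" directly
        # instead of as \uXXXX escape sequences.
        json.dump(data, fh, indent=4, ensure_ascii=False)


if __name__ == "__main__":
    save_json({"Name": "example-model"}, "leaderboards/example.json")
```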