BRIDGE-OPEN Leaderboard
📢 Updates
- 🗓️ 2025/12/07: Updated leaderboard with 1 model (99 models in total)! View the full list of added models
- 🗓️ 2025/11/04: Updated leaderboard with 2 models (98 models in total)! View the full list of added models
- 🗓️ 2025/11/01: Updated leaderboard with 3 models (96 models in total)! View the full list of added models
- 🗓️ 2025/09/04: Updated leaderboard with 8 models (93 models in total)! View the full list of added models
- 🗓️ 2025/07/22: Updated leaderboard with 10 models (85 models in total)! View the full list of added models
- 🗓️ 2025/06/03: Updated leaderboard with 21 models (75 models in total)! View the full list of added models
- 🗓️ 2025/04/28: BRIDGE Leaderboard V1.0.0 is now live (54 models in total)! View the full list of models
- 🗓️ 2025/04/28: Our paper BRIDGE is now available on arXiv!
🎯 Purpose
BRIDGE-OPEN is a curated subset of the comprehensive BRIDGE Medical Leaderboard that focuses exclusively on open-source clinical datasets. The datasets are accessible via the BRIDGE-Open dataset on Hugging Face. While the full BRIDGE benchmark contains 87 clinical text tasks, BRIDGE-OPEN includes only those datasets that are freely accessible without restricted access requirements.
For more information about BRIDGE and the construction of this LLM benchmark, please visit the original BRIDGE Leaderboard Space.
This leaderboard enables researchers and practitioners to:
- Evaluate LLMs on clinical tasks using publicly available data
- Reproduce and verify results without data access barriers
- Benchmark models fairly on the same open clinical datasets
- Advance medical AI research through transparent evaluation
📊 What's Included
BRIDGE-OPEN contains 50+ open-access clinical datasets spanning:
- 9 Languages: English, Chinese, Spanish, Japanese, German, Russian, French, Norwegian, Portuguese
- 8 Task Types: Text classification, semantic similarity, normalization/coding, named entity recognition (NER), natural language inference (NLI), event extraction, question answering (QA), summarization
- 14 Clinical Specialties: General medicine, cardiology, oncology, pharmacology, radiology, and more
- 6 Clinical Stages: From triage to discharge and administration
🔄 Three Evaluation Modes
Each model is evaluated using three different inference strategies:
- Zero-Shot: Direct task completion without examples
- Chain-of-Thought (CoT): Step-by-step reasoning before final answer
- Few-Shot: 5 example demonstrations for in-context learning
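The three inference strategies above can be sketched as prompt templates. This is a minimal illustration only: the prompt wording, `build_prompt` helper, and field layout are assumptions for demonstration, not BRIDGE's official templates.

```python
def build_prompt(instruction, sample_input, mode, examples=None):
    """Assemble a prompt for one of the three evaluation modes.

    Illustrative sketch; the exact wording used by BRIDGE may differ.
    """
    if mode == "zero-shot":
        # Direct task completion: instruction + input, no examples.
        return f"{instruction}\n\nInput: {sample_input}\nAnswer:"
    if mode == "cot":
        # Ask the model to reason step by step before the final answer.
        return (f"{instruction}\n\nInput: {sample_input}\n"
                "Let's think step by step, then give the final answer.")
    if mode == "few-shot":
        # Prepend 5 worked demonstrations for in-context learning.
        demos = "\n\n".join(
            f"Input: {x}\nAnswer: {y}" for x, y in examples[:5]
        )
        return f"{instruction}\n\n{demos}\n\nInput: {sample_input}\nAnswer:"
    raise ValueError(f"unknown mode: {mode}")
```

The same sample can then be run under all three modes, and each mode's prediction scored separately, which is how the leaderboard reports them.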
🚀 How to Evaluate Your Model
Option 1: Run Inference Locally
- Download the BRIDGE-Open dataset
- Run inference on your model
- Save predictions in the "pred" field for each sample
- Submit results via Google Form
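The local-inference workflow above can be sketched as follows. The sample schema (apart from the "pred" field named in step 3) and the `my_model` function are illustrative assumptions; substitute your own model call and follow the dataset's actual field names.

```python
import json

def my_model(text):
    # Placeholder for your LLM call (local model or API client).
    return "example prediction for: " + text

def run_inference(samples, outfile):
    """Fill each sample's "pred" field and save the results to JSON."""
    for sample in samples:
        # "pred" is the field the submission form expects per sample.
        sample["pred"] = my_model(sample["input"])
    with open(outfile, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)
    return samples

# Hypothetical sample; real BRIDGE-Open samples carry task-specific fields.
samples = [{"id": 0, "input": "Chest pain on exertion."}]
run_inference(samples, "predictions.json")
```

The saved JSON, with one "pred" per sample, is what you then submit via the Google Form.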
Option 2: Request Evaluation
Submit your model details via Google Form and we'll evaluate it for you.
Note: Due to computational constraints, there may be delays in processing submissions.
🔍 Key Differences from Full BRIDGE
| Feature | BRIDGE (Full) | BRIDGE-OPEN |
|---|---|---|
| Datasets | 87 tasks | 50+ open-access tasks |
| Data Access | Mixed (open + regulated) | 100% open access |
| Reproducibility | Limited by data access | Fully reproducible |
| Use Case | Comprehensive evaluation | Open research & development |
🤝 Contributing
Have an open clinical dataset to add? Submit it through our Google Form!
📬 Contact
If you have any questions about BRIDGE or the leaderboard, feel free to contact us!
- Leaderboard Managers:
- Jiageng Wu ([email protected])
- Kevin Xie ([email protected])
- Bowen Gu ([email protected])
- Benchmark Managers: Jiageng Wu, Bowen Gu
- Project Lead: Prof. Jie Yang ([email protected])
📄 Citation
If you find this leaderboard useful for your research and applications, please cite the following papers:
@article{BRIDGE-benchmark,
  title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
  author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
  year={2025},
  journal={arXiv preprint arXiv:2504.19467},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.19467}
}

@article{clinical-text-review,
  title={Clinical text datasets for medical artificial intelligence and large language models---a systematic review},
  author={Wu, Jiageng and Liu, Xiaocong and Li, Minghui and Li, Wanxin and Su, Zichang and Lin, Shixu and Garay, Lucas and Zhang, Zhiyun and Zhang, Yujie and Zeng, Qingcheng and Shen, Jie and Yuan, Changzheng and Yang, Jie},
  journal={NEJM AI},
  volume={1},
  number={6},
  pages={AIra2400012},
  year={2024},
  publisher={Massachusetts Medical Society}
}
If you use the datasets in BRIDGE, please also cite the original papers for those datasets, which are listed in our BRIDGE paper.
BRIDGE-OPEN is maintained by the Y-Lab team at Harvard Medical School and Brigham and Women's Hospital.