Update docs.md
docs.md
CHANGED
|
@@ -1,41 +1,71 @@
|
|
| 1 |
-
<
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
</div>
|
| 24 |
|
|
|
|
| 25 |
<h2>Background</h2>
|
| 26 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, and building on our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. We also built this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
| 27 |
This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
|
| 28 |
|
| 29 |
-
|
| 30 |
-
<div
|
| 31 |
-
|
| 32 |
-
src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png"
|
| 33 |
-
alt="dataset"
|
| 34 |
-
style="max-width: 80%; max-height: 100%; object-fit: contain;"
|
| 35 |
-
/>
|
| 36 |
</div>
|
| 37 |
|
| 38 |
-
|
| 39 |
<h2>BRIDGE Leaderboard</h2>
|
| 40 |
<p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
|
| 41 |
<ul>
|
|
@@ -45,15 +75,12 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
|
|
| 45 |
</ul>
|
| 46 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
| 47 |
|
| 48 |
-
<
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
alt="HMS"
|
| 52 |
-
style="max-width: 90%; max-height: 100%; object-fit: contain;"
|
| 53 |
-
/>
|
| 54 |
</div>
|
| 55 |
|
| 56 |
-
|
| 57 |
<h2>Key Features</h2>
|
| 58 |
<ul>
|
| 59 |
<li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
|
|
@@ -71,6 +98,7 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
|
|
| 71 |
</ul>
|
| 72 |
More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
|
| 73 |
|
|
|
|
| 74 |
<h2>🛠️ How to Evaluate Your Model on BRIDGE?</h2>
|
| 75 |
<h4>Dataset Access</h4>
|
| 76 |
<p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
|
|
@@ -88,33 +116,36 @@ Importantly, all 87 datasets have been verified to be either fully open-access o
|
|
| 88 |
</ul>
|
| 89 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
| 90 |
|
|
|
|
| 91 |
<h2>📢 Updates</h2>
|
| 92 |
<ul>
|
| 93 |
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
| 94 |
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
| 95 |
</ul>
|
| 96 |
|
|
|
|
| 97 |
<h2>🤝 Contributing</h2>
|
| 98 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
| 99 |
If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
|
| 100 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
| 101 |
|
|
|
|
| 102 |
<h2>Donation</h2>
|
| 103 |
<p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:jyang66@bwh.harvard.edu">jyang66@bwh.harvard.edu</a> to discuss donation opportunities.</p>
|
| 104 |
|
|
|
|
| 105 |
<h2>📬 Contact Information</h2>
|
| 106 |
-
<p>If you have any questions about BRIDGE or the leaderboard, feel free to
|
| 107 |
<ul>
|
| 108 |
<li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:jiwu7@bwh.harvard.edu">jiwu7@bwh.harvard.edu</a>), Kevin Xie (<a href="mailto:kevinxie@mit.edu">kevinxie@mit.edu</a>), Bowen Gu (<a href="mailto:bogu@bwh.harvard.edu">bogu@bwh.harvard.edu</a>)</li>
|
| 109 |
<li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
|
| 110 |
<li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:jyang66@bwh.harvard.edu">jyang66@bwh.harvard.edu</a>)</li>
|
| 111 |
</ul>
|
| 112 |
-
</div>
|
| 113 |
|
|
|
|
| 114 |
<h2>Citation</h2>
|
| 115 |
<p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
|
| 116 |
-
<pre style="white-space: pre-wrap; overflow-wrap: anywhere;">
|
| 117 |
-
<code>@article{BRIDGE-benchmark,
|
| 118 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
| 119 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
| 120 |
year={2025},
|
|
@@ -132,6 +163,8 @@ If you have clinical text datasets that you would like to share for broader expl
|
|
| 132 |
pages={AIra2400012},
|
| 133 |
year={2024},
|
| 134 |
publisher={Massachusetts Medical Society}
|
| 135 |
-
}
|
| 136 |
-
<
|
| 137 |
-
|
| 1 |
+
<!-- ---------- Global Styles ---------- -->
|
| 2 |
+
<style>
|
| 3 |
+
/* 1. Center content and limit max width for readability */
|
| 4 |
+
.wrapper{
|
| 5 |
+
max-width:880px; /* change here if you prefer wider/narrower */
|
| 6 |
+
margin:0 auto;
|
| 7 |
+
padding:0 1rem;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
/* 2. Logo bar (top row) */
|
| 11 |
+
.logo-bar{
|
| 12 |
+
display:flex;
|
| 13 |
+
align-items:center;
|
| 14 |
+
justify-content:space-between;
|
| 15 |
+
height:50px;
|
| 16 |
+
margin-bottom:25px;
|
| 17 |
+
}
|
| 18 |
+
.logo-bar img{
|
| 19 |
+
height:100%;
|
| 20 |
+
max-width:100%;
|
| 21 |
+
object-fit:contain;
|
| 22 |
+
}
|
| 23 |
+
|
| 24 |
+
/* 3. Generic paragraph spacing */
|
| 25 |
+
p{line-height:1.6;}
|
| 26 |
+
|
| 27 |
+
/* 4. Re-usable image section */
|
| 28 |
+
.section-img{
|
| 29 |
+
display:flex;
|
| 30 |
+
justify-content:center;
|
| 31 |
+
align-items:center;
|
| 32 |
+
margin:25px 0; /* vertical breathing room */
|
| 33 |
+
}
|
| 34 |
+
.section-img img{
|
| 35 |
+
max-width:90%;
|
| 36 |
+
height:auto;
|
| 37 |
+
object-fit:contain; /* avoid distortion */
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
/* 5. Make long BibTeX lines wrap instead of widening page */
|
| 41 |
+
pre code{
|
| 42 |
+
white-space:pre-wrap;
|
| 43 |
+
overflow-wrap:anywhere;
|
| 44 |
+
}
|
| 45 |
+
</style>
|
| 46 |
+
|
| 47 |
+
<!-- ---------- Page Content ---------- -->
|
| 48 |
+
<div class="wrapper">
|
| 49 |
+
|
| 50 |
+
<!-- Top logos ------------------------------------------------------------>
|
| 51 |
+
<div class="logo-bar">
|
| 52 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/1bNk6xHD90mlVaUOJ3kT6.png" alt="HMS" />
|
| 53 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/ZVx7ahuV1mVuIeygYwirc.png" alt="MGB" />
|
| 54 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/TkKKjmq98Wv_p5shxJTMY.png" alt="Broad" />
|
| 55 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/UcM8kmTaVkAM1qf3v09K8.png" alt="YLab" />
|
| 56 |
</div>
|
| 57 |
|
| 58 |
+
<!-- Background ----------------------------------------------------------->
|
| 59 |
<h2>Background</h2>
|
| 60 |
<p>Recent advances in <strong>Large Language Models (LLMs)</strong> have demonstrated transformative potential in <strong>healthcare</strong>, yet concerns remain around their reliability and clinical validity across diverse clinical tasks, specialties, and languages. To support timely and trustworthy evaluation, and building on our <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a> of global clinical text resources, we introduce <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a>, <strong>a multilingual benchmark that comprises 87 real-world clinical text tasks spanning nine languages and more than one million samples</strong>. We also built this leaderboard for clinical text understanding by systematically evaluating <strong>52 state-of-the-art LLMs</strong> (as of 2025/04/28).</p>
|
| 61 |
This project is led and maintained by the team of <a href="https://ylab.top/">Prof. Jie Yang</a> and <a href="https://www.drugepi.org/team/joshua-kueiyu-lin">Prof. Kueiyu Joshua Lin</a> at Harvard Medical School and Brigham and Women's Hospital.
|
| 62 |
|
| 63 |
+
<!-- Dataset illustration ------------------------------------------------->
|
| 64 |
+
<div class="section-img">
|
| 65 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/633c70c4ccce04161f841c30/OLN3J8_Yq8dx_LrgjYSsC.png" alt="dataset" />
|
| 66 |
</div>
|
| 67 |
|
| 68 |
+
<!-- Leaderboard description --------------------------------------------->
|
| 69 |
<h2>BRIDGE Leaderboard</h2>
|
| 70 |
<p>BRIDGE features three leaderboards, each evaluating LLM performance in clinical text tasks under a distinct inference strategy:</p>
|
| 71 |
<ul>
|
|
|
|
| 75 |
</ul>
|
| 76 |
<p>In addition, BRIDGE offers multiple <strong>model filters</strong> and <strong>task filters</strong> to enable users to explore LLM performance across <strong>different clinical contexts</strong>, empowering researchers and clinicians to make informed decisions and track model advancements over time.</p>
|
| 77 |
|
| 78 |
+
<!-- Leaderboard illustration -------------------------------------------->
|
| 79 |
+
<div class="section-img">
|
| 80 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/67a040fb6934f9aa1c866f99/xpyabfXWqacZD-ThQ5guU.jpeg" alt="model" />
|
| 81 |
</div>
|
| 82 |
|
| 83 |
+
<!-- Key Features --------------------------------------------------------->
|
| 84 |
<h2>Key Features</h2>
|
| 85 |
<ul>
|
| 86 |
<li><strong>Real-world Clinical Text</strong>: All tasks are sourced from real-world medical settings, such as electronic health records (EHRs), clinical case reports, or healthcare consultations</li>
|
|
|
|
| 98 |
</ul>
|
| 99 |
More details can be found in our <a href="https://arxiv.org/abs/2504.19467">BRIDGE paper</a> and <a href="https://ai.nejm.org/doi/full/10.1056/AIra2400012">systematic review</a>.
|
| 100 |
|
| 101 |
+
<!-- Dataset access / submission ----------------------------------------->
|
| 102 |
<h2>🛠️ How to Evaluate Your Model on BRIDGE?</h2>
|
| 103 |
<h4>Dataset Access</h4>
|
| 104 |
<p>All fully open-access datasets in BRIDGE are available in <a href="https://huggingface.co/datasets/YLab-Open/BRIDGE-Open">BRIDGE-Open</a>. To ensure the fairness of this leaderboard, we publicly release the following data for each task:
|
|
|
|
| 116 |
</ul>
|
| 117 |
We will review and evaluate your submission and update the leaderboard accordingly.
|
| 118 |
|
| 119 |
+
<!-- Updates -------------------------------------------------------------->
|
| 120 |
<h2>📢 Updates</h2>
|
| 121 |
<ul>
|
| 122 |
<li>2025/04/28: BRIDGE Leaderboard V1.0.0 is now live!</li>
|
| 123 |
<li>2025/04/28: Our paper <a href="https://arxiv.org/abs/2504.19467">BRIDGE</a> is now available on arXiv!</li>
|
| 124 |
</ul>
|
| 125 |
|
| 126 |
+
<!-- Contributing --------------------------------------------------------->
|
| 127 |
<h2>🤝 Contributing</h2>
|
| 128 |
<p>We welcome and greatly value contributions and collaborations from the community!
|
| 129 |
If you have clinical text datasets that you would like to share for broader exploration, please contact us!</p>
|
| 130 |
<p>We are committed to expanding BRIDGE while strictly adhering to appropriate data use agreements and ethical guidelines. Let's work together to advance the responsible application of LLMs in medicine!</p>
|
| 131 |
|
| 132 |
+
<!-- Donation ------------------------------------------------------------->
|
| 133 |
<h2>Donation</h2>
|
| 134 |
<p>BRIDGE is a non-profit, researcher-led benchmark that requires substantial resources (e.g., high-performance GPUs, a dedicated team) to sustain. To support open and impactful academic research that advances clinical care, we welcome your contributions. Please contact Prof. Jie Yang at <a href="mailto:jyang66@bwh.harvard.edu">jyang66@bwh.harvard.edu</a> to discuss donation opportunities.</p>
|
| 135 |
|
| 136 |
+
<!-- Contact -------------------------------------------------------------->
|
| 137 |
<h2>📬 Contact Information</h2>
|
| 138 |
+
<p>If you have any questions about BRIDGE or the leaderboard, feel free to contact us!</p>
|
| 139 |
<ul>
|
| 140 |
<li><strong>Leaderboard Managers</strong>: Jiageng Wu (<a href="mailto:jiwu7@bwh.harvard.edu">jiwu7@bwh.harvard.edu</a>), Kevin Xie (<a href="mailto:kevinxie@mit.edu">kevinxie@mit.edu</a>), Bowen Gu (<a href="mailto:bogu@bwh.harvard.edu">bogu@bwh.harvard.edu</a>)</li>
|
| 141 |
<li><strong>Benchmark Managers</strong>: Jiageng Wu, Bowen Gu</li>
|
| 142 |
<li><strong>Project Lead</strong>: Jie Yang (<a href="mailto:jyang66@bwh.harvard.edu">jyang66@bwh.harvard.edu</a>)</li>
|
| 143 |
</ul>
|
|
|
|
| 144 |
|
| 145 |
+
<!-- Citation ------------------------------------------------------------->
|
| 146 |
<h2>Citation</h2>
|
| 147 |
<p>If you find this leaderboard useful for your research and applications, please cite the following papers:</p>
|
| 148 |
+
<pre style="white-space: pre-wrap; overflow-wrap: anywhere;"><code>@article{BRIDGE-benchmark,
|
|
|
|
| 149 |
title={BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text},
|
| 150 |
author={Wu, Jiageng and Gu, Bowen and Zhou, Ren and Xie, Kevin and Snyder, Doug and Jiang, Yixing and Carducci, Valentina and Wyss, Richard and Desai, Rishi J and Alsentzer, Emily and Celi, Leo Anthony and Rodman, Adam and Schneeweiss, Sebastian and Chen, Jonathan H. and Romero-Brufau, Santiago and Lin, Kueiyu Joshua and Yang, Jie},
|
| 151 |
year={2025},
|
|
|
|
| 163 |
pages={AIra2400012},
|
| 164 |
year={2024},
|
| 165 |
publisher={Massachusetts Medical Society}
|
| 166 |
+
}</code></pre>
|
| 167 |
+
<p>If you use the datasets in BRIDGE, please also cite the original dataset papers; the references can be found in our BRIDGE paper.</p>
|
| 168 |
+
|
| 169 |
+
</div>
|
| 170 |
+
<!-- ---------- End of Page Content ---------- -->
|