henrygu123 committed on
Commit
1b46d2b
·
verified ·
1 Parent(s): 9e3c39d

Update docs.md

  1. docs.md +3 -3
docs.md CHANGED
@@ -92,8 +92,8 @@ This project is led and maintained by the team of <a href="https://ylab.top/">Pr
92
  <li><strong>Advanced LLMs (73 models)</strong>:
93
  <ul>
94
  <li><strong>Proprietary models</strong>: GPT-4o, GPT-3.5, Gemini-2.0-Flash, Gemini-1.5-Pro ...</li>
95
- <li><strong>Open-source models</strong>: Llama 3/4, QWEN2.5, Mistral, Gemma ...</li>
96
- <li><strong>Medical models</strong>: Baichuan-M1-14B, meditron, MeLLaMA... </li>
97
  <li><strong>Reasoning models</strong>: Deepseek-R1(671B), QWQ-32B, Deepseek-R1-Distll-Qwen/Llama ...</li>
98
  </ul>
99
  </li>
@@ -132,7 +132,7 @@ Importantly, all 87 datasets have been verified to be either fully open-access o
132
  <p>This section provides important notes and clarifications related to specific models, evaluation configurations, and metadata on the leaderboard.</p>
133
 
134
  <h4>🧠 Qwen3 Thinking Mode</h4>
135
- <p>Some of the newly added Qwen3 models contain the suffixes <code>-Thinking</code> and <code>-Non-Thinking</code>, which refer to their internal configuration for reasoning behavior:</p>
136
  <ul>
137
  <li><strong><code>-Thinking</code></strong>: Model was evaluated with <code>enable_thinking = True</code></li>
138
  <li><strong><code>-Non-Thinking</code></strong>: Model was evaluated with <code>enable_thinking = False</code></li>
 
92
  <li><strong>Advanced LLMs (73 models)</strong>:
93
  <ul>
94
  <li><strong>Proprietary models</strong>: GPT-4o, GPT-3.5, Gemini-2.0-Flash, Gemini-1.5-Pro ...</li>
95
+ <li><strong>Open-source models</strong>: Qwen 3/2.5, Llama 3/4, Mistral, Gemma ...</li>
96
+ <li><strong>Medical models</strong>: medgemma, Baichuan-M1-14B, meditron, MeLLaMA... </li>
97
  <li><strong>Reasoning models</strong>: Deepseek-R1(671B), QWQ-32B, Deepseek-R1-Distll-Qwen/Llama ...</li>
98
  </ul>
99
  </li>
 
132
  <p>This section provides important notes and clarifications related to specific models, evaluation configurations, and metadata on the leaderboard.</p>
133
 
134
  <h4>🧠 Qwen3 Thinking Mode</h4>
135
+ <p>Each evaluated Qwen3 model carries one of the suffixes <code>-Thinking</code> or <code>-Non-Thinking</code>, which refers to its internal configuration for reasoning behavior:</p>
136
  <ul>
137
  <li><strong><code>-Thinking</code></strong>: Model was evaluated with <code>enable_thinking = True</code></li>
138
  <li><strong><code>-Non-Thinking</code></strong>: Model was evaluated with <code>enable_thinking = False</code></li>
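
For readers who consume the leaderboard programmatically, the suffix convention described in this hunk can be mapped back to the `enable_thinking` setting. The following is only a minimal sketch: the helper name `split_thinking_suffix` and the example model names are hypothetical and are not part of the leaderboard code. It assumes the suffix is always the final part of the displayed model name.

```python
from typing import Optional, Tuple

# Hypothetical helper: not part of the leaderboard codebase.
# Order matters: "-Non-Thinking" also ends with "-Thinking",
# so the longer suffix has to be checked first.
_SUFFIXES = (("-Non-Thinking", False), ("-Thinking", True))


def split_thinking_suffix(name: str) -> Tuple[str, Optional[bool]]:
    """Return (base model name, enable_thinking flag).

    The flag is None when the name carries neither suffix.
    """
    for suffix, enable_thinking in _SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)], enable_thinking
    return name, None


if __name__ == "__main__":
    # Example names are illustrative assumptions only.
    for model in ("Qwen3-8B-Thinking", "Qwen3-8B-Non-Thinking", "GPT-4o"):
        print(model, "->", split_thinking_suffix(model))
```

Checking `-Non-Thinking` before `-Thinking` avoids misreading a non-thinking entry as a thinking one, since the shorter suffix is contained in the longer one.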