Update README.md
README.md CHANGED
@@ -30,7 +30,7 @@ tags:
 
 <img src="fig/main_fig.png" alt="main_fig" style="width: 1000px; max-width: 100%;" />
 
-We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B) (Release Date: June 16, 2025), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B). We initiate RL training from various SFT models and find that stronger SFT models continue to produce consistently better results after large-scale RL, although the performance gap narrows during RL training. Thanks to its stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models on challenging math and code reasoning benchmarks. For more details, check our [technical report](https://arxiv.org/abs/2506.13284).
+We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B) (**Release Date: June 16, 2025**), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B). We initiate RL training from various SFT models and find that stronger SFT models continue to produce consistently better results after large-scale RL, although the performance gap narrows during RL training. Thanks to its stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models on challenging math and code reasoning benchmarks. For more details, check our [technical report](https://arxiv.org/abs/2506.13284).
 
 ## Results
 
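The changed paragraph points to the [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B) checkpoint on the Hub; below is a minimal sketch of loading it with the `transformers` library. The prompt, dtype/device settings, and generation parameters are illustrative assumptions, not values taken from the model card.

```python
# Minimal sketch: load AceReason-Nemotron-1.1-7B and run one math prompt.
# Assumes a recent `transformers` (plus `accelerate` for device_map="auto");
# prompt and generation settings are illustrative, not official defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/AceReason-Nemotron-1.1-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Find all real x with x^2 - 5x + 6 = 0."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```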