fix broken link
README.md
CHANGED
@@ -24,7 +24,7 @@ Since our initial [Shisa 7B](https://huggingface.co/augmxnt/shisa-7b-v1) release
 
 
 ## Shisa V2 405B
-**Llama 3.1 Shisa V2 405B**<sup>1</sup> is a slightly special version of Shisa V2. Obviously, it is the largest, using [Llama 3.1 405B Instruct](meta-llama/Llama-3.1-405B-Instruct) as the base model and required >50x the compute for SFT+DPO compared to the 70B version. While it uses the same Japanese data mix as the other Shisa V2 models, it also has some contributed KO and ZH-TW language data mixed in as well.
+**Llama 3.1 Shisa V2 405B**<sup>1</sup> is a slightly special version of Shisa V2. Obviously, it is the largest, using [Llama 3.1 405B Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) as the base model and required >50x the compute for SFT+DPO compared to the 70B version. While it uses the same Japanese data mix as the other Shisa V2 models, it also has some contributed KO and ZH-TW language data mixed in as well.
 
 Most notably, Shisa V2 405B not only outperforms Shisa V2 70B on our battery of evals, but also GPT-4 (0603) and GPT-4 Turbo (2024-04-09). Shisa V2 405B also goes toe-to-toe with GPT-4o (2024-11-20) and DeepSeek-V3 (0324) on Japanese MT-Bench. Based on the evaluation results, we believe that Shisa V2 405B is the highest performing LLM ever trained in Japan.
 