Update README.md

README.md
@@ -473,7 +473,7 @@ Try it out by running the following snippet.
 a FP8 triton kernel for fast accelerated matmuls
 (`w8a8_block_fp8_matmul_triton`) will be used
 without any degradation in accuracy. However, if you want to
-run your model in BF16 see (#transformers-bf16)
+run your model in BF16 see ([here](#transformers-bf16))
 
 <details>
 <summary>Python snippet</summary>

@@ -518,6 +518,8 @@ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
 print(decoded_output)
 ```
 
+</details>
+
 #### Transformers BF16
 
 Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:

@@ -533,8 +535,6 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
 )
 ```
 
-</details>
-
 ## License
 
 This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
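For reference, the BF16 loading snippet that the last two hunks bracket (its body is elided from the diff context) typically looks like the sketch below. This is a minimal, assumed reconstruction built on the standard `from_pretrained` API; the model id is a hypothetical placeholder, not taken from the README, and the actual snippet may pass additional arguments.

```python
# Hedged sketch of the BF16 loading path described in "#### Transformers BF16".
# Assumption: the model id below is a placeholder, not the README's actual id.
import torch
from transformers import Mistral3ForConditionalGeneration

model_id = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"  # placeholder id

# torch_dtype=torch.bfloat16 converts the checkpoint weights to BF16 at load
# time, which is the automatic conversion the README section refers to.
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
```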