patrickvonplaten committed on
Commit 1bca1cb · verified · 1 Parent(s): 7e1b52c

Update README.md

Files changed (1): README.md +7 -6
README.md CHANGED

@@ -454,7 +454,7 @@ print(assistant_message)
 
 You can also use Ministral 3 3B Instruct 2512 with `Transformers` !
 
-Transformers very recently added preliminary support for FP8, so please make sure to install from main:
+Transformers recently added support for FP8, so make sure to install from main:
 
 ```sh
 uv pip install git+https://github.com/huggingface/transformers
@@ -469,10 +469,11 @@ pip install mistral-common --upgrade
 Try it out by running the following snippet.
 
 > [!Tip]
-> By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly,
-> which means the model currently does not make use of accelerated FP8-kernels.
-> Compatibility with accelerated FP8-kernels is currently worked on and will be available in a couple of weeks.
-> Stay tuned!
+> On latest main as of 05/12/2025, by default
+> a FP8 triton kernel for fast accelerated matmuls
+> (`w8a8_block_fp8_matmul_triton`) will be used
+> without any degradation in accuracy. However, if you want to
+> run your model in BF16 see (#transformers-bf16)
 
 <details>
 <summary>Python snippet</summary>
@@ -517,7 +518,7 @@ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
 print(decoded_output)
 ```
 
-**Note:**
+#### Transformers BF16
 
 Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:
 
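
To make the FP8-vs-BF16 distinction from the updated tip concrete, below is a minimal sketch of the two load paths with `transformers`. This is not the README's own snippet (which lies outside this diff): the repo id `mistralai/Ministral-3-3B-Instruct-2512` is an assumption based on the model name mentioned above, and the prompt is illustrative only.

```python
# Minimal sketch, assuming transformers installed from main as described in the diff.
# The repo id below is assumed from the model name in the README, not stated in this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Ministral-3-3B-Instruct-2512"  # assumed repo id

# Default load: the FP8 checkpoint is used as-is; per the tip, a recent
# transformers main dispatches the matmuls to an FP8 Triton kernel.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Alternative: convert the checkpoint to BF16 at load time, as described in
# the "Transformers BF16" section the README now links to.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Quick smoke test with the BF16 model.
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Give me a one-line fun fact.", return_tensors="pt").to(model_bf16.device)
output = model_bf16.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```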