Updated `hidden_act` to `silu`
Hi!
Sergio from the HF team here.
We've detected that when serving the model via vLLM using `--config_format hf`, there are some discrepancies in the outputs compared to serving the model using `--config_format mistral`.
Investigating, we found that when launching with `--config_format hf` the model is a [`Mistral3ForConditionalGeneration`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/mistral3.py#L383C7-L383C39) instance, while with `--config_format mistral` it's a [`PixtralForConditionalGeneration`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L293C7-L293C38).
In the second case (the original), the activation is a hard-coded [`silu`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L595) call, while for the `hf` version it is resolved [here](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L977) from the `hidden_act` value, i.e. taken from this repository's `config.json`.
The `gelu` must therefore be changed to `silu`.
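To illustrate why this single string matters, here is a minimal sketch (plain `transformers`/PyTorch, not the vLLM code path itself): HF-style implementations typically map the `hidden_act` string to a function via the `ACT2FN` registry, and `gelu` and `silu` give genuinely different outputs for the same input.

```python
# Minimal sketch, assuming the usual ACT2FN lookup used by HF-style models;
# the exact resolution inside vLLM may differ.
import torch
from transformers.activations import ACT2FN

x = torch.randn(3, 5)
gelu = ACT2FN["gelu"]   # what the previous config value requested
silu = ACT2FN["silu"]   # what the original Pixtral implementation hard-codes
print((gelu(x) - silu(x)).abs().max())  # non-zero: the activations diverge
```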
The issue can be seen below. This is a debug view of instantiating the model with `--config_format hf` (`gelu` on the right-hand side of the image):

Modifying this file, we get the following instead (`silu` on the right-hand side of the image):

This is currently causing problems when evaluating the model. For instance, using [`mistral-evals`](https://github.com/mistralai/mistral-evals) and ChartQA, we obtain `0.8612` with `--config_format mistral` vs. `0.818` with `--config_format hf`.
After changing this config file, we obtain `0.86`.
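For completeness, a hedged sketch of how the two setups can be instantiated with vLLM's offline Python API (the repo id is a placeholder, and the keyword arguments are assumed to be forwarded to `EngineArgs`; the numbers above were obtained by pointing `mistral-evals` at a served endpoint):

```python
# Hedged sketch, not the exact eval setup: "<model-repo>" is a placeholder,
# and config_format / tokenizer_mode / load_format are assumed to be accepted
# as EngineArgs keyword arguments. Run one instance at a time to avoid
# allocating the GPU twice.
from vllm import LLM

# Reference path: consolidated Mistral-format weights and params.
llm_mistral = LLM(
    model="<model-repo>",
    config_format="mistral",
    tokenizer_mode="mistral",
    load_format="mistral",
)

# HF path: the vision activation is read from this repo's config.json,
# which is what this PR fixes.
llm_hf = LLM(
    model="<model-repo>",
    config_format="hf",
)
```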
config.json +1 -1

```diff
@@ -30,7 +30,7 @@
   "vision_config": {
     "attention_dropout": 0.0,
     "head_dim": 64,
-    "hidden_act": "gelu",
+    "hidden_act": "silu",
     "hidden_size": 1024,
     "image_size": 1540,
     "initializer_range": 0.02,
```
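Once the change is in, it can be sanity-checked by loading the config with `transformers` (the repo id below is a placeholder for this repository):

```python
# Quick verification that the vision tower activation now resolves to "silu".
from transformers import AutoConfig

config = AutoConfig.from_pretrained("<model-repo>")  # placeholder repo id
print(config.vision_config.hidden_act)               # expected: "silu"
```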