Commit 2bb8ca3 · verified · by sergiopaniego (HF Staff) · 1 Parent(s): 4b8dd8a

Updated `hidden_act` to `silu`

Hi!

Sergio from the HF team here.
We've detected that when serving the model via vLLM using `--config_format hf`, the outputs differ from those obtained when serving it using `--config_format mistral`.
Investigating, we've found that when launching using `--config_format hf`, the model is a [`Mistral3ForConditionalGeneration`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/mistral3.py#L383C7-L383C39) instance, while when using `--config_format mistral` it's a [`PixtralForConditionalGeneration`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L293C7-L293C38).
In the second case (the original implementation), the activation is a hard-coded [`silu`](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L595) call, while for the `hf` version the activation is defined [here](https://github.com/vllm-project/vllm/blob/a944f8ede7361a5233e112a575ff77c4aaa268a5/vllm/model_executor/models/pixtral.py#L977), so it is taken from this file (`config.json`).
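To illustrate why the choice of activation matters, here is a minimal standard-library sketch comparing the two functions on a few inputs. It assumes `gelu` refers to the exact (erf-based) GELU; whether a given code path uses the exact or the tanh-approximated variant is an assumption here, not something this commit states.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Print both activations side by side for a few sample inputs.
for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0):
    g, s = gelu(x), silu(x)
    print(f"x={x:+.1f}  gelu={g:+.4f}  silu={s:+.4f}  diff={g - s:+.4f}")
```

The two curves agree at zero but diverge elsewhere, so applying the wrong one in every vision MLP layer changes the activations the weights were trained to expect.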

The `hidden_act` value in `vision_config` must therefore be changed from `gelu` to `silu`.
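For anyone keeping a local copy of the checkpoint, the same change can be applied with a few lines of Python. This is only a sketch: the helper names `set_vision_hidden_act` and `patch_config_file` are hypothetical, and the path `config.json` is assumed to point at the local copy of this file.

```python
import json
from pathlib import Path


def set_vision_hidden_act(config: dict, act: str = "silu") -> dict:
    """Return a copy of `config` with vision_config.hidden_act set to `act`."""
    patched = dict(config)
    vision = dict(patched.get("vision_config", {}))
    vision["hidden_act"] = act
    patched["vision_config"] = vision
    return patched


def patch_config_file(path: str, act: str = "silu") -> None:
    """Rewrite the JSON config at `path` in place (hypothetical helper)."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(set_vision_hidden_act(config, act), indent=2) + "\n")


# Example (assumes a local config.json):
# patch_config_file("config.json")
```

Only `vision_config.hidden_act` is touched; every other key is written back unchanged, although `json.dumps` may reformat whitespace.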

The issue can be seen here. This is a debug view of the model instantiated using `--config_format hf` (`gelu` on the right-hand side of the image):
![gelu_issue.png](https://cdn-uploads.huggingface.co/production/uploads/61929226ded356549e20c5da/p9AmXSlSGp86pMDYuBZoP.png)

After modifying this file, we get (`silu` on the right-hand side of the image):
![silu_solution.png](https://cdn-uploads.huggingface.co/production/uploads/61929226ded356549e20c5da/gmmLnn1-K1AKhIVeWHoak.png)

This is currently causing problems when evaluating the model. For instance, using [`mistral-evals`](https://github.com/mistralai/mistral-evals) on ChartQA, we obtain `0.8612` with `--config_format mistral` vs `0.818` with `--config_format hf`.
After changing this config file, we obtain `0.86` with `--config_format hf`.

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -30,7 +30,7 @@
  "vision_config": {
  "attention_dropout": 0.0,
  "head_dim": 64,
- "hidden_act": "gelu",
+ "hidden_act": "silu",
  "hidden_size": 1024,
  "image_size": 1540,
  "initializer_range": 0.02,