Small fix

Files changed:
- README.md (+1, -1)
- config.json (+2, -1)
README.md

@@ -118,7 +118,7 @@ Compared to previous versions of DeepSeek-R1, the usage recommendations for Deep
 1. System prompt is supported now.
 2. It is not required to add "\<think\>\n" at the beginning of the output to force the model into thinking pattern.
 
-The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B.
+The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B, but it is essential to ensure that all configuration files are sourced from our repository rather than the original Qwen3 project.
 
 ### System Prompt
 In the official DeepSeek web/app, we use the same system prompt with a specific date.
config.json

@@ -21,7 +21,8 @@
   "rope_scaling": {
     "rope_type": "yarn",
     "factor": 4.0,
-    "original_max_position_embeddings": 32768
+    "original_max_position_embeddings": 32768,
+    "attn_factor": 0.8782488562869419
   },
   "rope_theta": 1000000,
   "sliding_window": null,