Improve metadata, use text-generation pipeline tag
This PR improves the model card, ensuring that a useful code snippet is displayed in the top right. It also adds a link to the GitHub repository.
README.md
CHANGED

```diff
@@ -1,14 +1,23 @@
 ---
-
+base_model:
+- internlm/internlm2-7b
 datasets:
 - cerebras/SlimPajama-627B
 language:
 - en
-
-
+license: mit
+metrics:
+- accuracy
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 # Random Baseline Language Model (7.2B Parameters, 150B Tokens)
 
+This repository contains the 7.2B parameter random baseline language model used in the paper [Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models](https://huggingface.co/papers/2504.14194).
+
+Code: https://github.com/opendatalab/Meta-rater
+
 ## Model Description
 
 This is a 7.2B parameter transformer-based decoder-only language model trained from scratch on 150B tokens randomly sampled from SlimPajama dataset. It represents the largest baseline model in the Meta-rater research, demonstrating performance capabilities at scale with random data selection.
@@ -28,7 +37,7 @@ This is a 7.2B parameter transformer-based decoder-only language model trained f
 - **Hidden Dimension**: 4,096
 - **Number of Layers**: 32
 - **Attention Heads**: 32
-- **Key-Value Heads**: 8 (Grouped Query Attention)
+- **Key-Value Heads**: 8 (Grouped Query Attention)\
 - **MLP Ratio**: 8/3
 - **Position Encoding**: RoPE (base=10,000)
 
@@ -183,4 +192,14 @@ Please refer to the license terms of the original SlimPajama dataset and follow
 
 ## Contact
 
-For questions or issues, please contact the authors or open an issue in the repository.
+For questions or issues, please contact the authors or open an issue in the repository.
+
+---
+
+<div align="center">
+
+**⭐ Star us on GitHub if you find Meta-rater useful! ⭐**
+
+Made with ❤️ by the OpenDataLab team
+
+</div>
```