Text Generation · Transformers · Safetensors · English · internlm2 · custom_code
nielsr (HF Staff) committed · verified · commit caa8b6c · 1 parent: de8792c

Improve metadata, use text-generation pipeline tag


This PR improves the model card, ensuring that a useful code snippet is displayed at the top right. It also adds a link to the GitHub repository.
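The metadata this PR writes is a flat key/list structure, so it can be sanity-checked in a few lines. The sketch below is a minimal hand-rolled parse for exactly this shape (not a general YAML parser); the keys and values are the ones the PR adds.

```python
# Minimal parse of the updated front matter (flat keys and one-level lists only).
front_matter = """\
base_model:
- internlm/internlm2-7b
datasets:
- cerebras/SlimPajama-627B
language:
- en
license: mit
metrics:
- accuracy
pipeline_tag: text-generation
library_name: transformers
"""

meta, key = {}, None
for line in front_matter.splitlines():
    if line.startswith("- "):
        meta[key].append(line[2:])          # list item under the last key
    else:
        key, _, value = line.partition(":")
        meta[key] = value.strip() or []     # scalar value, or start of a list

print(meta["pipeline_tag"])   # text-generation
print(meta["base_model"])     # ['internlm/internlm2-7b']
```

The `pipeline_tag: text-generation` entry is what makes the Hub surface a generation snippet on the model page, and `library_name: transformers` selects which library's snippet is shown.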

Files changed (1): README.md (+24 −5)
README.md CHANGED
```diff
@@ -1,14 +1,23 @@
 ---
-license: mit
+base_model:
+- internlm/internlm2-7b
 datasets:
 - cerebras/SlimPajama-627B
 language:
 - en
-base_model:
-- internlm/internlm2-7b
+license: mit
+metrics:
+- accuracy
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 # Random Baseline Language Model (7.2B Parameters, 150B Tokens)
 
+This repository contains the 7.2B parameter random baseline language model used in the paper [Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models](https://huggingface.co/papers/2504.14194).
+
+Code: https://github.com/opendatalab/Meta-rater
+
 ## Model Description
 
 This is a 7.2B parameter transformer-based decoder-only language model trained from scratch on 150B tokens randomly sampled from SlimPajama dataset. It represents the largest baseline model in the Meta-rater research, demonstrating performance capabilities at scale with random data selection.
@@ -28,7 +37,7 @@ This is a 7.2B parameter transformer-based decoder-only language model trained f
 - **Hidden Dimension**: 4,096
 - **Number of Layers**: 32
 - **Attention Heads**: 32
-- **Key-Value Heads**: 8 (Grouped Query Attention)
+- **Key-Value Heads**: 8 (Grouped Query Attention)\
 - **MLP Ratio**: 8/3
 - **Position Encoding**: RoPE (base=10,000)
 
@@ -183,4 +192,14 @@ Please refer to the license terms of the original SlimPajama dataset and follow
 
 ## Contact
 
-For questions or issues, please contact the authors or open an issue in the repository.
+For questions or issues, please contact the authors or open an issue in the repository.
+
+---
+
+<div align="center">
+
+**⭐ Star us on GitHub if you find Meta-rater useful! ⭐**
+
+Made with ❤️ by the OpenDataLab team
+
+</div>
```
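The architecture entries touched by this diff imply a specific grouped-query attention layout. A quick arithmetic check, assuming the usual convention that `head_dim = hidden_dim / n_heads` (the card does not state the head dimension explicitly):

```python
# Sanity check of the GQA layout implied by the model-card numbers.
hidden_dim = 4096
n_heads = 32        # attention (query) heads
n_kv_heads = 8      # key/value heads (grouped-query attention)

head_dim = hidden_dim // n_heads      # 128 dims per head
group_size = n_heads // n_kv_heads    # 4 query heads share each KV head
kv_proj_dim = n_kv_heads * head_dim   # 1024: K/V projections are 4x narrower than Q

print(head_dim, group_size, kv_proj_dim)  # 128 4 1024
```

This is why GQA shrinks the KV cache: keys and values are stored per KV head, so the cache is `n_heads / n_kv_heads` (here 4×) smaller than with full multi-head attention.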