Improve metadata, use text-generation pipeline tag
This PR improves the model card, ensuring that a useful code snippet is displayed in the top right. It also adds a link to the GitHub repository.
README.md
CHANGED

```diff
@@ -1,14 +1,23 @@
 ---
-
+base_model:
+- internlm/internlm2-7b
 datasets:
 - cerebras/SlimPajama-627B
 language:
 - en
-
-
+license: mit
+metrics:
+- accuracy
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 # Random Baseline Language Model (7.2B Parameters, 150B Tokens)
 
+This repository contains the 7.2B parameter random baseline language model used in the paper [Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models](https://huggingface.co/papers/2504.14194).
+
+Code: https://github.com/opendatalab/Meta-rater
+
 ## Model Description
 
 This is a 7.2B parameter transformer-based decoder-only language model trained from scratch on 150B tokens randomly sampled from SlimPajama dataset. It represents the largest baseline model in the Meta-rater research, demonstrating performance capabilities at scale with random data selection.
@@ -28,7 +37,7 @@ This is a 7.2B parameter transformer-based decoder-only language model trained f
 - **Hidden Dimension**: 4,096
 - **Number of Layers**: 32
 - **Attention Heads**: 32
-- **Key-Value Heads**: 8 (Grouped Query Attention)
+- **Key-Value Heads**: 8 (Grouped Query Attention)\
 - **MLP Ratio**: 8/3
 - **Position Encoding**: RoPE (base=10,000)
 
@@ -183,4 +192,14 @@ Please refer to the license terms of the original SlimPajama dataset and follow
 
 ## Contact
 
-For questions or issues, please contact the authors or open an issue in the repository.
+For questions or issues, please contact the authors or open an issue in the repository.
+
+---
+
+<div align="center">
+
+**⭐ Star us on GitHub if you find Meta-rater useful! ⭐**
+
+Made with ❤️ by the OpenDataLab team
+
+</div>
```