Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md
README.md
ADDED
---
language: ar
thumbnail: https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png
datasets:
- nadi
---
# Multi-dialect-Arabic-BERT
This is the repository of the Multi-dialect Arabic BERT model.

By [Mawdoo3-AI](https://ai.mawdoo3.com/).

<p align="center">
  <br>
  <img src="https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png" alt="Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf" width="500"/>
  <br>
</p>

### About our Multi-dialect-Arabic-BERT model
Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized its weights from [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT) and trained it on 10M Arabic tweets from the unlabeled data of [The Nuanced Arabic Dialect Identification (NADI) shared task](https://sites.google.com/view/nadi-shared-task).
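As a rough illustration of this continued-pretraining setup (a minimal sketch, not the authors' actual training script), the following assumes a plain-text file of tweets (`tweets.txt`, one tweet per line) and that the `asafaya/bert-base-arabic` checkpoint hosts the published Arabic-BERT weights:

```python
# Sketch only: continue masked-language-model pretraining from Arabic-BERT
# weights on a tweet corpus. File name and base checkpoint are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "asafaya/bert-base-arabic"                      # assumed Arabic-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)     # initialize from Arabic-BERT

# Load and tokenize the unlabeled tweets (one tweet per line).
tweets = load_dataset("text", data_files={"train": "tweets.txt"})["train"]
tweets = tweets.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="multi-dialect-bert-base-arabic", num_train_epochs=1),
    train_dataset=tweets,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()
```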

### To cite this work

```
@misc{talafha2020multidialect,
      title={Multi-Dialect Arabic BERT for Country-Level Dialect Identification},
      author={Bashar Talafha and Mohammad Ali and Muhy Eddin Za'ter and Haitham Seelawi and Ibraheem Tuffaha and Mostafa Samir and Wael Farhan and Hussein T. Al-Natsheh},
      year={2020},
      eprint={2007.05612},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

### Usage
The model weights can be loaded using the `transformers` library by Hugging Face.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
```
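For example (an illustration, not from the original card), the loaded model can be used as a feature extractor to obtain contextual embeddings for a sentence:

```python
import torch

# Encode a sentence and take the final-layer hidden states.
inputs = tokenizer("سافر الرحالة من مطار الكويت", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch_size, sequence_length, hidden_size), e.g. (1, num_tokens, 768)
print(outputs.last_hidden_state.shape)
```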

Example using `pipeline`:

```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="bashar-talafha/multi-dialect-bert-base-arabic",
    tokenizer="bashar-talafha/multi-dialect-bert-base-arabic"
)

fill_mask(" سافر الرحالة من مطار [MASK] ")
```
```
[{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226},
 {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747},
 {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205},
 {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003},
 {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}]
```

### Repository
Please check the [original repository](https://github.com/mawdoo3/Multi-dialect-Arabic-BERT) for more information.