---
license: mit
library_name: adapter-transformers
---
Effi-13B AWQ is an AWQ-quantized version of [Effi-13B](https://huggingface.co/aiplanet/effi-13b), our reasoning model.

## About AWQ

AWQ (Activation-aware Weight Quantization) is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.

It is also supported by vLLM, a continuous-batching inference server, which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.

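As a minimal sketch of the vLLM route, the snippet below runs a batch of prompts through vLLM's offline API. It assumes the `vllm` package and a CUDA GPU are available, takes the repository id from the citation URL in this card, and passes plain-text prompts because no prompt template is specified here. The function name `generate_with_vllm` and the sampling defaults are illustrative choices, not settings recommended by the authors.

```python
MODEL_ID = "aiplanet/effi-13B-AWQ"  # repo id taken from the citation URL in this card


def generate_with_vllm(prompts, max_tokens=256, temperature=0.7):
    """Return one completion per prompt using vLLM's offline batch API.

    Hypothetical helper: the name and default sampling values are assumptions.
    """
    # Imported lazily so this module can be loaded without vLLM installed.
    from vllm import LLM, SamplingParams

    # quantization="awq" tells vLLM to load the 4-bit AWQ weights.
    llm = LLM(model=MODEL_ID, quantization="awq")
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)

    # vLLM batches the prompts internally and returns one RequestOutput per prompt.
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]


# Example call (downloads the model and requires a GPU):
# print(generate_with_vllm(["Explain step by step why the sky is blue."])[0])
```

Loading the model once and passing many prompts in a single call lets vLLM's continuous batching do the scheduling, which is where the throughput benefit comes from.
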
effi-13B is a 13-billion-parameter causal decoder-only model built by AI Planet, based on Llama-2-13b-chat-hf and fine-tuned on 1.8 million conversations from the CoT dataset available on Hugging Face Datasets. The model is made available under the Apache 2.0 license.

## Why use effi-13B-Instruct?

- This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf that provides a rationale for the context provided.
- Llama-2 is one of the strongest open-source models available. This is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Llama-2-13b-chat-hf.

You will need at least 85-100 GB of memory to run inference with effi-13b swiftly.

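For the quantized weights specifically, a rough weights-only footprint can be worked out from the parameter count and the bits per weight. The sketch below assumes exactly 13 billion parameters (an approximation of "13B") and ignores quantization metadata such as scales and zero points, the KV cache, and activations, so it gives a lower bound rather than a measured requirement.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weights-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 13e9  # assumption: "13B" taken as exactly 13 billion parameters

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)     # unquantized half-precision weights
awq_4bit_gb = weight_memory_gb(NUM_PARAMS, 4)  # 4-bit AWQ weights

print(f"FP16 weights:      ~{fp16_gb:.1f} GB")
print(f"4-bit AWQ weights: ~{awq_4bit_gb:.1f} GB")
```

Actual memory use at inference time will be higher than these figures and depends on batch size, sequence length, and the serving framework.
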
## Our benchmarking

| Metric               | Value |
|----------------------|-------|
| Perplexity           | 5.529 |
| MMLU                 | 50.90 |
| HellaSwag (acc)      | 59.38 |
| HellaSwag (acc_norm) | 78.91 |
| TruthfulQA           | 38.24 |

## Direct Use

effi-13b has been fine-tuned on a Chain of Thought dataset.

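A minimal loading sketch with Hugging Face Transformers follows. It assumes the `transformers`, `accelerate`, and `autoawq` packages are installed, a CUDA GPU is available, and the repository's configuration carries the AWQ quantization settings. The repository id comes from the citation URL in this card; the helper name `generate_with_transformers` is hypothetical, and the prompt is passed as plain text because no prompt template is specified here.

```python
MODEL_ID = "aiplanet/effi-13B-AWQ"  # repo id taken from the citation URL in this card


def generate_with_transformers(prompt, max_new_tokens=256):
    """Generate a completion for a single prompt (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" places the quantized weights on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the model and requires a GPU):
# print(generate_with_transformers("Explain step by step: what is 17 * 6?"))
```
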
## Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

This model has been trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

## Recommendations

We recommend that users of effi-13b develop guardrails and take appropriate precautions for any production use.

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Citations

```
@misc {lucifertrj,
    author       = { {Tarun Jain} },
    title        = { Effi-13B-AWQ by AI Planet},
    year         = 2024,
    url          = { https://huggingface.co/aiplanet/effi-13B-AWQ/ },
    publisher    = { Hugging Face }
}
```