open-llama-3b-v2-chat
The open-llama-3b-v2-chat model is a text-generation model based on LLaMA 3B v2 that can be used for a variety of text-related tasks. It has been evaluated on several standard benchmarks; the results are summarized in the Technical Details section below.
Quick Start
Prerequisites
In addition to pytorch and transformers, install the required packages:
pip install sentencepiece
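If you are starting from a fresh environment, all three can be installed together (a typical pip setup; the exact torch install command may vary by platform and CUDA version):
pip install torch transformers sentencepiece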
Usage
To use the model, run the following script:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'mediocredev/open-llama-3b-v2-chat'
tokenizer_id = 'mediocredev/open-llama-3b-v2-chat'

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a multi-turn chat history
chat_history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "I am here."},
    {"role": "user", "content": "How many days are there in a leap year?"},
]

# Apply the model's chat template and tokenize the prompt
input_ids = tokenizer.apply_chat_template(
    chat_history, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a response
output_tokens = model.generate(
    input_ids,
    repetition_penalty=1.05,
    max_new_tokens=1000,
)

# Decode only the newly generated tokens, skipping the prompt
output_text = tokenizer.decode(
    output_tokens[0][len(input_ids[0]):], skip_special_tokens=True
)
print(output_text)
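To continue a multi-turn conversation, append the model's reply and the next user message to chat_history, then repeat the template and generate steps. A minimal sketch reusing the objects from the script above (the follow-up question is only an illustration):

# Append the assistant's reply and a new user turn, then generate again
chat_history.append({"role": "assistant", "content": output_text})
chat_history.append({"role": "user", "content": "And how many days are there in a common year?"})

input_ids = tokenizer.apply_chat_template(
    chat_history, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_tokens = model.generate(input_ids, repetition_penalty=1.05, max_new_tokens=1000)
print(tokenizer.decode(output_tokens[0][len(input_ids[0]):], skip_special_tokens=True))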
Features
The model has been evaluated on multiple text-generation benchmarks, including the AI2 Reasoning Challenge, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8k. You can find detailed evaluation results on the Open LLM Leaderboard.
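If you want to reproduce this style of evaluation locally, one common option (an assumption here, not something this card prescribes) is EleutherAI's lm-evaluation-harness. A typical invocation for a single benchmark might look like:

pip install lm_eval
lm_eval --model hf \
  --model_args pretrained=mediocredev/open-llama-3b-v2-chat \
  --tasks arc_challenge \
  --num_fewshot 25 \
  --batch_size 8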
Technical Details
The mediocredev/open-llama-3b-v2-chat model is based on LLaMA 3B v2. Its scores on the Open LLM Leaderboard benchmarks are listed below:

| Metric | Value |
|---|---|
| Avg. | 40.93 |
| AI2 Reasoning Challenge (25-shot) | 40.61 |
| HellaSwag (10-shot) | 70.30 |
| MMLU (5-shot) | 28.73 |
| TruthfulQA (0-shot) | 37.84 |
| Winogrande (5-shot) | 65.51 |
| GSM8k (5-shot) | 2.58 |

Detailed results can be found on the Open LLM Leaderboard.
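For reference, the Avg. row is the arithmetic mean of the six benchmark scores: (40.61 + 70.30 + 28.73 + 37.84 + 65.51 + 2.58) / 6 ≈ 40.93.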
Limitations
The mediocredev/open-llama-3b-v2-chat model is based on LLaMA 3B v2. It may face challenges in factual accuracy, especially when dealing with conflicting information or nuanced topics. Its outputs are non-deterministic, and critical evaluation is needed to avoid over-relying on its statements. Moreover, although its generative capabilities are promising, it may sometimes generate factually incorrect or offensive content, which requires careful curation and human supervision. As an evolving model, LLaMA is still under development, and efforts are being made to address its limitations in areas such as bias mitigation and interpretability. By using this model responsibly and being aware of its drawbacks, we can unleash its potential while minimizing risks.
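Because sampling-based decoding varies between runs, a minimal sketch for making outputs more reproducible, reusing the variables from the Quick Start script (transformers' set_seed utility is real; the seed and parameter choices are illustrative):

from transformers import set_seed

# Fix the random seed (python, numpy, torch) for reproducibility
set_seed(42)

# Greedy decoding (do_sample=False) is deterministic for a fixed input;
# sampled decoding differs between runs unless seeded.
output_tokens = model.generate(
    input_ids,
    do_sample=False,
    repetition_penalty=1.05,
    max_new_tokens=1000,
)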
Contact
Feedback, questions, and discussions are all welcome. Feel free to reach out: mediocredev@outlook.com
License
This project is licensed under the Apache 2.0 license.