Mistral 7B Instruct Aya 101
A multilingual instruction-following model fine-tuned from Mistral-7B-Instruct-v0.2, supporting 101 languages
Downloads: 92
Released: 2/27/2024
Model Overview
This is an instruction-following model fine-tuned from Mistral-7B-Instruct-v0.2 and optimized for multilingual use; it understands and generates text in a wide range of languages.
Model Features
Multilingual support
Understands and generates text in 101 languages
Instruction following
Accurately interprets and carries out complex natural-language instructions
Safe responses
Built-in safety behavior to avoid generating harmful or unethical content
Model Capabilities
Multilingual text generation
Instruction understanding and execution
Technical concept explanation
Story writing
Multilingual translation
Use Cases
Education
Multilingual learning assistant
Helps students understand technical concepts across languages
Provides accurate explanations in multiple languages
Content creation
Multilingual story writing
Generates story content in multiple languages from user instructions
The example below shows a short story about Goku and Kirby
Technical explanation
Explaining technical concepts
Explains complex technologies such as Docker in multiple languages
Provides a detailed analysis of pros and cons
🚀 Mistral-7B-Instruct-Aya-101
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the CohereForAI/aya_dataset and can be used for text-processing tasks across many languages.
🚀 Quick Start
Usage example
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
# Load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
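A minimal generation sketch building on the snippet above. The prompt wrapping follows the `[INST] ... [/INST]` template used throughout this card; the `device_map` setting, sampling parameters, and sample prompt are illustrative assumptions, not part of the original card:

```python
# Sketch: wrap an instruction in the [INST] template and generate a reply.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="MaziyarPanahi/Mistral-7B-Instruct-Aya-101",
    device_map="auto",  # assumption: accelerate is installed; remove to run on CPU
)

prompt = "[INST] Explain Docker in one short paragraph, in French. [/INST]"
result = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```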
💻 Usage Examples
Basic usage
<s> [INST] You are a sentient, superintelligent artificial general intelligence, here to teach and assist me. Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.[/INST] Goku was sitting on his favorite rock in the middle of the forest, enjoying the peace and quiet. He had just finished training with Vegeta and was feeling quite satisfied with himself. Suddenly, he sensed a powerful presence approaching.
"Who goes there?" Goku called out, standing up and preparing for battle.
A figure appeared before him, and Goku's eyes widened in shock. It was Kirby! But why was he here? And why did he look so different?
"Goku-san!" Kirby cried, rushing forward and hugging the Saiyan prince. "I have come to ask for your help!"
Goku was confused but happy to see his old friend. "What's going on, Kirby? What do you need my help with?"
Kirby looked around nervously before speaking. "Majin Buu has joined forces with me, and we plan to destroy the world!"
Goku was stunned. "What?! Why would you do that, Kirby?!"
"It's not what it seems," Kirby explained. "Buu promised to make the world a better place if I helped him. We will only destroy the evil beings and leave the good ones alone."
Goku was torn. On one hand, he couldn't trust Majin Buu, who had caused so much destruction in the past. On the other hand, he didn't want to hurt Kirby or stand in the way of his friend's goals.
"I can't let you do this, Kirby," Goku said finally. "We can't trust Majin Buu. He'll only cause more harm in the end."
Kirby sighed. "I understand, Goku-san. But I must follow my own path now. Farewell, my friend."
With a heavy heart, Goku watched as Kirby and Majin Buu disappeared into the distance, ready to wreak havoc on the world. Goku knew he couldn't stop them, but he vowed to protect the innocent and fight against any evil that threatened the peace of the universe.</s>
Advanced usage
<s> [INST] You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
describe about pros and cons of docker system.[/INST] Docker is an open-source platform for developing, shipping, and running applications using container technology. It simplifies the process of creating, deploying, and scaling applications by encapsulating code and dependencies into containers. Here are some advantages and disadvantages of using Docker:
Advantages:
1. Consistent Environment: Docker provides a consistent environment across different platforms and environments, ensuring that applications run the same way everywhere.
2. Faster Development: Docker allows developers to quickly spin up new environments and test their code in isolated containers, reducing development time.
3. Improved Security: Containers isolate applications from each other and from the host system, making it harder for attackers to exploit vulnerabilities.
4. Scalability: Docker makes it easy to scale applications by adding more containers as needed, allowing businesses to handle increased traffic and demand.
5. Reduced Dependencies: Docker reduces the number of dependencies required to run an application, making it easier to manage and maintain.
6. Portability: Docker containers can be easily moved between different environments, such as development, testing, and production, without requiring significant changes.
7. Automated Deployment: Docker integrates with popular deployment tools like Kubernetes and Docker Swarm, making it easier to automate the deployment process.
8. Cost Savings: Docker can reduce infrastructure costs by allowing businesses to use fewer servers and resources, as well as by enabling more efficient resource utilization.
Disadvantages:
1. Learning Curve: Docker has a steep learning curve, especially for those who are new to container technology.
2. Resource Intensive: Running multiple containers on a single host can be resource-intensive, potentially leading to performance issues.
3. Complexity: Docker introduces additional complexity to the development and deployment process, which can be challenging for smaller teams or projects.
4. Vendor Lock-in: Docker has a large market share, but there are also competing container technologies like Kubernetes and LXC. Choosing one over the others can lead to vendor lock-in.
5. Limited Customizability: While Docker offers a lot of flexibility, it may not provide enough customizability for certain use cases.
6. Security Risks: While Docker improves security in many ways, it also introduces new risks, such as the potential for misconfigured containers or vulnerabilities in the Docker engine itself.
7. Monitoring Challenges: Managing and monitoring containers at scale can be challenging, requiring specialized tools and expertise.</s>
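Both transcripts above were produced with plain `[INST] ... [/INST]` prompts. The same string can also be built with the tokenizer's chat template; the following is a sketch that assumes the tokenizer inherits Mistral-7B-Instruct-v0.2's template unchanged:

```python
# Sketch: build the [INST] prompt via the tokenizer's chat template
# (assumption: the Mistral-Instruct-v0.2 template is inherited unchanged).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
messages = [{"role": "user", "content": "describe about pros and cons of docker system."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # expected: "<s>[INST] describe about pros and cons of docker system. [/INST]"
```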
🔧 Technical Details
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (train_batch_size 2 × num_devices 4 × gradient_accumulation_steps 4)
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.9039 | 0.0 | 1 | 3.1495 |
| 0.9933 | 0.25 | 101 | 1.2402 |
| 0.9439 | 0.5 | 202 | 1.1683 |
| 0.9762 | 0.75 | 303 | 1.1502 |
Framework versions
- Transformers: 4.39.0.dev0
- Pytorch: 2.2.0+cu121
- Datasets: 2.17.0
- Tokenizers: 0.15.0
Axolotl Configuration
See the axolotl config below
axolotl version: 0.4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
hub_model_id: MaziyarPanahi/Mistral-7B-Instruct-KhanAcademy-v0.2
hf_use_auth_token: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: CohereForAI/aya_dataset
    type:
      system_prompt: ""
      field_instruction: inputs
      field_output: targets
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./models/MaziyarPanahi/Mistral-7B-Instruct-Aya-101
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
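To reproduce the run, a config like the one above is passed to axolotl's trainer. A minimal launch sketch, assuming axolotl 0.4.0 is installed and the YAML above is saved as `config.yml` (an illustrative filename):

```bash
# Sketch: launch fine-tuning with the config above (filename is illustrative).
accelerate launch -m axolotl.cli.train config.yml
```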
📄 License
This model is released under the Apache-2.0 license.
| Attribute | Details |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.2 |
| Training dataset | CohereForAI/aya_dataset |
| Supported languages | Afrikaans, Amharic, Arabic, Azerbaijani, Belarusian, Bengali, Bulgarian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Esperanto, Estonian, Basque, Finnish, Filipino, French, Frisian, Scottish Gaelic, Irish, Galician, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Armenian, Igbo, Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Khmer, Kyrgyz, Korean, Kurdish, Lao, Latvian, Latin, Lithuanian, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Maltese, Mongolian, Maori, Malay, Burmese, Nepali, Dutch, Norwegian, Northern Sotho, Nyanja, Odia, Punjabi, Persian, Polish, Portuguese, Pashto, Romanian, Russian, Sinhala, Slovak, Slovenian, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Serbian, Sundanese, Swahili, Swedish, Tamil, Telugu, Tajik, Thai, Turkish, Twi, Ukrainian, Urdu, Uzbek, Vietnamese, Xhosa, Yiddish, Yoruba, Chinese, Zulu |
| Model name | Mistral-7B-Instruct-Aya-101 |
| License | Apache-2.0 |