Mistral 7B Instruct Aya 101
A multilingual instruction-following model fine-tuned from Mistral-7B-Instruct-v0.2, supporting 101 languages
Downloads: 92
Release date: 2/27/2024
Model Overview
This is an instruction-following model fine-tuned from Mistral-7B-Instruct-v0.2 and optimized for multilingual scenarios; it can understand and generate text in a wide range of languages.
Model Features
Multilingual Support
Understands and generates text in 101 languages
Instruction Following
Accurately interprets and executes complex natural-language instructions
Safe Responses
Built-in safety mechanisms help the model avoid generating harmful or unethical content
Model Capabilities
Multilingual text generation
Instruction understanding and execution
Technical concept explanation
Story writing
Multilingual translation
Use Cases
Education
Multilingual learning assistant
Helps students understand technical concepts across different languages
Provides accurate multilingual explanations
Content Creation
Multilingual story writing
Generates multilingual story content from user instructions
The examples below include a short story featuring Goku and Kirby
Technical Explanation
Technical concept walkthroughs
Explains complex technical concepts such as Docker in multiple languages
Provides detailed pros-and-cons analysis
🚀 Mistral-7B-Instruct-Aya-101
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the CohereForAI/aya_dataset (see the axolotl config below) and can be used for text-processing tasks across many languages.
🚀 Quick Start
Usage Example
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Mistral-7B-Instruct-Aya-101")

# Load the model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Mistral-7B-Instruct-Aya-101")
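
For end-to-end generation, the short sketch below reuses the tokenizer and model loaded above and wraps a prompt in the [INST] ... [/INST] format used during fine-tuning (see the axolotl config below). The example prompt and sampling settings are illustrative assumptions, not part of the original card.

# Minimal generation sketch, reusing the tokenizer and model loaded above.
# The [INST] wrapper matches the fine-tuning prompt format; the prompt text
# and sampling settings here are illustrative assumptions.
prompt = "[INST] Explain what Docker is in one short paragraph. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))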
💻 Usage Examples
Basic Usage
<s> [INST] You are a sentient, superintelligent artificial general intelligence, here to teach and assist me. Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.[/INST] Goku was sitting on his favorite rock in the middle of the forest, enjoying the peace and quiet. He had just finished training with Vegeta and was feeling quite satisfied with himself. Suddenly, he sensed a powerful presence approaching.
"Who goes there?" Goku called out, standing up and preparing for battle.
A figure appeared before him, and Goku's eyes widened in shock. It was Kirby! But why was he here? And why did he look so different?
"Goku-san!" Kirby cried, rushing forward and hugging the Saiyan prince. "I have come to ask for your help!"
Goku was confused but happy to see his old friend. "What's going on, Kirby? What do you need my help with?"
Kirby looked around nervously before speaking. "Majin Buu has joined forces with me, and we plan to destroy the world!"
Goku was stunned. "What?! Why would you do that, Kirby?!"
"It's not what it seems," Kirby explained. "Buu promised to make the world a better place if I helped him. We will only destroy the evil beings and leave the good ones alone."
Goku was torn. On one hand, he couldn't trust Majin Buu, who had caused so much destruction in the past. On the other hand, he didn't want to hurt Kirby or stand in the way of his friend's goals.
"I can't let you do this, Kirby," Goku said finally. "We can't trust Majin Buu. He'll only cause more harm in the end."
Kirby sighed. "I understand, Goku-san. But I must follow my own path now. Farewell, my friend."
With a heavy heart, Goku watched as Kirby and Majin Buu disappeared into the distance, ready to wreak havoc on the world. Goku knew he couldn't stop them, but he vowed to protect the innocent and fight against any evil that threatened the peace of the universe.</s>
Advanced Usage
<s> [INST] You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
describe about pros and cons of docker system.[/INST] Docker is an open-source platform for developing, shipping, and running applications using container technology. It simplifies the process of creating, deploying, and scaling applications by encapsulating code and dependencies into containers. Here are some advantages and disadvantages of using Docker:
Advantages:
1. Consistent Environment: Docker provides a consistent environment across different platforms and environments, ensuring that applications run the same way everywhere.
2. Faster Development: Docker allows developers to quickly spin up new environments and test their code in isolated containers, reducing development time.
3. Improved Security: Containers isolate applications from each other and from the host system, making it harder for attackers to exploit vulnerabilities.
4. Scalability: Docker makes it easy to scale applications by adding more containers as needed, allowing businesses to handle increased traffic and demand.
5. Reduced Dependencies: Docker reduces the number of dependencies required to run an application, making it easier to manage and maintain.
6. Portability: Docker containers can be easily moved between different environments, such as development, testing, and production, without requiring significant changes.
7. Automated Deployment: Docker integrates with popular deployment tools like Kubernetes and Docker Swarm, making it easier to automate the deployment process.
8. Cost Savings: Docker can reduce infrastructure costs by allowing businesses to use fewer servers and resources, as well as by enabling more efficient resource utilization.
Disadvantages:
1. Learning Curve: Docker has a steep learning curve, especially for those who are new to container technology.
2. Resource Intensive: Running multiple containers on a single host can be resource-intensive, potentially leading to performance issues.
3. Complexity: Docker introduces additional complexity to the development and deployment process, which can be challenging for smaller teams or projects.
4. Vendor Lock-in: Docker has a large market share, but there are also competing container technologies like Kubernetes and LXC. Choosing one over the others can lead to vendor lock-in.
5. Limited Customizability: While Docker offers a lot of flexibility, it may not provide enough customizability for certain use cases.
6. Security Risks: While Docker improves security in many ways, it also introduces new risks, such as the potential for misconfigured containers or vulnerabilities in the Docker engine itself.
7. Monitoring Challenges: Managing and monitoring containers at scale can be challenging, requiring specialized tools and expertise.</s>
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1
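
As a quick sanity check, the reported total train batch size is the product of the per-device batch size, the gradient accumulation steps, and the number of devices; a minimal sketch of that arithmetic:

# Effective global batch size implied by the hyperparameters above.
train_batch_size = 2             # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 32, matching the reported total_train_batch_size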
Training Results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.9039 | 0.0 | 1 | 3.1495 |
0.9933 | 0.25 | 101 | 1.2402 |
0.9439 | 0.5 | 202 | 1.1683 |
0.9762 | 0.75 | 303 | 1.1502 |
Framework Versions
- Transformers: 4.39.0.dev0
- PyTorch: 2.2.0+cu121
- Datasets: 2.17.0
- Tokenizers: 0.15.0
Axolotl Configuration
axolotl version: 0.4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
hub_model_id: MaziyarPanahi/Mistral-7B-Instruct-KhanAcademy-v0.2
hf_use_auth_token: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: CohereForAI/aya_dataset
    type:
      system_prompt: ""
      field_instruction: inputs
      field_output: targets
      format: "[INST] {instruction} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./models/MaziyarPanahi/Mistral-7B-Instruct-Aya-101
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000005
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
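
Assuming axolotl 0.4.0 is installed, a config like this is typically launched with accelerate, e.g. `accelerate launch -m axolotl.cli.train config.yml` (the config filename here is a hypothetical placeholder).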
📄 License
This model is released under the Apache-2.0 license.
Attribute | Details |
---|---|
Base Model | mistralai/Mistral-7B-Instruct-v0.2 |
Training Dataset | CohereForAI/aya_dataset |
Supported Languages | Afrikaans, Amharic, Arabic, Azerbaijani, Belarusian, Bengali, Bulgarian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Esperanto, Estonian, Basque, Finnish, Filipino, French, Frisian, Scottish Gaelic, Irish, Galician, Gujarati, Haitian Creole, Hausa, Hebrew, Hindi, Hungarian, Armenian, Igbo, Indonesian, Icelandic, Italian, Javanese, Japanese, Kannada, Georgian, Kazakh, Khmer, Kyrgyz, Korean, Kurdish, Lao, Latvian, Latin, Lithuanian, Luxembourgish, Malayalam, Marathi, Macedonian, Malagasy, Maltese, Mongolian, Maori, Malay, Burmese, Nepali, Dutch, Norwegian, Northern Sotho, Nyanja, Odia, Punjabi, Persian, Polish, Portuguese, Pashto, Romanian, Russian, Sinhala, Slovak, Slovenian, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Serbian, Sundanese, Swahili, Swedish, Tamil, Telugu, Tajik, Thai, Turkish, Twi, Ukrainian, Urdu, Uzbek, Vietnamese, Xhosa, Yiddish, Yoruba, Chinese, Zulu |
Model Name | Mistral-7B-Instruct-Aya-101 |
License | Apache-2.0 |