Jamba Hercules
Jamba-Hercules is a large language model fine-tuned from ai21labs/Jamba-v0.1 on the Locutusque/hercules-v4.0 dataset, focused on text-generation tasks.
Downloads: 24
Release date: 3/31/2024
Model Overview
This fine-tuned large language model is intended for generating high-quality text and can understand and produce complex natural-language responses.
Model Highlights
- Efficient fine-tuning: trained on the first 10,000 examples of the Locutusque/hercules-v4.0 dataset.
- Low-resource inference: supports 4-bit quantized inference, lowering hardware requirements enough to run on consumer GPUs.
- Conversation-optimized: tuned for dialogue generation, producing coherent, logically consistent responses.
Capabilities
- Text generation
- Dialogue systems
- Creative writing
Use Cases
- Dialogue systems / AI assistant: serves as a conversational assistant that answers user questions and provides help, producing natural, coherent responses.
- Creative writing / science-fiction story generation: produces imaginative sci-fi stories and scene descriptions, such as the "space tree" story shown in the example below.
🚀 Jamba-Hercules
Jamba-Hercules is a text-generation model fine-tuned on the Locutusque/hercules-v4.0 dataset. It handles general text-generation tasks and can produce imaginative content, such as the example below describing how a tree becomes a space-faring organism.
🚀 Quick Start
Model Information
Property | Details
---|---
Model type | Text generation model
Base model | ai21labs/Jamba-v0.1
Training dataset | Locutusque/hercules-v4.0
License | Apache-2.0
Dataset Usage
The dataset used is Locutusque/hercules-v4.0, restricted to the first 10k examples.
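A minimal sketch of how the 10k-example subset could be selected with the Hugging Face datasets library; the split name and the absence of further preprocessing are assumptions, since the card does not include the original data-preparation code.

from datasets import load_dataset

# Hypothetical data-preparation step (not from the original card):
# take the first 10,000 examples of the training split.
train_dataset = load_dataset("Locutusque/hercules-v4.0", split="train[:10000]")
print(train_dataset)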
💻 Usage Examples
Basic Usage
# This code automatically prepends the "<|startoftext|>" special token to every input. At inference time this token must be added to all inputs, because preliminary testing showed that omitting it produces faulty output.
!pip install -qqq "transformers>=4.39.0" mamba-ssm "causal-conv1d>=1.2.0" accelerate bitsandbytes --progress-bar off
!pip install flash-attn --no-build-isolation
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
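# Configure 4-bit double quantization to cut memory use for consumer-GPU inference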
double_quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.float16
)
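# Load the fine-tuned checkpoint with Flash Attention 2 and the 4-bit config above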
model = AutoModelForCausalLM.from_pretrained(
"Severian/Jamba-Hercules",
device_map="auto",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
attn_implementation="flash_attention_2",
quantization_config=double_quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Hercules")
input_text = """<|im_start|>system
You are a helpful AI assistant.
<|im_end|>
<|im_start|>user
A tree needs to leave the earth and become astro-biological. How can it achieve it's goal?
<|im_end|>
<|im_start|>assistant
"""
input_ids = tokenizer(input_text, return_tensors='pt').to(model.device)["input_ids"]
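# Effectively greedy decoding (do_sample defaults to False) plus a light repetition penalty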
outputs = model.generate(input_ids, max_new_tokens=1024, temperature=0.0, repetition_penalty=1.1)
print(tokenizer.batch_decode(outputs)[0])
Example Output
<|startoftext|><|im_start|>system
You are a helpful AI assistant.
<|im_end|>
<|im_start|>user
A tree needs to leave the earth and become astro-biological. How can it achieve it's goal?
<|im_end|>
<|im_start|>assistant
The process of becoming astronautic life for a tree is an excemtally complicated one that requires a great deal of preparation, coordination, and funding. The first step in this process is for the tree to be identified as a potential candidate for astronautics, which may involve a thorough assessment of its size, health, and suitability for space travel.
Once a tree has been identified as a possible astronautic life-form, the next step involves preparing it for spaceflight. This includes training the tree to tolerate low gravity environments, teaching it how to navigate in zero gravity, and ensuring that its nutritional needs are met in space. The process of acclimating a tree to spaceflight can take months or even years, depending on the species and the specific requirements of each mission.
Once the tree has been adequately trained and prepared for spaceflight, it must then be transported into orbit or beyond. This may involve launching the tree aboard a rocket or shuttle, or using other methods such as balloons or parachutes to reach its destination. The journey itself can be long and arduous, with many potential risks and challenges along the way.
Upon arrival in space, the tree will need to be monitored closely and cared for by astronautic personnel. This may involve feeding the tree, maintaining its health and well-being, and ensuring that it has enough water and nutrients to survive. The tree may also need to be protected from radiation, extreme temperatures, or other hazards that it may face in space.
Over time, as the tree adapts to life in space, it will likely undergo some changes in its physical appearance and behavior. For example, it may begin growing in new ways, developing different kinds of leaves or branches in response to the unique environment it finds itself in. It may also become more active and mobile, using its roots to navigate and explore its new home.
Throughout this process, there are many challenges that need to be addressed and overcome.<|im_end|>
<|endoftext|>
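As noted in the comment at the top of the example, the "<|startoftext|>" token must be present on every input at inference time. A short optional sanity check, assuming the token string from that note is the tokenizer's BOS token:

# Optional check that the tokenizer prepends the "<|startoftext|>" BOS token.
ids = tokenizer(input_text, return_tensors='pt')["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids[0].tolist())[:1])  # expected: ['<|startoftext|>']
# If it were missing, it could be prepended manually:
# input_text = tokenizer.bos_token + input_text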
🔧 Technical Details
Training Information
Hercules-v4.0, first test run
- 1,250 steps (about 5 hours on an A100)
- Final loss: 0.98
- With a per-device batch size of 1 and 8 gradient-accumulation steps (see below), 1,250 optimizer steps cover the 10,000-example subset once, matching num_train_epochs=1.
Hyperparameters
# model, tokenizer, train_dataset and max_seq_length are assumed to be defined earlier in the training script.
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer
import torch

lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
lora_dropout=0.05,
task_type="CAUSAL_LM",
bias="none"
)
trainer = SFTTrainer(
model=model,
train_dataset=train_dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
tokenizer=tokenizer,
args=TrainingArguments(
num_train_epochs=1,
lr_scheduler_type='cosine',
learning_rate=0.0002,
per_device_train_batch_size=1,
gradient_accumulation_steps=8,
gradient_checkpointing=True,
warmup_steps=10,
weight_decay=0.01,
fp16=not torch.cuda.is_bf16_supported(),
bf16=torch.cuda.is_bf16_supported(),
logging_steps=1,
save_steps=200,
output_dir="outputs",
optim="adamw_bnb_8bit",
adam_epsilon=0.00001,
adam_beta2=0.95,
max_grad_norm=1.0,
seed=42,
),
)
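The snippet above does not show how lora_config is attached to the quantized base model before it is passed to SFTTrainer. A minimal sketch of the standard peft k-bit LoRA workflow, under the assumption that this is how the adapters were applied (the original training script is not included in the card):

from peft import prepare_model_for_kbit_training, get_peft_model

# Hypothetical glue code, not taken from the original card:
# prepare the 4-bit base model for training and wrap it with the LoRA adapters defined above.
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()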
📄 License
This project is released under the Apache-2.0 license.