Firefly Qwen 7B
A Chinese dialogue model fine-tuned from Qwen-7B, combining the MOSS dialogue dataset with school math problem data
Downloads 23
Release Time: 8/17/2023
Model Overview
An open-source large language model optimized for Chinese dialogue scenarios, supporting single-turn and multi-turn conversational interactions
Model Features
Chinese dialogue optimization
Fine-tuned specifically for Chinese-language contexts, producing more fluent dialogue than the original Qwen-7B
Enhanced math capabilities
Incorporates 20,000 school math problems to improve mathematical reasoning
Multi-turn dialogue support
Maintains a dialogue history of up to 1000 tokens to keep multi-turn conversations coherent (see the Advanced Usage example below)
Model Capabilities
Open-domain dialogue
Math problem solving
Context understanding
Text generation
Use Cases
Educational applications
Math tutoring
Solving and explaining primary/secondary school math problems step-by-step
Accuracy improved by approximately 15% compared to the base model
Intelligent customer service
Multi-turn consultation
Handling complex user inquiry scenarios
Context retention accuracy exceeds 80%
🚀 Firefly Qwen-7B Fine-tuning Project
This project uses the Firefly framework to fine-tune the Tongyi Qianwen Qwen-7B model. The training data consists of roughly one million dialogue turns, combining the MOSS data shared by the project with 20,000 school math problems.
For more details, please refer to the Firefly project.
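For orientation, the sketch below shows what one multi-turn training sample might look like in the Firefly project's jsonl layout. The field names (conversation_id, category, conversation, human, assistant) are assumed from the moss-style data released with that project, and the dialogue content is invented for illustration; check the Firefly repository for the authoritative schema.

import json

# One jsonl line = one multi-turn dialogue sample (illustrative content only;
# field names assumed from the moss-style data used by the Firefly project)
sample = {
    "conversation_id": 1,
    "category": "school_math",
    "conversation": [
        {
            "human": "A class has 12 boys and 15 girls. How many students are there in total?",
            "assistant": "12 + 15 = 27, so there are 27 students in total."
        },
        {
            "human": "If 3 students are absent today, how many are present?",
            "assistant": "27 - 3 = 24, so 24 students are present."
        }
    ]
}
print(json.dumps(sample, ensure_ascii=False))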
💻 Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

"""
Single-turn dialogue, without memory of the dialogue history
"""


def main():
    model_name = 'YeungNLP/firefly-qwen-7b'

    max_new_tokens = 500
    top_p = 0.9
    temperature = 0.35
    repetition_penalty = 1.0
    device = 'cuda'
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'
    ).to(device).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if model.config.model_type == 'llama' else True
    )
    # QWenTokenizer is special: pad_token_id, bos_token_id, and eos_token_id are all None,
    # and the token corresponding to eod_id is <|endoftext|>
    if tokenizer.__class__.__name__ == 'QWenTokenizer':
        tokenizer.pad_token_id = tokenizer.eod_id
        tokenizer.bos_token_id = tokenizer.eod_id
        tokenizer.eos_token_id = tokenizer.eod_id

    text = input('User:')
    while True:
        text = text.strip()
        # chatglm uses the official data organization format
        if model.config.model_type == 'chatglm':
            text = '[Round 1]\n\n问:{}\n\n答:'.format(text)
            input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
        # For compatibility with qwen-7b: tokenizing its eos_token does not yield the corresponding
        # eos_token_id, so the bos/eos token ids are concatenated manually
        else:
            input_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
            bos_token_id = torch.tensor([[tokenizer.bos_token_id]], dtype=torch.long).to(device)
            eos_token_id = torch.tensor([[tokenizer.eos_token_id]], dtype=torch.long).to(device)
            input_ids = torch.concat([bos_token_id, input_ids, eos_token_id], dim=1)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=input_ids, max_new_tokens=max_new_tokens, do_sample=True,
                top_p=top_p, temperature=temperature, repetition_penalty=repetition_penalty,
                eos_token_id=tokenizer.eos_token_id
            )
        outputs = outputs.tolist()[0][len(input_ids[0]):]
        response = tokenizer.decode(outputs)
        response = response.strip().replace(tokenizer.eos_token, "").strip()
        print("Firefly:{}".format(response))
        text = input('User:')


if __name__ == '__main__':
    main()
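Note on the input layout: because tokenizing qwen-7b's eos_token string does not return the special eos_token_id, the script builds the prompt at the token level as bos_token_id + query tokens + eos_token_id, i.e. effectively <|endoftext|>query<|endoftext|>, and generation stops once the model emits the next <|endoftext|>.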
Advanced Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def main():
    model_name = 'YeungNLP/firefly-qwen-7b'

    device = 'cuda'
    max_new_tokens = 500    # Maximum number of tokens generated per dialogue round
    history_max_len = 1000  # Maximum number of history tokens the model remembers
    top_p = 0.9
    temperature = 0.35
    repetition_penalty = 1.0

    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map='auto'
    ).to(device).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        # llama does not support the fast tokenizer
        use_fast=False if model.config.model_type == 'llama' else True
    )
    # QWenTokenizer is special: pad_token_id, bos_token_id, and eos_token_id are all None,
    # and the token corresponding to eod_id is <|endoftext|>
    if tokenizer.__class__.__name__ == 'QWenTokenizer':
        tokenizer.pad_token_id = tokenizer.eod_id
        tokenizer.bos_token_id = tokenizer.eod_id
        tokenizer.eos_token_id = tokenizer.eod_id

    # Accumulate the full dialogue history
    if model.config.model_type != 'chatglm':
        history_token_ids = torch.tensor([[tokenizer.bos_token_id]], dtype=torch.long)
    else:
        history_token_ids = torch.tensor([[]], dtype=torch.long)

    # Start the dialogue
    utterance_id = 0    # Current dialogue round, used to fit the data organization format of chatglm
    user_input = input('User:')
    while True:
        utterance_id += 1
        # chatglm uses the official data organization format
        if model.config.model_type == 'chatglm':
            user_input = '[Round {}]\n\n问:{}\n\n答:'.format(utterance_id, user_input)
            user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
        # Firefly's data organization format.
        # For compatibility with qwen-7b: tokenizing its eos_token does not yield the corresponding
        # eos_token_id, so the eos token id is concatenated manually
        else:
            input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids
            eos_token_id = torch.tensor([[tokenizer.eos_token_id]], dtype=torch.long)
            user_input_ids = torch.concat([input_ids, eos_token_id], dim=1)
        history_token_ids = torch.concat((history_token_ids, user_input_ids), dim=1)
        model_input_ids = history_token_ids[:, -history_max_len:].to(device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=model_input_ids, max_new_tokens=max_new_tokens, do_sample=True, top_p=top_p,
                temperature=temperature, repetition_penalty=repetition_penalty, eos_token_id=tokenizer.eos_token_id
            )
        model_input_ids_len = model_input_ids.size(1)
        response_ids = outputs[:, model_input_ids_len:]
        history_token_ids = torch.concat((history_token_ids, response_ids.cpu()), dim=1)
        response = tokenizer.batch_decode(response_ids)
        print("Firefly:" + response[0].strip().replace(tokenizer.eos_token, ""))
        user_input = input('User:')


if __name__ == '__main__':
    main()
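Note on history handling: the full dialogue history is accumulated on the CPU in history_token_ids, and only the most recent history_max_len (1000) tokens are moved to the GPU and passed to generate each round. This sliding window is what the "context memory up to 1000 tokens" feature above refers to; older turns are silently dropped once the limit is exceeded.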