🚀 Chinese Long Text Llama Model
This model is designed for long-text Chinese language processing. It offers strong long-text dialogue capabilities and can handle tasks such as multi-document retrieval and paper summarization.
🚀 Quick Start
The V2 version of this model is now available. It shows significant improvements over V1, with higher-quality responses.
✨ Features
- Training Method: Like LongAlpaca, this model is trained with the LongLora technique: position interpolation is first applied to llama2-chat, followed by instruction fine-tuning on a small amount of long-text data, which gives the model strong long-text dialogue capabilities.
- Dataset: The Chinese dataset is similar to LongAlpaca's, but includes more multi-document Q&A data.
- Model Expansion: Derived from Atom-7b-chat, the model extends the context length from 4k to 32k through linear position interpolation (see the sketch after this list) and is then fine-tuned with LoRA. It can handle tasks such as multi-document retrieval and summarization of papers tens of thousands of words long, while keeping short-dialogue ability almost unchanged.
- Streaming Support: Like the English LongAlpaca, this model supports streaming-LLM, allowing longer texts to be generated (example code).
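As a rough illustration of what linear position interpolation means at the configuration level, the snippet below shows how an 8x linear RoPE scaling (4k → 32k) is typically expressed for a Llama-family model in `transformers`. The released checkpoint presumably already ships with an equivalent setting, so this is only a sketch, not a required step.

```python
# Illustrative sketch only: linear RoPE position interpolation in a
# transformers Llama config. The published checkpoint is assumed to already
# contain an equivalent configuration.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("yuyijiong/LongAlpaca-7b-32k-chinese")
# Compress rotary positions by 8x so embeddings trained for 4k cover 32k tokens
config.rope_scaling = {"type": "linear", "factor": 8.0}
config.max_position_embeddings = 32768
print(config.rope_scaling, config.max_position_embeddings)
```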
📦 Installation
The model runs on the standard Hugging Face stack (`torch` and `transformers`). The usage example below additionally relies on `accelerate` (for `device_map="auto"`) and `bitsandbytes` (for `load_in_8bit=True`).
💻 Usage Examples
Basic Usage
```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

model_path = "yuyijiong/LongAlpaca-7b-32k-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load in 8-bit to reduce GPU memory usage (requires bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", load_in_8bit=True
).eval()

# Build the prompt in the model's chat format
question = "中国的首都是什么?"  # "What is the capital of China?"
input_text = "<s>Human: " + question + "\n</s><s>Assistant: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    with torch.autocast("cuda"):
        output = model.generate(
            input_ids=input_ids,
            max_new_tokens=512,  # adjust to the desired response length
            do_sample=True,
            temperature=0.85,
            top_k=None,
            top_p=0.9,
            use_cache=True,
        )

# Keep only the assistant's reply
reply = tokenizer.decode(output[0], skip_special_tokens=False)
reply_return = reply.split("Assistant:")[-1].replace("</s>", "")
print("模型回答:", reply_return)  # "Model answer: ..."
```
Advanced Usage
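For long inputs, the model is more sensitive to instructions placed at the end of the prompt (see Evaluation and Inference below), so a question should come after the reference text. The sketch below reuses the model, tokenizer, and chat format from the basic example to build a multi-document Q&A prompt; the document contents and question are placeholders, not part of the official examples.

```python
# Sketch of a long-document Q&A prompt, reusing `model` and `tokenizer` from
# the basic example above. Documents and question are placeholders; the key
# point is that the question follows the reference material.
documents = ["文档1的内容...", "文档2的内容..."]  # contents of document 1 / 2
question = "根据以上文档,总结各文档的主要结论。"  # "Summarize each document's main conclusions."

context = "\n\n".join(f"文档{i + 1}:\n{doc}" for i, doc in enumerate(documents))
# Place the question AFTER the reference documents
prompt = "<s>Human: " + context + "\n\n" + question + "\n</s><s>Assistant: "

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    output = model.generate(input_ids=input_ids, max_new_tokens=1024, do_sample=True,
                            temperature=0.85, top_p=0.9, use_cache=True)
print(tokenizer.decode(output[0], skip_special_tokens=False).split("Assistant:")[-1])
```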
📚 Documentation
Training Details
This model is trained with the LongLora technique: position interpolation is first applied to llama2-chat, followed by instruction fine-tuning on a small amount of long-text data. The Chinese dataset is similar to LongAlpaca's, but includes more multi-document Q&A data.
Evaluation and Inference
- Streaming Support: This model supports streaming-LLM, allowing longer texts to be generated (example code).
- Instruction Sensitivity: Experiments show that the model is more sensitive to instructions at the end of long texts, so it is recommended to place questions after the reference documents (as in the Advanced Usage sketch above).
- LongBench Evaluation: The results on the Chinese tasks of LongBench are shown below. The model performs well on free-response tasks such as Q&A and summarization, but poorly on fixed-option tasks such as classification and multiple choice, possibly due to the limited diversity of the instruction fine-tuning dataset.
| Dataset | Task Type | Evaluation Metric | Score |
| --- | --- | --- | --- |
| dureader | Multi-document QA | Rouge-L | 0.18369 |
| multifield_qa | Single-document QA | Rouge-L | 0.40816 |
| vcsum | Summarization | Rouge-L | 0.15166 |
| lsht | Text Classification | Accuracy | 0.19680 |
| passage_retrieval | Text Retrieval | Accuracy | 0.06000 |
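The Rouge-L numbers above are word-overlap scores. As a rough, hedged sketch of how such a score can be computed for Chinese output (the `jieba` and `rouge` packages used here are an assumption for illustration, not tools named in this card):

```python
# Hedged sketch: word-level Rouge-L for Chinese text, assuming the third-party
# `jieba` and `rouge` packages (not specified by the card).
import jieba
from rouge import Rouge

def rouge_l_zh(prediction: str, reference: str) -> float:
    # Segment Chinese text into words so Rouge matches words, not characters
    pred = " ".join(jieba.cut(prediction))
    ref = " ".join(jieba.cut(reference))
    scores = Rouge().get_scores(pred, ref)
    return scores[0]["rouge-l"]["f"]

print(rouge_l_zh("中国的首都是北京", "北京是中国的首都"))
```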
Model Limitations
Because the instruction fine-tuning data covers a limited range of task types, output quality cannot be guaranteed on complex tasks. The current Chinese instruction fine-tuning dataset lacks diversity, and the fine-tuned model shows some overfitting; this will be improved in future versions.
User Feedback
If you encounter any problems during use, please feel free to open a discussion to help us improve the model.
🔧 Technical Details
The model uses linear position interpolation to extend the context length from 4k to 32k and is then fine-tuned with LoRA. This process enables it to handle long-text tasks while maintaining short-dialogue capability.
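The card does not state the LoRA hyperparameters. As a hedged sketch of what the LoRA stage could look like with the `peft` library (the rank, alpha, and target modules below are illustrative guesses, and in the actual pipeline the base would be the position-interpolated Atom-7b-chat rather than the published checkpoint):

```python
# Hedged sketch of attaching LoRA adapters with the peft library. Rank, alpha,
# and target modules are illustrative guesses, not the settings actually used
# for this checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "yuyijiong/LongAlpaca-7b-32k-chinese", device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```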
📄 License
This model is licensed under the CC-BY-NC-4.0 license.