🚀 Chinese Long Text Llama Model
This model is designed for long-text Chinese language processing. It offers strong long-text dialogue capabilities and can handle tasks such as multi-document retrieval and paper summarization.
🚀 Quick Start
The V2 version of this model is now available. It shows significant improvements over V1, with higher-quality responses.
✨ Features
- Training Method: Like LongAlpaca, this model is trained with the LongLora technique: position interpolation is first applied to llama2-chat, followed by instruction fine-tuning on a small amount of long-text data, which gives the model strong long-text dialogue capabilities.
- Dataset: The Chinese dataset is similar to LongAlpaca's, but includes more multi-document Q&A data.
- Model Expansion: Derived from Atom-7b-chat, the model extends the context length from 4k to 32k through linear position interpolation (see the sketch after this list) and is then fine-tuned with LoRA. It can handle tasks such as multi-document retrieval and summarization of papers tens of thousands of words long, while keeping short-dialogue ability almost unchanged.
- Streaming Support: Like the English LongAlpaca, this model supports streaming-LLM, allowing longer texts to be generated (example code).
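As a rough illustration of what linear position interpolation means at the configuration level, the snippet below shows how an 8x linear RoPE scaling (4k → 32k) is typically expressed for a Llama-family model in `transformers`. The released checkpoint presumably already ships with an equivalent setting, so this is only a sketch, not a required step.

```python
# Illustrative sketch only: linear RoPE position interpolation in a
# transformers Llama config. The published checkpoint is assumed to already
# contain an equivalent configuration.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("yuyijiong/LongAlpaca-7b-32k-chinese")
# Compress rotary positions by 8x so embeddings trained for 4k cover 32k tokens
config.rope_scaling = {"type": "linear", "factor": 8.0}
config.max_position_embeddings = 32768
print(config.rope_scaling, config.max_position_embeddings)
```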
📦 Installation
The model runs on the standard Hugging Face stack (`torch` and `transformers`). The usage example below additionally relies on `accelerate` (for `device_map="auto"`) and `bitsandbytes` (for `load_in_8bit=True`).
💻 Usage Examples
Basic Usage
```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

model_path = "yuyijiong/LongAlpaca-7b-32k-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Load in 8-bit to reduce GPU memory usage (requires bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", load_in_8bit=True
).eval()

# Build the prompt in the model's chat format
question = "中国的首都是什么?"  # "What is the capital of China?"
input_text = "<s>Human: " + question + "\n</s><s>Assistant: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    with torch.autocast("cuda"):
        output = model.generate(
            input_ids=input_ids,
            max_new_tokens=512,  # adjust to the desired response length
            do_sample=True,
            temperature=0.85,
            top_k=None,
            top_p=0.9,
            use_cache=True,
        )

# Keep only the assistant's reply
reply = tokenizer.decode(output[0], skip_special_tokens=False)
reply_return = reply.split("Assistant:")[-1].replace("</s>", "")
print("模型回答:", reply_return)  # "Model answer: ..."
```
Advanced Usage
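For long inputs, the model is more sensitive to instructions placed at the end of the prompt (see Evaluation and Inference below), so a question should come after the reference text. The sketch below reuses the model, tokenizer, and chat format from the basic example to build a multi-document Q&A prompt; the document contents and question are placeholders, not part of the official examples.

```python
# Sketch of a long-document Q&A prompt, reusing `model` and `tokenizer` from
# the basic example above. Documents and question are placeholders; the key
# point is that the question follows the reference material.
documents = ["文档1的内容...", "文档2的内容..."]  # contents of document 1 / 2
question = "根据以上文档,总结各文档的主要结论。"  # "Summarize each document's main conclusions."

context = "\n\n".join(f"文档{i + 1}:\n{doc}" for i, doc in enumerate(documents))
# Place the question AFTER the reference documents
prompt = "<s>Human: " + context + "\n\n" + question + "\n</s><s>Assistant: "

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
with torch.no_grad():
    output = model.generate(input_ids=input_ids, max_new_tokens=1024, do_sample=True,
                            temperature=0.85, top_p=0.9, use_cache=True)
print(tokenizer.decode(output[0], skip_special_tokens=False).split("Assistant:")[-1])
```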
📚 Documentation
Training Details
This model is trained with the LongLora technique: position interpolation is first applied to llama2-chat, followed by instruction fine-tuning on a small amount of long-text data. The Chinese dataset is similar to LongAlpaca's, but includes more multi-document Q&A data.
Evaluation and Inference
- Streaming Support: This model supports streaming-LLM, allowing longer texts to be generated (example code).
- Instruction Sensitivity: Experiments show that the model is more sensitive to instructions at the end of long texts, so it is recommended to place questions after the reference documents (as in the Advanced Usage sketch above).
- LongBench Evaluation: The results on the Chinese tasks of LongBench are shown below. The model performs well on free-response tasks such as Q&A and summarization, but poorly on fixed-option tasks such as classification and multiple choice, possibly due to the limited diversity of the instruction fine-tuning dataset.
| Dataset | Task Type | Evaluation Metric | Score |
| --- | --- | --- | --- |
| dureader | Multi-document QA | Rouge-L | 0.18369 |
| multifield_qa | Single-document QA | Rouge-L | 0.40816 |
| vcsum | Summarization | Rouge-L | 0.15166 |
| lsht | Text Classification | Accuracy | 0.19680 |
| passage_retrieval | Text Retrieval | Accuracy | 0.06000 |
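The Rouge-L numbers above are word-overlap scores. As a rough, hedged sketch of how such a score can be computed for Chinese output (the `jieba` and `rouge` packages used here are an assumption for illustration, not tools named in this card):

```python
# Hedged sketch: word-level Rouge-L for Chinese text, assuming the third-party
# `jieba` and `rouge` packages (not specified by the card).
import jieba
from rouge import Rouge

def rouge_l_zh(prediction: str, reference: str) -> float:
    # Segment Chinese text into words so Rouge matches words, not characters
    pred = " ".join(jieba.cut(prediction))
    ref = " ".join(jieba.cut(reference))
    scores = Rouge().get_scores(pred, ref)
    return scores[0]["rouge-l"]["f"]

print(rouge_l_zh("中国的首都是北京", "北京是中国的首都"))
```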
Model Limitations
Because the instruction fine-tuning data covers a limited range of task types, output quality cannot be guaranteed on complex tasks. The current Chinese instruction fine-tuning dataset lacks diversity, and the fine-tuned model shows some overfitting; this will be improved in future versions.
User Feedback
If you encounter any problems during use, please feel free to open a discussion to help us improve the model.
🔧 Technical Details
The model uses linear position interpolation to extend the context length from 4k to 32k and is then fine-tuned with LoRA. This process enables it to handle long-text tasks while maintaining short-dialogue capability.
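The card does not state the LoRA hyperparameters. As a hedged sketch of what the LoRA stage could look like with the `peft` library (the rank, alpha, and target modules below are illustrative guesses, and in the actual pipeline the base would be the position-interpolated Atom-7b-chat rather than the published checkpoint):

```python
# Hedged sketch of attaching LoRA adapters with the peft library. Rank, alpha,
# and target modules are illustrative guesses, not the settings actually used
# for this checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "yuyijiong/LongAlpaca-7b-32k-chinese", device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```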
📄 License
This model is licensed under the CC-BY-NC-4.0 license.