🚀 Chinese Text Correction Model
A Chinese text correction model for spelling and grammar error correction.
This model, shibing624/chinese-text-correction-7b, performs both spelling and grammar correction. It handles a wide range of error types in Chinese text and returns accurately corrected results.
📊 Evaluation Results
The overall performance on the CSC test set:

| input_text | predict_text |
|---|---|
| Text correction: Young Pioneers should give their seats to the elderly. (original, with errors) | Young Pioneers should give their seats to the elderly. (corrected) |
📦 Models
| Name | Base Model | Download |
|---|---|---|
| chinese-text-correction-1.5b | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face |
| chinese-text-correction-1.5b-lora | Qwen/Qwen2.5-1.5B-Instruct | 🤗 Hugging Face |
| chinese-text-correction-7b | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face |
| chinese-text-correction-7b-lora | Qwen/Qwen2.5-7B-Instruct | 🤗 Hugging Face |
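The `*-lora` variants are adapter weights trained on top of the corresponding Qwen2.5 Instruct base model. Below is a minimal sketch of loading the 7B adapter with the `peft` library; it assumes the LoRA repository is a standard PEFT adapter (this card does not spell out the repository layout):

```python
# Minimal sketch (assumption: the -lora repo is a standard PEFT adapter).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "shibing624/chinese-text-correction-7b-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the LoRA adapter weights to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

The merged full-weight checkpoints (`chinese-text-correction-1.5b` / `-7b`) can instead be loaded directly, as shown in the Transformers example below.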
🔍 Detailed Evaluation Results
- Evaluation Metric: F1
- CSC (Chinese Spelling Correction): a spelling-correction model that handles errors such as phonetically similar characters, visually similar characters, and grammar errors where the input and output lengths stay aligned.
- CTC (Chinese Text Correction): a text-correction model that corrects the aligned-length spelling and grammar errors above, as well as length-changing errors such as extra or missing characters (see the illustrative sketch after this list).
- GPU: Tesla V100, 32 GB VRAM
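To make the CSC/CTC distinction concrete, here is a small hand-written illustration; the first pair reuses a sentence from the usage example below, the second pair is an invented example of a length-changing error, and neither is a model output:

```python
# Illustrative only: hand-written sentence pairs, not model outputs.

# CSC: aligned lengths -- wrong characters are replaced, sentence length is unchanged.
csc_pair = ("少先队员因该为老人让坐", "少先队员应该为老人让座")
assert len(csc_pair[0]) == len(csc_pair[1])

# CTC: lengths may differ -- extra or missing characters are also handled.
ctc_pair = ("他的成绩提高了了", "他的成绩提高了")  # a duplicated 了 is removed
assert len(ctc_pair[0]) != len(ctc_pair[1])
```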
💻 Usage Examples
Basic Usage with pycorrector
This model is open-sourced in the pycorrector project (https://github.com/shibing624/pycorrector), which supports text correction with fine-tuned large language models. You can call it with the following commands.

Install the package:

```bash
pip install -U pycorrector
```
```python
from pycorrector.gpt.gpt_corrector import GptCorrector

if __name__ == '__main__':
    # Sentences that intentionally contain spelling and grammar errors.
    error_sentences = [
        '真麻烦你了。希望你们好好的跳无',
        '少先队员因该为老人让坐',
        '机七学习是人工智能领遇最能体现智能的一个分知',
        '一只小鱼船浮在平净的河面上',
        '我的家乡是有明的渔米之乡',
    ]
    # Load the fine-tuned correction model from the Hugging Face Hub.
    m = GptCorrector("shibing624/chinese-text-correction-7b")
    # Correct the whole batch; each element holds the result for one input sentence.
    batch_res = m.correct_batch(error_sentences)
    for i in batch_res:
        print(i)
        print()
```
Advanced Usage with HuggingFace Transformers
Without pycorrector, you can use the model like this:
First, format the input as a chat prompt, run it through the model, and decode the generated sentence.

Install the package:

```bash
pip install transformers
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "shibing624/chinese-text-correction-7b"
device = "cuda"  # use "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Prompt format: the task prefix "文本纠错:\n" followed by the sentence to correct.
input_content = "文本纠错:\n少先队员因该为老人让坐。"

messages = [{"role": "user", "content": input_content}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0, do_sample=False, repetition_penalty=1.08)
print(tokenizer.decode(outputs[0]))
```
Output:

```
少先队员应该为老人让座。
```

(English: "Young Pioneers should give their seats to the elderly.")
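The `decode` call above returns the full sequence, including the chat prompt. A minimal sketch for extracting just the corrected sentence, continuing from the snippet above (the slicing pattern is a common Transformers idiom, not something this card prescribes):

```python
# Decode only the newly generated tokens, skipping the prompt and special tokens.
# `inputs` and `outputs` come from the snippet above.
generated_ids = outputs[0][inputs.shape[-1]:]  # drop the prompt tokens
corrected = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(corrected)
```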
Model File Structure
```
shibing624/chinese-text-correction-7b
|-- added_tokens.json
|-- config.json
|-- generation_config.json
|-- merges.txt
|-- model.safetensors
|-- model.safetensors.index.json
|-- README.md
|-- special_tokens_map.json
|-- tokenizer_config.json
|-- tokenizer.json
`-- vocab.json
```
Training Parameters
- num_epochs: 8
- batch_size: 2
- steps: 36000
- eval_loss: 0.12
- base model: Qwen/Qwen2.5-7B-Instruct
- train data: shibing624/chinese_text_correction
- train time: 10 days
Training Datasets
- Chinese text correction dataset: shibing624/chinese_text_correction

If you need to train your own Qwen-based text correction model, please refer to https://github.com/shibing624/pycorrector or https://github.com/shibing624/MedicalGPT.
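A minimal sketch of inspecting the training corpus with the `datasets` library; the `train` split name is an assumption, so check the dataset card for the actual splits and columns:

```python
# Minimal sketch (assumption: the dataset exposes a "train" split).
from datasets import load_dataset

ds = load_dataset("shibing624/chinese_text_correction", split="train")
print(ds)     # size and column names
print(ds[0])  # one raw example, presumably an error sentence paired with its correction
```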
📄 License
This project is licensed under the Apache-2.0 License.
📚 Citation
```bibtex
@software{pycorrector,
  author = {Xu Ming},
  title = {pycorrector: Implementation of language model finetune},
  year = {2024},
  url = {https://github.com/shibing624/pycorrector},
}
```