🚀 YaYi Large Model
YaYi Large Model is fine-tuned on high-quality domain data, aiming to contribute to the Chinese pre-trained large model community and build an ecosystem with partners.
🚀 Quick Start
Here is a simple example of invoking yayi-7b for downstream task inference. It can run on a single GPU such as an A100/A800/3090 and occupies approximately 20 GB of GPU memory when performing inference with FP16 precision. If you need to obtain training data or fine-tune the model based on yayi-7b, please refer to our 💻 Github Repo.
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

# Load the tokenizer and model; device_map="auto" places the weights on the available GPU(s).
yayi_7b_path = "wenge-research/yayi-7b"
tokenizer = AutoTokenizer.from_pretrained(yayi_7b_path)
model = AutoModelForCausalLM.from_pretrained(yayi_7b_path, device_map="auto", torch_dtype=torch.bfloat16)

# Wrap the user input ("你好" means "Hello") in the YaYi prompt template.
prompt = "你好"
formatted_prompt = f"<|System|>:\nA chat between a human and an AI assistant named YaYi.\nYaYi is a helpful and harmless language model developed by Beijing Wenge Technology Co.,Ltd.\n\n<|Human|>:\n{prompt}\n\n<|YaYi|>:"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# <|End|> is the end-of-sequence marker added during training (see the note below).
eos_token_id = tokenizer("<|End|>").input_ids[0]
generation_config = GenerationConfig(
    eos_token_id=eos_token_id,
    pad_token_id=eos_token_id,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.3,
    repetition_penalty=1.1,
    no_repeat_ngram_size=0
)
response = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(response[0]))
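If you call the model repeatedly, it can be convenient to wrap the prompt template and generation call in a small helper. The sketch below reuses the tokenizer, model, and generation_config defined above; the function name chat and its defaults are illustrative assumptions, not part of an official API.

def chat(user_input, max_new_tokens=256):
    # Hypothetical convenience wrapper around the quick-start code above.
    # It rebuilds the YaYi prompt template, generates a reply, and strips
    # the prompt plus the trailing <|End|> marker from the decoded output.
    formatted = (
        "<|System|>:\nA chat between a human and an AI assistant named YaYi.\n"
        "YaYi is a helpful and harmless language model developed by "
        "Beijing Wenge Technology Co.,Ltd.\n\n"
        f"<|Human|>:\n{user_input}\n\n<|YaYi|>:"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        generation_config=generation_config,
        max_new_tokens=max_new_tokens,  # overrides the value in generation_config
    )
    # Decode only the newly generated tokens, then drop the end marker.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens).replace("<|End|>", "").strip()

print(chat("你好"))  # "你好" means "Hello"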
⚠️ Important Note
A special token <|End|> was added as an end-of-sequence marker during model training. Therefore, in the GenerationConfig provided above, you should set eos_token_id to the token id corresponding to this end-of-sequence marker.
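A quick way to confirm you are passing the right id is to tokenize the marker and compare it with what the quick-start code sets. This check is only a suggestion and assumes the tokenizer and generation_config created above.

# Optional sanity check: the id used as eos_token_id should match the
# tokenizer's encoding of the <|End|> marker added during training.
end_ids = tokenizer("<|End|>").input_ids
print(end_ids)  # inspect how the marker is tokenized
assert generation_config.eos_token_id == end_ids[0]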
✨ Features
YaYi Large Model is obtained through instruction fine-tuning on millions of manually constructed, high-quality domain data points. The training data covers five major domains, including media publicity, public opinion analysis, public safety, financial risk control, and urban governance, as well as hundreds of natural language instruction tasks.
During the iterative process from pre-training initialization weights to the domain model, we gradually enhanced its basic Chinese language capabilities and domain analysis capabilities, and added some plug-in capabilities. Meanwhile, through continuous manual feedback and optimization during internal testing by hundreds of users, we further improved the model's performance and security.
By open-sourcing the YaYi model, we contribute to the development of the Chinese pre-trained large language model open-source community and build the YaYi model ecosystem with every partner.
📄 License
Limitations
The SFT model trained on the current data and base model still has the following performance issues:
- It may generate factually incorrect responses for factual instructions.
- It has difficulty effectively identifying harmful instructions and may generate harmful content.
- Its capabilities in scenarios such as logical reasoning, code generation, and multi-turn conversations still need improvement.
Disclaimer
Due to the above-mentioned model limitations, we require developers to use the open-sourced code, data, models, and derivatives of this project only for research purposes, not for commercial use or any other use that may harm society. Please carefully evaluate and use the content generated by the YaYi model, and do not spread harmful content on the Internet. Any adverse consequences shall be borne by the disseminator.
This project is intended for research purposes only, and the project developers are not responsible for any harm or losses caused by using this project (including but not limited to data, models, and code). For details, please refer to the Disclaimer.
Open-Source License
The code in this project is open-sourced under the Apache-2.0 license, the data is released under the CC BY-NC 4.0 license, and the use of YaYi series model weights must follow the Model License.
🔧 Technical Details
Acknowledgements
- This project uses the model weights of BigScience's [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) and Meta's [Llama 2](https://huggingface.co/meta-llama) series as initialization weights and expands the vocabulary.
- The training code of this project refers to Databricks' dolly project and Hugging Face's transformers library.
- The distributed training of this project uses Microsoft's DeepSpeed distributed training tool and the [ZeRO stage 2](https://huggingface.co/docs/transformers/main_classes/deepspeed#zero2-config) configuration described in the Hugging Face transformers documentation.
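For reference, a ZeRO stage 2 configuration in the style of the Hugging Face documentation looks roughly like the dictionary below. The exact values used to train YaYi are not published here, so treat every number as an illustrative assumption rather than the project's actual training configuration.

# Minimal ZeRO stage 2 sketch (assumed values, not YaYi's actual settings).
# "auto" lets the Hugging Face Trainer fill in values that must stay
# consistent with its own command-line arguments.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "allgather_partitions": True,
        "allgather_bucket_size": 2e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}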
| Property | Details |
| --- | --- |
| Model Type | YaYi Large Model |
| Training Data | Covers five major domains (media publicity, public opinion analysis, public safety, financial risk control, and urban governance) with hundreds of natural language instruction tasks |