🚀 YaYi Large Model
YaYi Large Model is fine-tuned on high-quality domain data, aiming to contribute to the Chinese pre-trained large model community and build an ecosystem with partners.
🚀 Quick Start
Here is a simple example of invoking yayi-7b for downstream task inference. It can run on a single GPU such as an A100/A800/3090 and occupies approximately 20 GB of GPU memory when performing inference with FP16 precision. If you need to obtain training data or fine-tune the model based on yayi-7b, please refer to our 💻 Github Repo.
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

# Load the tokenizer and model; device_map="auto" places the weights on the available GPU(s).
yayi_7b_path = "wenge-research/yayi-7b"
tokenizer = AutoTokenizer.from_pretrained(yayi_7b_path)
model = AutoModelForCausalLM.from_pretrained(yayi_7b_path, device_map="auto", torch_dtype=torch.bfloat16)

# Wrap the user input ("你好" means "Hello") in the YaYi prompt template.
prompt = "你好"
formatted_prompt = f"<|System|>:\nA chat between a human and an AI assistant named YaYi.\nYaYi is a helpful and harmless language model developed by Beijing Wenge Technology Co.,Ltd.\n\n<|Human|>:\n{prompt}\n\n<|YaYi|>:"
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

# <|End|> is the end-of-sequence marker added during training (see the note below).
eos_token_id = tokenizer("<|End|>").input_ids[0]
generation_config = GenerationConfig(
    eos_token_id=eos_token_id,
    pad_token_id=eos_token_id,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.3,
    repetition_penalty=1.1,
    no_repeat_ngram_size=0
)
response = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(response[0]))
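If you call the model repeatedly, it can be convenient to wrap the prompt template and generation call in a small helper. The sketch below reuses the tokenizer, model, and generation_config defined above; the function name chat and its defaults are illustrative assumptions, not part of an official API.

def chat(user_input, max_new_tokens=256):
    # Hypothetical convenience wrapper around the quick-start code above.
    # It rebuilds the YaYi prompt template, generates a reply, and strips
    # the prompt plus the trailing <|End|> marker from the decoded output.
    formatted = (
        "<|System|>:\nA chat between a human and an AI assistant named YaYi.\n"
        "YaYi is a helpful and harmless language model developed by "
        "Beijing Wenge Technology Co.,Ltd.\n\n"
        f"<|Human|>:\n{user_input}\n\n<|YaYi|>:"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        generation_config=generation_config,
        max_new_tokens=max_new_tokens,  # overrides the value in generation_config
    )
    # Decode only the newly generated tokens, then drop the end marker.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens).replace("<|End|>", "").strip()

print(chat("你好"))  # "你好" means "Hello"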
⚠️ Important Note
A special token <|End|> was added as an end-of-sequence marker during model training. Therefore, in the GenerationConfig provided above, you should set eos_token_id to the token id corresponding to this end-of-sequence marker.
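A quick way to confirm you are passing the right id is to tokenize the marker and compare it with what the quick-start code sets. This check is only a suggestion and assumes the tokenizer and generation_config created above.

# Optional sanity check: the id used as eos_token_id should match the
# tokenizer's encoding of the <|End|> marker added during training.
end_ids = tokenizer("<|End|>").input_ids
print(end_ids)  # inspect how the marker is tokenized
assert generation_config.eos_token_id == end_ids[0]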
✨ Features
YaYi Large Model is obtained through instruction fine-tuning on millions of manually constructed, high-quality domain data points. The training data covers five major domains, including media publicity, public opinion analysis, public safety, financial risk control, and urban governance, as well as hundreds of natural language instruction tasks.
During the iterative process from pre-training initialization weights to the domain model, we gradually enhanced its basic Chinese language capabilities and domain analysis capabilities, and added some plug-in capabilities. Meanwhile, through continuous manual feedback and optimization during internal testing by hundreds of users, we further improved the model's performance and security.
By open-sourcing the YaYi model, we contribute to the development of the Chinese pre-trained large language model open-source community and build the YaYi model ecosystem with every partner.
📄 License
Limitations
The SFT model trained on the current data and base model still has the following performance issues:
- It may generate factually incorrect responses for factual instructions.
- It has difficulty effectively identifying harmful instructions and may generate harmful content.
- Its capabilities in scenarios such as logical reasoning, code generation, and multi-turn conversations still need improvement.
Disclaimer
Due to the above-mentioned model limitations, we require developers to use the open-sourced code, data, models, and derivatives of this project only for research purposes, not for commercial use or any other use that may harm society. Please carefully evaluate and use the content generated by the YaYi model, and do not spread harmful content on the Internet. Any adverse consequences shall be borne by the disseminator.
This project is intended for research purposes only, and the project developers are not responsible for any harm or losses caused by using this project (including but not limited to data, models, and code). For details, please refer to the Disclaimer.
Open-Source License
The code in this project is open-sourced under the Apache-2.0 license, the data is released under the CC BY-NC 4.0 license, and the use of YaYi series model weights must follow the Model License.
🔧 Technical Details
Acknowledgements
- This project uses the model weights of BigScience's [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) and Meta's [Llama 2](https://huggingface.co/meta-llama) series as initialization weights and expands the vocabulary.
- The training code of this project refers to Databricks' dolly project and Hugging Face's transformers library.
- The distributed training of this project uses Microsoft's DeepSpeed distributed training tool and the [ZeRO stage 2](https://huggingface.co/docs/transformers/main_classes/deepspeed#zero2-config) configuration described in the Hugging Face transformers documentation.
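For reference, a ZeRO stage 2 configuration in the style of the Hugging Face documentation looks roughly like the dictionary below. The exact values used to train YaYi are not published here, so treat every number as an illustrative assumption rather than the project's actual training configuration.

# Minimal ZeRO stage 2 sketch (assumed values, not YaYi's actual settings).
# "auto" lets the Hugging Face Trainer fill in values that must stay
# consistent with its own command-line arguments.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "allgather_partitions": True,
        "allgather_bucket_size": 2e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}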
| Property | Details |
| --- | --- |
| Model Type | YaYi Large Model |
| Training Data | Covers five major domains (media publicity, public opinion analysis, public safety, financial risk control, and urban governance) with hundreds of natural language instruction tasks |