🚀 DISC-LawLLM
This repository contains DISC-LawLLM, a large language model specialized in the Chinese legal domain and built on Baichuan-13B-Base as its base model. It aims to provide comprehensive intelligent legal services.
⚠️ Important Note
Because the project is under active development, the model weights in this repository may differ from those in our currently deployed demo.
✨ Features
DISC-LawLLM, developed and open-sourced by the Data Intelligence and Social Computing Lab of Fudan University (Fudan-DISC), offers the following capabilities:
- General-purpose processing of legal texts
- Legal thinking and reasoning
- Legal knowledge retrieval
In addition, the project's contributions include:
- High-quality SFT datasets and effective training paradigms
- An evaluation framework for Chinese legal LLMs
See the project home page for more information.
📦 DISC-Law-SFT Dataset
We construct a high-quality supervised fine-tuning dataset, DISC-Law-SFT, with two subsets, namely DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. Our dataset covers a range of legal tasks, including legal information extraction, judgment prediction, document summarization, and legal question answering, ensuring coverage of diverse scenarios.
| Dataset | Task/Source | Size | Scenario |
| --- | --- | --- | --- |
| DISC-Law-SFT-Pair | Legal information extraction | 32K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal event detection | 27K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal case classification | 20K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal judgment prediction | 11K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal case matching | 8K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal text summarization | 9K | Legal professional assistant |
| DISC-Law-SFT-Pair | Judicial public opinion summarization | 6K | Legal professional assistant |
| DISC-Law-SFT-Pair | Legal question answering | 93K | Legal consultation services |
| DISC-Law-SFT-Pair | Legal reading comprehension | 38K | Judicial examination assistant |
| DISC-Law-SFT-Pair | Judicial examination | 12K | Judicial examination assistant |
| DISC-Law-SFT-Triplet | Legal judgment prediction | 16K | Legal professional assistant |
| DISC-Law-SFT-Triplet | Legal question answering | 23K | Legal consultation services |
| General | Alpaca-GPT4 | 48K | General scenarios |
| General | Firefly | 60K | General scenarios |
| Total | - | 403K | - |
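As a quick sanity check on the composition of the dataset, the sizes in the table can be tallied per scenario. The snippet below is only an illustrative sketch: the `ROWS` list is transcribed by hand from the table (sizes in thousands of samples), and the short labels `Pair`, `Triplet`, and `General` are shorthand introduced here, not names used by the dataset files.

```python
from collections import defaultdict

# (subset, task or source, size in thousands, scenario), transcribed from the table.
# The subset labels are shorthand for this sketch only.
ROWS = [
    ("Pair", "Legal information extraction", 32, "Legal professional assistant"),
    ("Pair", "Legal event detection", 27, "Legal professional assistant"),
    ("Pair", "Legal case classification", 20, "Legal professional assistant"),
    ("Pair", "Legal judgment prediction", 11, "Legal professional assistant"),
    ("Pair", "Legal case matching", 8, "Legal professional assistant"),
    ("Pair", "Legal text summarization", 9, "Legal professional assistant"),
    ("Pair", "Judicial public opinion summarization", 6, "Legal professional assistant"),
    ("Pair", "Legal question answering", 93, "Legal consultation services"),
    ("Pair", "Legal reading comprehension", 38, "Judicial examination assistant"),
    ("Pair", "Judicial examination", 12, "Judicial examination assistant"),
    ("Triplet", "Legal judgment prediction", 16, "Legal professional assistant"),
    ("Triplet", "Legal question answering", 23, "Legal consultation services"),
    ("General", "Alpaca-GPT4", 48, "General scenarios"),
    ("General", "Firefly", 60, "General scenarios"),
]

# Sum the sizes for each scenario.
totals = defaultdict(int)
for _subset, _task, size_k, scenario in ROWS:
    totals[scenario] += size_k

grand_total = sum(totals.values())

# Print scenarios from largest to smallest, then the overall total.
for scenario, size_k in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{scenario}: {size_k}K")
print(f"Total: {grand_total}K")
```

The per-scenario sums add up to 403K, which matches the Total row of the table.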
💻 Usage Examples
Basic Usage
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers.generation.utils import GenerationConfig
>>> # Load the tokenizer and the model weights (half precision, placed automatically across available devices)
>>> tokenizer = AutoTokenizer.from_pretrained("ShengbinYue/DISC-LawLLM", use_fast=False, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained("ShengbinYue/DISC-LawLLM", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
>>> # Use the generation settings shipped with the model
>>> model.generation_config = GenerationConfig.from_pretrained("ShengbinYue/DISC-LawLLM")
>>> # Question: "How is the crime of producing and selling counterfeit and shoddy goods sentenced?"
>>> messages = []
>>> messages.append({"role": "user", "content": "生产销售假冒伪劣商品罪如何判刑?"})
>>> response = model.chat(tokenizer, messages)
>>> print(response)
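The basic example sends a single user message. For a multi-turn conversation, one possible approach is to keep appending turns to the same `messages` list before each `model.chat(tokenizer, messages)` call. The sketch below illustrates that bookkeeping under two assumptions that are not stated in this README: that replies should be stored under an `"assistant"` role, and that the model accepts the full history on every call. The `ask` helper and the `EchoModel` stand-in are hypothetical names introduced here so the example runs without downloading any weights.

```python
from typing import Dict, List, Protocol


class ChatModel(Protocol):
    """Any object exposing the chat(tokenizer, messages) call used in Basic Usage."""

    def chat(self, tokenizer, messages: List[Dict[str, str]]) -> str: ...


def ask(model: ChatModel, tokenizer, history: List[Dict[str, str]], question: str) -> str:
    """Send one user turn and record both sides of the exchange in history."""
    history.append({"role": "user", "content": question})
    answer = model.chat(tokenizer, history)
    # Assumption: replies are stored under the "assistant" role.
    history.append({"role": "assistant", "content": answer})
    return answer


class EchoModel:
    """Hypothetical stand-in for the real model, used only to make the sketch runnable."""

    def chat(self, tokenizer, messages):
        return f"(reply to turn {len(messages)})"


history: List[Dict[str, str]] = []
# "How is the crime of producing and selling counterfeit and shoddy goods sentenced?"
first = ask(EchoModel(), None, history, "生产销售假冒伪劣商品罪如何判刑?")
# "How does the sentence change if the amount involved is large?"
second = ask(EchoModel(), None, history, "如果涉案金额较大，量刑会有什么变化?")
print(len(history))
```

To try this with the real model, pass the `model` and `tokenizer` objects loaded in Basic Usage instead of `EchoModel()` and `None`.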
📚 Documentation
Disclaimer
DISC-LawLLM comes with issues and limitations that current LLMs have yet to overcome. While it can provide Chinese legal services across a wide variety of tasks and scenarios, the model should be used for reference purposes only and cannot replace professional lawyers and legal experts. We encourage users of DISC-LawLLM to evaluate the model critically. We do not take responsibility for any issues, risks, or adverse consequences that may arise from the use of DISC-LawLLM.
Citation
If our work is helpful to you, please cite it as follows:
@misc{yue2023disclawllm,
title={DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services},
author={Shengbin Yue and Wei Chen and Siyuan Wang and Bingxuan Li and Chenchen Shen and Shujun Liu and Yuxuan Zhou and Yao Xiao and Song Yun and Wei Lin and Xuanjing Huang and Zhongyu Wei},
year={2023},
eprint={2309.11325},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{yue2024lawllm,
title={LawLLM: Intelligent Legal System with Legal Reasoning and Verifiable Retrieval},
author={Yue, Shengbin and Liu, Shujun and Zhou, Yuxuan and Shen, Chenchen and Wang, Siyuan and Xiao, Yao and Li, Bingxuan and Song, Yun and Shen, Xiaoyu and Chen, Wei and others},
booktitle={International Conference on Database Systems for Advanced Applications},
pages={304--321},
year={2024},
organization={Springer}
}
📄 License
The use of the source code in this repository complies with the Apache 2.0 License.