🚀 WisdomOcean-WisdomInterrogatory
WisdomOcean-WisdomInterrogatory is a legal large language model jointly developed by Zhejiang University, Alibaba DAMO Academy, and Huayuan Computing. It aims to promote the application of legal intelligence in judicial practice, digital case construction, and virtual legal consultation services.
🚀 Quick Start
Guided by the goal of "popularizing legal knowledge and improving judicial efficiency", WisdomOcean-WisdomInterrogatory supports the integration of legal intelligence into judicial practice, digital case construction, and virtual legal consultation services, forming a digital and intelligent judicial foundation.
✨ Features
- Developed jointly by Zhejiang University, Alibaba DAMO Academy, and Huayuan Computing.
- Aims to promote legal intelligence across judicial practice, digital case construction, and virtual legal consultation.
- Based on the open-source Baichuan-7B model, with continued pre-training and fine-tuning for legal scenarios.
📦 Installation
Inference Environment Installation
```
transformers>=4.27.1
accelerate>=0.20.1
torch>=2.0.1
modelscope>=1.8.3
sentencepiece==0.1.99
```
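Assuming a standard pip-based setup (the original does not specify an installation method), the dependencies above can be installed in one step:

```shell
pip install "transformers>=4.27.1" "accelerate>=0.20.1" "torch>=2.0.1" "modelscope>=1.8.3" "sentencepiece==0.1.99"
```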
💻 Usage Examples
Basic Usage
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

# Download the model weights from ModelScope.
model_id = "wisdomOcean/wisdomInterrogatory"
revision = "v1.0.0"
model_dir = snapshot_download(model_id, revision)

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

def generate_response(prompt: str) -> str:
    # Wrap the user prompt in the model's Human/Assistant template.
    inputs = tokenizer(f"</s>Human:{prompt} </s>Assistant: ", return_tensors="pt")
    inputs = inputs.to("cuda")
    pred = model.generate(**inputs, max_new_tokens=800, repetition_penalty=1.2)
    response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=True)
    # Keep only the text after the "Assistant: " marker.
    return response.split("Assistant: ")[1]

# "What are the consequences of driving after drinking two jin (1 kg) of baijiu?"
prompt = "如果喝了两斤白酒后开车,会有什么后果?"
resp = generate_response(prompt)
print(resp)
```
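The Human/Assistant template used above can be illustrated without loading the model. `build_prompt` and `extract_answer` below are hypothetical helper names, not part of the released code; they only demonstrate how the template is assembled and how the answer is split out of the decoded text:

```python
def build_prompt(user_text: str) -> str:
    # Same template string as used in generate_response above.
    return f"</s>Human:{user_text} </s>Assistant: "

def extract_answer(decoded: str) -> str:
    # Keep only the text after the final "Assistant: " marker.
    return decoded.split("Assistant: ")[-1]

# Simulate a decoded model output: the prompt followed by a reply.
raw = build_prompt("What is a contract?") + "A contract is a legally binding agreement."
print(extract_answer(raw))  # -> "A contract is a legally binding agreement."
```

Splitting on the marker is simple but fragile: if the model repeats "Assistant: " inside its reply, only the text after the last occurrence is kept.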
📚 Documentation
Model Training
Our model is based on [Baichuan-7B](https://github.com/baichuan-inc/Baichuan-7B). On this basis, we conducted secondary pre-training and instruction fine-tuning.
Secondary Pre-training
The purpose of secondary pre-training is to inject legal knowledge into the general-purpose base model. The pre-training data includes legal documents, judicial cases, and legal Q&A data, totaling 40 GB.
Instruction Fine-Tuning
After secondary pre-training, we ran an instruction fine-tuning stage on 100k instruction examples. Its purpose is to give the model the ability to answer questions and converse directly with users.
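The README does not publish the instruction data schema. Purely as an illustration, a single instruction-tuning record is often stored as one JSON object per line; all field names and text below are assumptions, not the project's actual data:

```python
import json

# Hypothetical instruction-tuning record (field names are assumed,
# not taken from the WisdomInterrogatory release).
record = {
    "instruction": "What are the legal consequences of drunk driving?",
    "input": "",
    "output": "Drunk driving may lead to administrative penalties or criminal liability, depending on severity.",
}

# One JSON object per line is a common on-disk format for such datasets.
line = json.dumps(record, ensure_ascii=False)
print(line)
```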
📄 License
This project is released under a custom ("other") license.
⚠️ Important Note
This model is provided for academic research purposes only. The accuracy, completeness, and applicability of its outputs are not guaranteed. When using content generated by the model, you should judge its applicability yourself and bear the associated risks.