Custom Legalbert

Developed by casehold
BERT model optimized for the legal domain, pretrained from scratch on 37GB of legal ruling texts
Downloads 12.59k
Release Time: 3/2/2022

Model Overview

A BERT variant specifically designed for legal texts, supporting masked language modeling and next sentence prediction tasks for legal documents
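
A minimal sketch of masked-token prediction with this model, assuming it is published on the Hugging Face Hub as `casehold/custom-legalbert` and that the `transformers` library (with a PyTorch backend) is installed. The helper names `build_masked_prompt` and `top_predictions` are hypothetical and used only for illustration.

```python
# Assumption: Hub identifier for the model described on this page.
MODEL_ID = "casehold/custom-legalbert"


def build_masked_prompt(text: str, target: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `target` in `text` with the BERT mask token."""
    if target not in text:
        raise ValueError(f"{target!r} does not occur in the input text")
    return text.replace(target, mask_token, 1)


def top_predictions(prompt: str, k: int = 5):
    """Return the k most likely fillers for the single masked position.

    The import is deferred so the pure helper above works without transformers.
    The first call downloads the model weights.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # For a prompt with one mask, the pipeline returns a list of dicts
    # that include the predicted token string and its score.
    return [(r["token_str"], r["score"]) for r in fill(prompt, top_k=k)]
```

Calling `top_predictions(build_masked_prompt("The court granted the motion to dismiss.", "dismiss"))` would list candidate tokens for the masked word; the actual predictions depend on the downloaded checkpoint and are not shown here.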

Model Features

Domain-Specific Legal Vocabulary
Optimized legal term processing through a custom vocabulary of 32,000 tokens
Large-Scale Legal Corpus Training
Pretrained on the complete Harvard Law case corpus: 37 GB of text covering 3,446,187 legal rulings
Domain-Adapted Preprocessing
Tokenization and sentence segmentation specifically optimized for legal text characteristics
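
One way to see the effect of a domain-specific vocabulary is to count how many subword pieces a legal phrase is split into. The sketch below assumes the tokenizer is available on the Hugging Face Hub under `casehold/custom-legalbert` and that it marks continuation pieces with the usual WordPiece `##` prefix; `count_words_and_pieces` and `tokenize_with` are hypothetical helper names.

```python
# Assumption: Hub identifier for the model described on this page.
MODEL_ID = "casehold/custom-legalbert"


def count_words_and_pieces(tokens):
    """Return (words, pieces) for a WordPiece token list.

    A token starting with '##' continues the previous word, so it adds
    a piece but not a word.
    """
    pieces = len(tokens)
    words = sum(1 for t in tokens if not t.startswith("##"))
    return words, pieces


def tokenize_with(model_id: str, text: str):
    """Tokenize `text` with the tokenizer stored under `model_id`.

    The import is deferred so the counting helper works without transformers.
    """
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.tokenize(text)
```

Comparing `count_words_and_pieces(tokenize_with(MODEL_ID, phrase))` against the same call with a general-purpose checkpoint such as `bert-base-uncased` gives a rough indication of how much less the legal vocabulary fragments domain terms; fewer pieces per word suggests better vocabulary coverage.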

Model Capabilities

Legal Text Understanding
Legal Document Classification
Legal Multiple-Choice Reasoning
Case Citation Analysis
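
Multiple-choice reasoning of the CaseHOLD kind pairs one citing context with several candidate holdings and scores each pair. The following is a sketch only, assuming the Hub identifier `casehold/custom-legalbert` and the `transformers` and `torch` packages. The pretrained checkpoint is assumed to ship without a multiple-choice head, so the head is randomly initialized on load and the scores are meaningless until the model is fine-tuned. The helper names are hypothetical.

```python
# Assumption: Hub identifier for the model described on this page.
MODEL_ID = "casehold/custom-legalbert"


def build_choice_pairs(context: str, candidates):
    """Repeat the context once per candidate so each (context, candidate)
    pair can be encoded as a two-segment BERT input."""
    candidates = list(candidates)
    return [context] * len(candidates), candidates


def score_candidates(context: str, candidates):
    """Return one raw logit per candidate.

    Note: without fine-tuning, the multiple-choice head is untrained,
    so these logits only demonstrate the input and output shapes.
    """
    import torch
    from transformers import AutoModelForMultipleChoice, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForMultipleChoice.from_pretrained(MODEL_ID)
    model.eval()

    firsts, seconds = build_choice_pairs(context, candidates)
    encoded = tokenizer(firsts, seconds, truncation=True, padding=True,
                        return_tensors="pt")
    # Multiple-choice models expect tensors shaped (batch, num_choices, seq_len).
    batch = {name: tensor.unsqueeze(0) for name, tensor in encoded.items()}
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (1, num_choices)
    return logits.squeeze(0).tolist()
```

After fine-tuning on labeled examples, the candidate with the highest logit would be taken as the predicted answer.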

Use Cases

Legal Research
Case Citation Prediction
Predict relevant case citations likely to be referenced in legal rulings
Achieved state-of-the-art (SOTA) performance on the CaseHOLD dataset
Legal Clause Analysis
Parse key content in legal documents such as terms of service
Judicial Assistance
Ruling Document Generation
Assist in generating specific sections of legal ruling documents