
LegalBERT Large 1.7M-1

Developed by pile-of-law
A BERT-large model pretrained on English legal and administrative texts using RoBERTa pretraining objectives.
Downloads: 120
Release Date: 4/29/2022

Model Overview

This model uses the BERT-large architecture and is pretrained on the Pile of Law dataset, making it well suited to legal natural language processing tasks.
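To make the overview concrete, here is a minimal usage sketch, assuming the Hugging Face transformers library and the published hub id pile-of-law/legalbert-large-1.7M-1 (the sample sentence is illustrative):

```python
from transformers import AutoModel, AutoTokenizer

# Hub id as published by pile-of-law; weights download on first use.
model_id = "pile-of-law/legalbert-large-1.7M-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a sentence and pull contextual embeddings from the encoder.
inputs = tokenizer("The court granted the motion to dismiss.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden size; 1024 for BERT-large)
```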

Model Features

Legal Domain Specialization
Pretrained specifically on legal and administrative texts, giving it a stronger grasp of legal terminology
Large-Scale Training Data
Pretrained on approximately 256 GB of English legal and administrative text
Optimized Tokenizer
Uses a custom vocabulary of 32,000 tokens, 3,000 of which are legal terms (a quick check is sketched below)
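The tokenizer claim above can be verified locally; a minimal sketch, assuming the same hub id and the transformers library (the sample phrase is an illustrative assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pile-of-law/legalbert-large-1.7M-1")

# The page states a 32,000-token vocabulary; check it locally.
print(len(tokenizer))  # expected: 32000

# Legal terms should survive as whole word pieces more often than in a
# general-purpose vocabulary.
print(tokenizer.tokenize("The appellant filed a writ of certiorari."))
```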

Model Capabilities

Legal Text Understanding
Masked Language Modeling
Legal Text Classification
Legal Question Answering
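The classification and question-answering capabilities require task-specific fine-tuning: the released checkpoint carries no classification head, so one is attached at load time and must be trained before its outputs mean anything. A minimal sketch, assuming transformers and a hypothetical binary task:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "pile-of-law/legalbert-large-1.7M-1"

# num_labels=2 sets up a hypothetical binary task; the classification head is
# randomly initialized, so fine-tune on labeled data before using predictions.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("This clause limits the seller's liability.",
                   return_tensors="pt", truncation=True)
print(model(**inputs).logits)  # raw scores from the untrained head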

Use Cases

Legal Document Processing
Legal Term Prediction
Predicting specialized terms in legal texts via masked language modeling
For example, correctly predicting 'appeal' as the most likely fill-in word (a runnable sketch follows this list)
Legal Document Analysis
Analyzing legal document content
Legal Research Assistance
Case Retrieval Enhancement
Improving legal case retrieval systems
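A minimal sketch of the term-prediction use case above, using the fill-mask pipeline; the example sentence is an assumption chosen so that 'appeal' is a natural fill-in:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="pile-of-law/legalbert-large-1.7M-1")

# Illustrative sentence (an assumption): a masked slot a lawyer would
# complete with 'appeal'.
sentence = ("An [MASK] is a request made after a trial, asking another court "
            "to decide whether the trial was conducted properly.")

for pred in fill_mask(sentence, top_k=5):
    print(f"{pred['token_str']:>12}  score={pred['score']:.4f}")
```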