
Legalbert Large 1.7M 2

Developed by pile-of-law
A BERT-large model pretrained with the RoBERTa masked language modeling objective on English legal and administrative texts, designed for language understanding tasks in the legal domain
Downloads: 701
Release Date: 4/29/2022

Model Overview

This is a transformers model based on the BERT-large architecture, pretrained on the Pile of Law dataset (approximately 256 GB of English legal and administrative text) and suited to legal downstream tasks.
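
A minimal loading sketch with the Hugging Face transformers library follows. The repository id pile-of-law/legalbert-large-1.7M-2 is assumed from the model name and developer listed above; verify it before use.

# Minimal loading sketch; the repo id below is assumed, not confirmed by this page.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "pile-of-law/legalbert-large-1.7M-2"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)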

Model Features

Legal Domain Specialization
Pretrained specifically on legal and administrative texts, capturing legal terminology and phrasing
RoBERTa Pretraining Objective
Uses RoBERTa's masked language modeling objective, an improvement over the original BERT training procedure
Large-scale Training Data
Trained on the roughly 256 GB Pile of Law dataset, which draws on 35 legal and administrative data sources
Legal Text Optimization
Uses the LexNLP sentence splitter to handle legal citations, tailoring the preprocessing workflow to legal text

Model Capabilities

Legal Text Understanding
Masked Language Modeling
Legal Document Analysis
Legal Terminology Recognition
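
The masked language modeling capability listed above can be exercised with the transformers fill-mask pipeline. The sketch below assumes the same repo id as before, a BERT-style [MASK] token, and an illustrative sentence that is not taken from the original model card.

# Fill-mask sketch; repo id, mask token, and example sentence are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="pile-of-law/legalbert-large-1.7M-2")

# Hypothetical legal sentence with one masked term.
predictions = fill_mask("The [MASK] shall pay all costs incurred under this agreement.")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))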

Use Cases

Legal Text Processing
Legal Clause Completion
Fills in missing terms in legal documents
Example: correctly predicts legal terms in sentences such as 'An exception is a request...'
Legal Document Classification
Automatically classifies legal documents; a fine-tuning sketch follows this list
Legal Research Assistance
Legal Concept Explanation
Explains legal terms and concepts
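
For the legal document classification use case above, the following fine-tuning sketch adds a sequence classification head to the checkpoint. The dataset file, label count, and training arguments are hypothetical placeholders, not part of the original model card.

# Hypothetical fine-tuning sketch for legal document classification.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_id = "pile-of-law/legalbert-large-1.7M-2"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)  # hypothetical label count

# Hypothetical CSV with "text" and "label" columns of labeled legal documents.
dataset = load_dataset("csv", data_files={"train": "legal_docs_train.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="legalbert-doc-classifier",
                         per_device_train_batch_size=8,
                         num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  tokenizer=tokenizer)
trainer.train()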