🚀 LettuceDetect: Hallucination Detection Model
LettuceDetect is a transformer-based model designed for hallucination detection in Retrieval-Augmented Generation (RAG) applications, leveraging the extended context support of ModernBERT.
Model Name: lettucedect-large-modernbert-en-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect
🚀 Quick Start
LettuceDetect detects hallucinations in context-answer pairs. It is built on ModernBERT, which supports inputs of up to 8192 tokens, making it suitable for processing long and detailed documents.
✨ Features
- Extended Context Support: Built on ModernBERT, the model handles inputs of up to 8192 tokens, crucial for tasks requiring in-depth document processing.
- Accurate Detection: Trained to identify hallucinated tokens in answers, providing span-level predictions.
- High Performance: Outperforms many existing models in both example-level and span-level evaluations.
📦 Installation
Install the `lettucedetect` package:

```bash
pip install lettucedetect
```
💻 Usage Examples
Basic Usage
```python
from lettucedetect.models.inference import HallucinationDetector

detector = HallucinationDetector(
    method="transformer", model_path="KRLabsOrg/lettucedect-base-modernbert-en-v1"
)

contexts = ["France is a country in Europe. The capital of France is Paris. The population of France is 67 million."]
question = "What is the capital of France? What is the population of France?"
answer = "The capital of France is Paris. The population of France is 69 million."

predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)
```
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Type | ModernBERT (Large) with extended context support (up to 8192 tokens) |
| Task | Token Classification / Hallucination Detection |
| Training Data | RAGTruth |
| Language | English |
How It Works
The model is trained to identify tokens in the answer text that are not supported by the given context. During inference, it returns token-level predictions, which are then aggregated into spans, allowing users to see exactly which parts of the answer are considered hallucinated.
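To illustrate the aggregation step, here is a minimal sketch of how token-level labels could be merged into character spans. The function name, the 0/1 label convention, and the offset format are illustrative assumptions, not LettuceDetect's internal API:

```python
def merge_token_labels_into_spans(labels, offsets, text):
    """Merge consecutive hallucinated tokens (label == 1) into character spans.

    labels  -- one 0/1 label per token (1 = hallucinated); assumed convention
    offsets -- (start_char, end_char) of each token within `text`
    """
    spans = []
    current = None
    for label, (start, end) in zip(labels, offsets):
        if label == 1:
            if current is None:
                current = [start, end]  # open a new span
            else:
                current[1] = end        # extend the running span
        elif current is not None:
            spans.append({"start": current[0], "end": current[1],
                          "text": text[current[0]:current[1]]})
            current = None
    if current is not None:             # flush a span that runs to the end
        spans.append({"start": current[0], "end": current[1],
                      "text": text[current[0]:current[1]]})
    return spans
```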
Performance
Example-level results
We evaluate our model on the test set of the RAGTruth dataset. Our large model, lettucedetect-large-v1, achieves an overall F1 score of 79.22%, outperforming prompt-based methods like GPT-4 (63.4%) and encoder-based models like Luna (65.4%). It also surpasses fine-tuned LLAMA-2-13B (78.7%, presented in RAGTruth) and is competitive with the SOTA fine-tuned LLAMA-3-8B (83.9%, presented in the RAG-HAT paper).
Span-level results
At the span level, our model achieves the best scores across all data types, significantly outperforming previous models. Note that we do not compare against models such as RAG-HAT here, since they report no span-level evaluation.
📄 License
This project is licensed under the MIT license.
📖 Citing
If you use the model or the tool, please cite the following paper:
```bibtex
@misc{Kovacs:2025,
    title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
    author={Ádám Kovács and Gábor Recski},
    year={2025},
    eprint={2502.17125},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.17125},
}
```