🚀 Pleias-RAG-1B
Pleias-RAG-1B is a 1.2 billion parameter Small Reasoning Model trained for retrieval-augmented generation (RAG), search, and source summarization. It belongs to the first generation of Pleias specialized reasoning models.
Full model report
Pleias-RAG-1B outperforms most SLMs (4 billion parameters and below) on standardized benchmarks for retrieval-augmented generation (HotPotQA, 2wiki) and is competitive with standard 7-8B models such as Qwen-2.5-7B and Llama-3.1-8B. It is the only SLM to date that maintains consistent RAG performance across leading European languages and ensures systematic reference grounding for statements.
Due to its size, ease of deployment on constrained infrastructure (including mobile phones), and built-in support for factual and accurate information, Pleias-RAG-1B unlocks a range of new use cases for generative AI.
✨ Features
Pleias-RAG-1B is a specialized language model. It uses a series of special tokens to process a structured input (query and sources) and generate a structured output (reasoning sequence and answer with sources). For easier implementation, we encourage users to use the associated API library.
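As a rough illustration of this structured workflow, the sketch below assembles a prompt-like string from a query and sources. The token names are hypothetical placeholders (only `<|source_id|>` appears in the expected output further down); in practice the library applies the real special tokens, so this is only meant to convey the shape of the input.

```python
# Illustrative only: the actual special tokens are defined by the model's tokenizer
# and applied by rag_library; all token names below except <|source_id|> are hypothetical.
def build_structured_prompt(query: str, sources: list[dict]) -> str:
    parts = ["<|query_start|>" + query + "<|query_end|>"]  # hypothetical tokens
    for i, source in enumerate(sources, start=1):
        parts.append(f"<|source_start|><|source_id|>{i} {source['text']}<|source_end|>")
    # The model then continues with its reasoning sequence and cited answer.
    return "".join(parts)
```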
Citation support
Pleias-RAG-1B natively generates grounded answers based on excerpts and citations extracted from the provided sources, using a custom citation syntax inspired by Wikipedia references. It is one of the few open-weights models to date developed with this feature and the first one designed for actual deployment.
In contrast with the Anthropic approach (Citation mode), citations are generated integrally by the model rather than produced by external chunking. As a result, we can provide another desirable feature that simplifies source checking: citation shortening for longer excerpts (using "(…)").
RAG reasoning
Pleias-RAG-1B generates specific reasoning sequences incorporating several proto-agentic abilities for RAG applications. The model can make a series of decisions directly:
- Assessing whether the query is understandable.
- Assessing whether the query is trivial enough not to require a lengthy pre-analysis (adjustable reasoning).
- Assessing whether the sources contain enough information to generate a grounded answer.
The structured reasoning traces include the following steps:
- Language detection of the query. The model will always strive to answer in the language of the original query.
- Query analysis and associated query report. The analysis can lead to a standard answer, a shortened reasoning trace/answer for trivial questions, a reformulated query, or a refusal (which, at the application level, could be turned into a clarification request back to the user; see the routing sketch after this list).
- Source analysis and associated source report. This step evaluates the coverage and depth of the provided sources with regard to the query.
- Draft of the final answer.
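Applications can act on these decisions and reports directly. The sketch below is a minimal, hypothetical routing layer: the `query_report` and `source_report` field names are assumptions made for illustration (only `processed` and `clean_answer` appear in the usage example further down), so they should be adapted to the actual JSON export of the library.

```python
# Hypothetical routing layer around the structured output of rag.generate().
# Field names such as "query_report" and "source_report" are assumptions;
# only "processed" / "clean_answer" come from the usage example below.
def route(response: dict) -> str:
    processed = response.get("processed", {})
    query_report = str(processed.get("query_report", ""))    # hypothetical field
    source_report = str(processed.get("source_report", ""))  # hypothetical field

    if "refusal" in query_report.lower():
        # Turn a refusal into a clarification request back to the user.
        return "Could you rephrase your question? It could not be interpreted."
    if "insufficient" in source_report.lower():
        # Sources do not cover the query: trigger another retrieval round instead.
        return "The retrieved sources do not cover this question."
    return processed.get("clean_answer", "")
```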
Multilinguality
Pleias-RAG-1B can read and write in the main European languages: French, German, Italian, Spanish, Polish, Latin and Portuguese.
To date, it is the only SLM with negligible performance loss on RAG-related tasks in the leading European languages. On a translated set of HotPotQA, we observed a significant performance drop in most SLMs, ranging from 10% up to 30-35% for sub-1B models.
We expect that the results of any standard English evaluation on Pleias RAG models should be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.
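As a quick illustration, the pipeline from the Basic Usage example below can be called with a non-English query; per the language-detection step described above, the grounded answer is expected to come back in the language of the query. The source text here is a shortened placeholder.

```python
from rag_library import RAGWithCitations

rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")

# French query over an English source: the reasoning trace starts with
# language detection, so the grounded answer should be written in French.
query = "Quelle est la capitale de la France ?"
sources = [
    {
        "text": "Paris is the capital and most populous city of France.",
        "metadata": {"source": "Geographic Encyclopedia"},
    }
]

response = rag.generate(query, sources)
print(response["processed"]["clean_answer"])  # answer expected in French
```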
📚 Documentation
Training
Pleias-RAG-1B is trained on a large synthetic dataset emulating retrieval over a wide variety of multilingual open sources from the Common Corpus. It provides native support for citation and grounding with literal quotes. Following the latest trends in agentification, the model reintegrates multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking.
Evaluation
Pleias-RAG-1B has been evaluated on three standard RAG benchmarks: 2wiki, HotpotQA, and MuSiQue.
All the benchmarks assess only the "trivial" mode, on questions that require some form of multi-hop reasoning over sources (the answer is spread across different sources) as well as the discrimination of distractor sources.
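For orientation, a scoring loop in the spirit of these benchmarks could look like the sketch below. It assumes the Hugging Face `hotpot_qa` dataset layout (a `question`, a gold `answer`, and a `context` made of titles and sentence lists that include distractor passages) and uses a simple containment match, which is not necessarily the metric behind the reported results.

```python
from datasets import load_dataset
from rag_library import RAGWithCitations

# Sketch of a HotpotQA-style evaluation loop with containment scoring.
# Dataset field names follow the Hugging Face "hotpot_qa" distractor config;
# the metric is illustrative, not the one used in the model report.
rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")
dataset = load_dataset("hotpot_qa", "distractor", split="validation[:100]")

correct = 0
for item in dataset:
    # Each context entry pairs a page title with its sentences (distractors included).
    sources = [
        {"text": " ".join(sentences), "metadata": {"source": title}}
        for title, sentences in zip(item["context"]["title"], item["context"]["sentences"])
    ]
    response = rag.generate(item["question"], sources)
    answer = response["processed"]["clean_answer"]
    correct += int(item["answer"].lower() in answer.lower())

print(f"Containment accuracy: {correct / len(dataset):.2%}")
```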
📦 Installation
The easiest way to deploy Pleias-RAG-1B is through our official library. It features an API-like workflow with standardized export of the structured reasoning/answer output in JSON format. A Colab notebook is available for quick tests and experimentation.
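A minimal sketch of that export, assuming the object returned by `generate` (as in the Basic Usage example below) is a plain, JSON-serializable dictionary:

```python
import json

from rag_library import RAGWithCitations

# Assumes the response returned by generate() is a plain, JSON-serializable dict.
rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")
response = rag.generate(
    "What is the capital of France?",
    [{"text": "Paris is the capital of France.", "metadata": {"source": "Encyclopedia"}}],
)

# Persist the full structured output (reasoning trace, answer, citations) as JSON.
with open("rag_response.json", "w", encoding="utf-8") as f:
    json.dump(response, f, ensure_ascii=False, indent=2)
```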
💻 Usage Examples
Basic Usage
```python
from rag_library import RAGWithCitations

rag = RAGWithCitations("PleIAs/Pleias-RAG-1B")

query = "What is the capital of France?"
sources = [
    {
        "text": "Paris is the capital and most populous city of France. With an estimated population of 2,140,526 residents as of January 2019, Paris is the center of the Île-de-France metropolitan area and the hub of French economic, political, and cultural life. The city's landmarks, including the Eiffel Tower, Arc de Triomphe, and Cathedral of Notre-Dame, make it one of the world's most visited tourist destinations.",
        "metadata": {"source": "Geographic Encyclopedia", "reliability": "high"}
    },
    {
        "text": "The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair and was initially criticized by some of France's leading artists and intellectuals for its design. Standing at 324 meters (1,063 ft) tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York City in 1930. The tower receives about 7 million visitors annually and has become an iconic symbol of Paris and France.",
        "metadata": {"source": "Travel Guide", "year": 2020}
    }
]

response = rag.generate(query, sources)
print(response["processed"]["clean_answer"])
```
With expected output:
The capital of France is Paris. This is confirmed by multiple sources, with <|source_id|>1 explicitly stating that "Paris is the capital and most populous city of France"[1].
**Citations**
[1] "Paris is the capital and most populous city of France" [Source 1]
With 1.2B parameters, Pleias-RAG-1B can be readily deployed on many constrained infrastructures, including desktop systems running on CPU RAM.
We also release an unquantized GGUF version for deployment on CPU. Our internal performance benchmarks suggest that waiting times are currently acceptable for most uses, even under constrained RAM: about 20 seconds for a complex generation including reasoning traces on 8 GB of RAM and below. Since the model is unquantized, the quality of text generation should be identical to that of the original model.
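For the GGUF build, a lightweight CPU runtime such as llama-cpp-python can be used. The sketch below is an assumption about the deployment path rather than an official recipe: the file name is a placeholder, and feeding a raw prompt this way bypasses the structured query/source formatting applied by the API library, so outputs may differ from the library workflow.

```python
from llama_cpp import Llama

# Load the unquantized GGUF on CPU; the file name is a placeholder and
# n_ctx should leave room for the sources plus the reasoning trace.
llm = Llama(model_path="pleias-rag-1b.gguf", n_ctx=4096, n_threads=4)

# NOTE: a raw prompt skips the special-token formatting handled by rag_library,
# so prefer the library when possible and use this only for quick CPU tests.
prompt = (
    "Query: What is the capital of France?\n"
    "Source 1: Paris is the capital and most populous city of France.\n"
    "Answer:"
)
output = llm(prompt, max_tokens=256)
print(output["choices"][0]["text"])
```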
Once integrated into a RAG system, Pleias-RAG-1B can also be used in a broader range of non-conversational use cases, including user support and educational assistance. Through this release, we aim to make SLMs workable in production by relying systematically on an externalized memory.
📄 License
This model is released under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Base Model | PleIAs/Pleias-1.2B-Preview |
| Language | en, fr, it, de, es |
| License | apache-2.0 |
| Library Name | transformers |
| Pipeline Tag | text-generation |