Pleias-RAG-350M
Pleias-RAG-350M is a 350-million-parameter Small Reasoning Model trained for retrieval-augmented generation (RAG), search, and source summarization. Along with Pleias-RAG-1B, it belongs to the first generation of Pleias specialized reasoning models. It outperforms most SLMs (4 billion parameters and below) on standardized benchmarks for retrieval-augmented generation tasks and is a cost-effective alternative to popular larger models.
Full model report
Quick Start
The easiest way to deploy Pleias-RAG-350M is through our official library. It features an API-like workflow with standardized export of the structured reasoning/answer output into JSON format. A Colab notebook is available for easy testing and experimentation.
Features
Citation support
Pleias-RAG-350M natively generates grounded answers based on excerpts and citations extracted from the provided sources, using a custom citation syntax inspired by Wikipedia. It is one of the few open-weights models to date developed with this feature and the first one designed for actual deployment.
In contrast with the Anthropic approach (Citation mode), citations are generated entirely by the model rather than produced by external chunking. As a result, we can provide another desirable feature that simplifies source checking: citation shortening for longer excerpts (using "(…)").
RAG reasoning
Pleias-RAG-350M generates specific reasoning sequences incorporating several proto-agentic abilities for RAG applications. The model can make a series of decisions directly:
- Assessing whether the query is understandable.
- Assessing whether the query is trivial enough not to require a lengthy pre-analysis (adjustable reasoning).
- Assessing whether the sources contain enough input to generate a grounded answer.
The structured reasoning trace includes the following steps:
- Language detection of the query. The model will always strive to answer in the language of the original query.
- Query analysis and associated query report. The analysis can lead to a standard answer, a shortened reasoning trace/answer for trivial questions, a reformulated query, or a refusal (which, in the context of the application, could be turned into a follow-up request to the user; see the sketch after this list).
- Source analysis and associated source report. This step evaluates the coverage and depth of the provided sources regarding the query.
- Draft of the final answer.
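As an illustration of how these decisions can be acted upon, here is a minimal routing sketch. Apart from `response["processed"]["clean_answer"]`, which appears in the Basic Usage example below, the dictionary keys assumed here are hypothetical and should be checked against the output schema of the official library.

```python
# Hypothetical routing sketch around the structured output of rag.generate().
# Only response["processed"]["clean_answer"] is taken from the Basic Usage
# example below; anything else about the schema is an assumption and must be
# verified against the official Pleias-RAG-Library.

def route_response(response: dict) -> str:
    processed = response.get("processed", {})
    answer = processed.get("clean_answer")

    # If the model declined to answer (unclear query or insufficient source
    # coverage), hand the turn back to the user instead of returning an
    # ungrounded answer.
    if not answer:
        return "Could you rephrase your question or provide additional sources?"

    return answer
```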
Multilinguality
Pleias-RAG-350M can read and write in the main European languages: French, German, Italian, Spanish, and, to a lesser extent, Polish, Latin, and Portuguese.
To date, it is the only small language model with negligible loss of performance in the leading European languages on RAG-related tasks. On a translated set of HotpotQA, we observed a significant performance drop in most SLMs, ranging from 10% up to 30-35% for sub-1B models.
We expect the results of any standard English evaluation on Pleias RAG models to be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.
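For example, the pipeline shown in the Usage Examples section below can be queried directly in French. A minimal sketch; the import path follows the official library and should be verified against its documentation:

```python
# Same pipeline as the Basic Usage example below, but with a French query.
# The model reasons over the English source and is expected to answer in French.
from rag_library import RAGWithCitations  # import path assumed from the official library; verify against its docs

rag = RAGWithCitations("PleIAs/Pleias-RAG-350M")
sources = [{
    "text": "Paris is the capital and most populous city of France.",
    "metadata": {"source": "Geographic Encyclopedia", "reliability": "high"},
}]
response = rag.generate("Quelle est la capitale de la France ?", sources)
print(response["processed"]["clean_answer"])  # expected to be a French, cited answer
```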
Installation
Installation is done through the official Pleias-RAG-Library, available from the GitHub repository linked at the end of this document.
Usage Examples
Basic Usage
```python
# Import from the official Pleias-RAG-Library
# (https://github.com/Pleias/Pleias-RAG-Library); verify the exact import
# path against the library's documentation.
from rag_library import RAGWithCitations

# Load the model through the library's RAG wrapper.
rag = RAGWithCitations("PleIAs/Pleias-RAG-350M")

# A query and a list of sources (text plus optional metadata) used to ground the answer.
query = "What is the capital of France?"
sources = [
    {
        "text": "Paris is the capital and most populous city of France. With an estimated population of 2,140,526 residents as of January 2019, Paris is the center of the Île-de-France metropolitan area and the hub of French economic, political, and cultural life. The city's landmarks, including the Eiffel Tower, Arc de Triomphe, and Cathedral of Notre-Dame, make it one of the world's most visited tourist destinations.",
        "metadata": {"source": "Geographic Encyclopedia", "reliability": "high"}
    },
    {
        "text": "The Eiffel Tower is located in Paris, France. It was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair and was initially criticized by some of France's leading artists and intellectuals for its design. Standing at 324 meters (1,063 ft) tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York City in 1930. The tower receives about 7 million visitors annually and has become an iconic symbol of Paris and France.",
        "metadata": {"source": "Travel Guide", "year": 2020}
    }
]

# Generate a grounded, cited answer and print the cleaned-up version.
response = rag.generate(query, sources)
print(response["processed"]["clean_answer"])
```
With expected output:
The capital of France is Paris. This can be confirmed by the fact that Paris is explicitly stated to be "the capital and most populous city of France" [1].
**Citations**
[1] "Paris is the capital and most populous city of France" [Source 1]
Documentation
Training
Pleias-RAG-350M is trained on a large synthetic dataset emulating retrieval over a wide variety of multilingual open sources from Common Corpus. These models provide native support for citation and grounding with literal quotes. Following the latest trends in agentification, they reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking.
Evaluation
Pleias-RAG-350M has been evaluated on three standard RAG benchmarks: 2WikiMultiHopQA (2wiki), HotpotQA, and MuSiQue.
All benchmarks assess only the "trivial" mode, on questions requiring some form of multi-hop reasoning over sources (the answer is spread across different sources) as well as the discrimination of distractor sources.
Pleias-RAG-350M is not simply a cost-effective version of larger models. We found that it correctly answers several hundred HotpotQA questions that neither Llama-3-8B nor Qwen-2.5-7B could solve. Consequently, we encourage its use as part of multi-model RAG systems.
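One way to use it in such a multi-model setup is as the first stage of a cascade, falling back to a larger model only when the small model declines to answer. A minimal, hypothetical sketch: `call_larger_model` is a placeholder for whatever larger-model client the application already uses, and the response keys follow the Basic Usage example above.

```python
# Hypothetical cascade: try Pleias-RAG-350M first, and fall back to a larger
# model only when it does not return a grounded answer. `call_larger_model`
# is a placeholder for the application's existing larger-model client.
def answer_with_cascade(rag, query, sources, call_larger_model):
    response = rag.generate(query, sources)
    answer = response.get("processed", {}).get("clean_answer")
    if answer:  # grounded, cited answer from the small model
        return answer
    return call_larger_model(query, sources)  # fallback to the larger model
```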
License
The license for this project is Apache 2.0.
Model Information
| Property | Details |
|----------|---------|
| Base Model | PleIAs/Pleias-350m-Preview |
| Language | en, fr, it, de, es |
| License | apache-2.0 |
| Pipeline Tag | text-generation |
| Tags | transformers |
| Library Name | transformers |
GitHub repository: https://github.com/Pleias/Pleias-RAG-Library