🚀 Chocolatine-2-14B-Instruct-v2.0.3
This is a DPO fine-tuning of the merged model jpacifico/Chocolatine-2-merged-qwen25arch (Qwen-2.5-14B architecture), using the jpacifico/french-orca-dpo-pairs-revised RLHF dataset. Training in French also enhances the model's overall capabilities.
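For context, a DPO run of this shape can be reproduced with Hugging Face TRL's DPOTrainer. The sketch below is illustrative only: the hyperparameters, column mapping, and trainer arguments are assumptions, not the actual training configuration used for this model.

```python
# Illustrative sketch of a DPO fine-tuning run like the one described above.
# Assumes Hugging Face TRL; hyperparameters are placeholders, not the real recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "jpacifico/Chocolatine-2-merged-qwen25arch"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# French preference pairs; DPOTrainer expects "prompt", "chosen" and "rejected"
# columns, so the dataset fields may need renaming first.
dataset = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

config = DPOConfig(
    output_dir="chocolatine-2-dpo",
    beta=0.1,                        # strength of the KL penalty toward the reference model
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
)

trainer = DPOTrainer(
    model=model,                     # a frozen copy serves as reference when ref_model is None
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # `tokenizer=` in older TRL versions
)
trainer.train()
```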
💡 Usage Tip
Context window: up to 128K tokens
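The underlying Qwen-2.5 architecture has a 32K native window and documents YaRN rope scaling to reach 128K. The snippet below is a hedged sketch of that configuration applied to this checkpoint; the `rope_scaling` values follow the Qwen-2.5 documentation and are an assumption here, so check the model's own config before relying on it.

```python
# Hypothetical: extending the context window with YaRN rope scaling, as
# documented for the underlying Qwen-2.5 architecture.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # 32K native window x 4 = ~128K
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```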
✨ Features
LLM Leaderboard FR
[Updated 2025-04-25]
Chocolatine-2 ranks in the top 3 in every category of the French government's LLM Leaderboard FR.

MT-Bench-French
Chocolatine-2 outperforms its previous versions and its Qwen-2.5 base model on MT-Bench-French, evaluated with multilingual-mt-bench using GPT-4-Turbo as the LLM judge. The goal was to match GPT-4o-mini's performance in French, and according to this benchmark this version comes close to the OpenAI model.
First turn

| Model | Score |
|-------|-------|
| gpt-4o-mini | 9.287500 |
| Chocolatine-2-14B-Instruct-v2.0.3 | 9.112500 |
| Qwen2.5-14B-Instruct | 8.887500 |
| Chocolatine-14B-Instruct-DPO-v1.2 | 8.612500 |
| Phi-3.5-mini-instruct | 8.525000 |
| Chocolatine-3B-Instruct-DPO-v1.2 | 8.375000 |
| DeepSeek-R1-Distill-Qwen-14B | 8.375000 |
| phi-4 | 8.300000 |
| Phi-3-medium-4k-instruct | 8.225000 |
| gpt-3.5-turbo | 8.137500 |
| Chocolatine-3B-Instruct-DPO-Revised | 7.987500 |
| Meta-Llama-3.1-8B-Instruct | 7.050000 |
| vigostral-7b-chat | 6.787500 |
| Mistral-7B-Instruct-v0.3 | 6.750000 |
| gemma-2-2b-it | 6.450000 |

Second turn

| Model | Score |
|-------|-------|
| Chocolatine-2-14B-Instruct-v2.0.3 | 9.050000 |
| gpt-4o-mini | 8.912500 |
| Qwen2.5-14B-Instruct | 8.912500 |
| Chocolatine-14B-Instruct-DPO-v1.2 | 8.337500 |
| DeepSeek-R1-Distill-Qwen-14B | 8.200000 |
| phi-4 | 8.131250 |
| Chocolatine-3B-Instruct-DPO-Revised | 7.937500 |
| Chocolatine-3B-Instruct-DPO-v1.2 | 7.862500 |
| Phi-3-medium-4k-instruct | 7.750000 |
| gpt-3.5-turbo | 7.679167 |
| Phi-3.5-mini-instruct | 7.575000 |
| Meta-Llama-3.1-8B-Instruct | 6.787500 |
| Mistral-7B-Instruct-v0.3 | 6.500000 |
| vigostral-7b-chat | 6.162500 |
| gemma-2-2b-it | 6.100000 |

Average

| Model | Score |
|-------|-------|
| gpt-4o-mini | 9.100000 |
| Chocolatine-2-14B-Instruct-v2.0.3 | 9.081250 |
| Qwen2.5-14B-Instruct | 8.900000 |
| Chocolatine-14B-Instruct-DPO-v1.2 | 8.475000 |
| DeepSeek-R1-Distill-Qwen-14B | 8.287500 |
| phi-4 | 8.215625 |
| Chocolatine-3B-Instruct-DPO-v1.2 | 8.118750 |
| Phi-3.5-mini-instruct | 8.050000 |
| Phi-3-medium-4k-instruct | 7.987500 |
| Chocolatine-3B-Instruct-DPO-Revised | 7.962500 |
| gpt-3.5-turbo | 7.908333 |
| Meta-Llama-3.1-8B-Instruct | 6.918750 |
| Mistral-7B-Instruct-v0.3 | 6.625000 |
| vigostral-7b-chat | 6.475000 |
| gemma-2-2b-it | 6.275000 |
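For reference, the Average table above is simply the per-model mean of the two turn scores; a quick check for the top two entries (values copied from the tables, computation illustrative):

```python
# The MT-Bench average is the mean of the first-turn and second-turn scores.
first_turn = {"gpt-4o-mini": 9.2875, "Chocolatine-2-14B-Instruct-v2.0.3": 9.1125}
second_turn = {"gpt-4o-mini": 8.9125, "Chocolatine-2-14B-Instruct-v2.0.3": 9.0500}

for name in first_turn:
    print(f"{name}: {(first_turn[name] + second_turn[name]) / 2:.6f}")
# gpt-4o-mini: 9.100000
# Chocolatine-2-14B-Instruct-v2.0.3: 9.081250
```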
OpenLLM Leaderboard (Archived)
Chocolatine-2 is the best-performing 14B fine-tuned model on the OpenLLM Leaderboard, tied for the top spot with an average score of 41.08.
[Updated 2025-02-12]
| Benchmark | Score |
|-----------|-------|
| Avg. | 41.08 |
| IFEval | 70.37 |
| BBH | 50.63 |
| MATH Lvl 5 | 40.56 |
| GPQA | 17.23 |
| MuSR | 19.07 |
| MMLU-PRO | 48.60 |
💻 Usage Examples
Basic Usage
You can run this model using my Colab notebook.
You can also run Chocolatine-2 using the following code:
```python
import transformers
from transformers import AutoTokenizer

new_model = "jpacifico/Chocolatine-2-14B-Instruct-v2.0.3"

# Build a chat prompt using the model's chat template.
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
)

# Sample a single completion; max_length counts the prompt tokens as well.
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]["generated_text"])
```
📚 Documentation
Limitations
The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanism.
| Property | Details |
|----------|---------|
| Developed by | Jonathan Pacifico, 2025 |
| Model Type | LLM |
| Training Data | French, English |
| License | Apache-2.0 |
Made with ❤️ in France