Ablation-141-a128.dpo.armorm.rp-shisa-v2-llama-3.1-8b Open-source Language Model - Free Deployment for Efficient Text Generation

Developed by shisa-ai

A language model fine-tuned with the DPO method, suitable for text generation tasks.

Large Language Model

Transformers

#DPO #Dialogue generation #Reinforcement learning fine-tuning

Release date: 4/3/2025

Model Overview

This model is a fine-tuned model built on the LLaMA architecture. It was trained with the TRL framework using the DPO method and focuses on text generation.

Model Features

DPO training method

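Trained with Direct Preference Optimization (DPO), which fine-tunes the model directly on pairs of preferred and rejected responses, without training a separate reward model or running a reinforcement learning loop.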
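As a rough illustration of what DPO optimizes, the sketch below computes the standard sigmoid DPO loss from Rafailov et al. (2023) for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected responses under the trainable policy and under a frozen reference model. The default `beta = 0.1` matches TRL's default and is not necessarily the value used to train this model.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy shifted probability toward the chosen response.
    logits = beta * (chosen_logratio - rejected_logratio)
    # Loss is -log(sigmoid(logits)) = log(1 + exp(-logits)), computed without overflow.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy still equals the reference, the margin is zero and the loss is log 2; it falls below log 2 as the policy raises the chosen response's relative likelihood and rises above it otherwise.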

Based on LLaMA architecture

Built on the LLaMA-3.1-8B base model

TRL framework training

Trained using Hugging Face's TRL (Transformer Reinforcement Learning) framework
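TRL's `DPOTrainer` trains on preference records with `prompt`, `chosen`, and `rejected` fields; in TRL's conversational format each field is a list of `{"role", "content"}` messages. The sketch below converts raw rows into that shape. The raw field names `question`, `better`, and `worse` and the helper `to_preference_records` are hypothetical; the actual training data for this model is not described in the card.

```python
from typing import Iterable


def to_preference_records(rows: Iterable[dict]) -> list[dict]:
    """Convert hypothetical raw rows into TRL-style conversational preference records."""
    records = []
    for row in rows:
        # Identical responses carry no preference signal, so skip them.
        if row["better"] == row["worse"]:
            continue
        records.append({
            "prompt": [{"role": "user", "content": row["question"]}],
            "chosen": [{"role": "assistant", "content": row["better"]}],
            "rejected": [{"role": "assistant", "content": row["worse"]}],
        })
    return records
```

The resulting list could then be wrapped with `datasets.Dataset.from_list` and passed as `train_dataset` to TRL's `DPOTrainer`.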

Model Capabilities

Text generation

Dialogue systems

Creative writing

Use Cases

Dialogue systems

Open-domain dialogue

Engage in natural and fluent conversational exchanges with users

Generate natural-language responses aligned with human preferences

Creative writing

Story generation

Generate coherent storylines based on prompts

🚀 Model Card for outputs/ablation-141-a128.dpo.armorm.rp-shisa-v2-llama-3.1-8b

This model is a fine-tuned language model trained with TRL. The card does not name the base model it was fine-tuned from. It is intended for text generation tasks.

🚀 Quick Start

The following example shows how to use this model for text generation:

Basic Usage

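from transformers import pipeline

# Assumed Hugging Face Hub id (developer + model name from this card);
# replace it with a local path if the weights are on disk.
model_id = "shisa-ai/ablation-141-a128.dpo.armorm.rp-shisa-v2-llama-3.1-8b"
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Load the model on the first CUDA GPU, keeping the checkpoint's stored dtype.
generator = pipeline("text-generation", model=model_id, device="cuda", torch_dtype="auto")
# Pass the prompt as a one-message chat and return only the newly generated text.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])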
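The pipeline call above takes a conversation as a list of `{"role", "content"}` messages. One way to continue the dialogue is to append the model's reply as an `assistant` message followed by the next `user` message, then call the generator again with the whole list. The helper `add_turn` below is a hypothetical convenience, not part of transformers.

```python
def add_turn(history: list[dict], assistant_reply: str, next_user_message: str) -> list[dict]:
    """Return a new message list with the model's reply and the next user turn appended."""
    # The model's reply must answer a user message, so the history has to end with one.
    if not history or history[-1]["role"] != "user":
        raise ValueError("history must end with a user message")
    # Build a new list so the caller's history is left unchanged.
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]
```

For example, pass `output["generated_text"]` as `assistant_reply`, then give the returned list to `generator(...)` in place of the single-message list.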

✨ Features

Fine-tuned model: Built on a base model that this card does not name, then fine-tuned to better suit specific tasks.
DPO training: Trained with Direct Preference Optimization (DPO), a method introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023).

📦 Installation

This model is used with the transformers library, which also needs a deep-learning backend such as PyTorch. You can install both with the following command:

pip install transformers torch

🔧 Technical Details

Training Procedure

This model was trained with DPO, a method introduced in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (Rafailov et al., 2023).
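The paper's title refers to the fact that a DPO policy defines an implicit reward, β·(log π(y|x) − log π_ref(y|x)), up to a prompt-dependent constant that cancels when chosen and rejected responses are compared. The sketch below computes batch statistics of this reward, using names similar to the reward metrics TRL's `DPOTrainer` logs during training; the exact metric set TRL reports may differ by version, and `beta = 0.1` is an assumed value.

```python
from statistics import mean


def implicit_reward_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-level DPO implicit-reward statistics (illustrative sketch).

    Each argument is an equal-length list of summed response
    log-probabilities, one entry per preference pair.
    """
    # Implicit reward of each response: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        # Fraction of pairs where the chosen response earns the higher implicit reward.
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
    }
```

A rising margin and accuracy during training indicate that the policy is separating preferred from rejected responses more sharply than the reference model does.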

Framework Versions

TRL: 0.15.1
Transformers: 4.50.0
PyTorch: 2.6.0
Datasets: 3.4.1
Tokenizers: 0.21.1
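To compare a local environment with the versions listed above, the sketch below reads installed package versions through the standard library's `importlib.metadata`. It assumes PyTorch is installed under the distribution name `torch`; local builds may report suffixes such as `+cu124`, so a mismatch is not necessarily a problem.

```python
from importlib import metadata

# Versions listed under "Framework Versions" in this card.
TRAINING_VERSIONS = {
    "trl": "0.15.1",
    "transformers": "4.50.0",
    "torch": "2.6.0",
    "datasets": "3.4.1",
    "tokenizers": "0.21.1",
}


def version_report(expected: dict[str, str], lookup=metadata.version) -> dict[str, tuple]:
    """Map each package to (expected, installed); installed is None if the package is missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for pkg, (wanted, got) in version_report(TRAINING_VERSIONS).items():
        print(f"{pkg}: trained with {wanted}, installed {got}")
```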

📄 License

The license under which this model is released is not specified here.

📚 Documentation

Citations

If you use DPO in your work, please cite it as:

@inproceedings{rafailov2023direct,
    title        = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author       = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year         = 2023,
    booktitle    = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
    url          = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
    editor       = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}

If you use TRL in your work, please cite it as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
