Nous Hermes 2 Mistral 7B DPO AWQ

Published by solidrust (AWQ quantization); original model by NousResearch

Nous Hermes 2 on Mistral 7B DPO is the flagship 7B Hermes model. It was produced by applying DPO to teknium/OpenHermes-2.5-Mistral-7B and improves on that model across the benchmarks tested.

Large Language Model

Transformers

Language: English

Open Source License: Apache-2.0

#GPT4-level conversation #DPO optimization #7B lightweight

Downloads 84

Release Time: 2/22/2024

Model Overview

This model is a large language model based on the Mistral 7B architecture, trained with DPO (Direct Preference Optimization), focusing on instruction following and dialogue generation tasks.
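As a rough illustration of what DPO optimizes, the sketch below computes the standard per-pair DPO loss from summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the trained policy and a frozen reference model. The function name `dpo_loss`, the argument names, and the value `beta=0.1` are assumptions for illustration; the source does not describe the training setup used for this model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin)))."""
    # How much more (in log-probability) the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Policy identical to the reference: the loss is log(2).
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
# Policy favors the chosen response more than the reference does: the loss drops.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.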

Model Features

DPO optimization

Trained with Direct Preference Optimization, which improved results on benchmarks such as AGIEval and BigBench Reasoning

High-quality training data

Trained on 1,000,000 instruction/chat entries of GPT-4 quality or better

AWQ quantization support

Supports 4-bit AWQ quantization, improving inference efficiency while maintaining quality

ChatML format support

Uses standardized ChatML prompt templates for easy integration with dialogue systems

Model Capabilities

Text generation

Dialogue systems

Instruction following

Reasoning capabilities

Use Cases

Dialogue systems

Intelligent assistant

Building AI assistants capable of understanding complex instructions and generating natural responses

Outperforms base models on multiple benchmarks

Educational applications

Teaching aid

Used for generating educational content and answering student questions

🚀 Nous Hermes 2 - Mistral 7B - DPO

This is a text-generation model created by NousResearch. It applies DPO to teknium/OpenHermes-2.5-Mistral-7B and improves on that model across the tested benchmarks.

✨ Features

Based on High-Quality Data: Trained on 1,000,000 instructions/chats of GPT-4 quality or better, mainly synthetic data plus other high-quality datasets, available from teknium/OpenHermes-2.5.
Improved Performance: After DPO, it improved on all tested benchmarks: AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA.
AWQ Quantization: Uses the efficient, accurate, and fast AWQ low-bit weight quantization method, which currently supports 4-bit quantization.
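To give a sense of why 4-bit weights matter, the following back-of-the-envelope sketch compares weight storage at 16 bits and 4 bits per parameter. It assumes a round 7 billion parameters for a "7B" model and ignores quantization metadata (scales and zero points), activations, and the KV cache, so real memory use will be somewhat higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 7e9  # assumption: "7B" taken as a round 7 billion parameters

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"FP16 weights : {fp16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the FP16 footprint.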

📦 Installation

Install the necessary packages

pip install --upgrade autoawq autoawq-kernels

💻 Usage Examples

Basic Usage

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ"
system_message = "You are Hermes, incarnated as a powerful AI."

# Load model
model = AutoAWQForCausalLM.from_quantized(model_path,
                                          fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
streamer = TextStreamer(tokenizer,
                        skip_prompt=True,
                        skip_special_tokens=True)

# ChatML prompt template; the trailing newline after "assistant" starts the reply turn
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

prompt = "You're standing on the surface of the Earth. "\
        "You walk one mile south, one mile west and one mile north. "\
        "You end up exactly where you started. Where are you?"

# Convert the formatted prompt to token IDs and move them to the GPU
tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
                  return_tensors='pt').input_ids.cuda()

# Generate output
generation_output = model.generate(tokens,
                                  streamer=streamer,
                                  max_new_tokens=512)

📚 Documentation

Model Information

Property	Details
Model Creator	NousResearch
Original Model	OpenHermes Mistral 2.5 7B DPO
Base Model	teknium/OpenHermes-2.5-Mistral-7B
Quantized By	Suparious
Pipeline Tag	text-generation
License	apache-2.0
Prompt Template	ChatML (see "Prompt template: ChatML" below)

About AWQ

AWQ is an efficient, accurate, and fast low-bit weight quantization method that currently supports 4-bit quantization. It offers faster Transformers-based inference than GPTQ, with quality equivalent to or better than the most commonly used GPTQ settings.
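The sketch below illustrates only the basic building block of low-bit weight quantization: asymmetric round-to-nearest quantization of one group of weights to 4-bit codes with a shared scale and zero point. It is not the AWQ algorithm itself, which additionally uses activation statistics to rescale the most important weight channels before quantizing. The function names and the tiny example group are assumptions for illustration.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    qmax = (1 << bits) - 1                      # 15 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0       # fall back to 1.0 for a constant group
    zero_point = round(-w_min / scale)          # integer code that represents 0.0
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point

def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]

group = [-0.8, -0.1, 0.0, 0.3, 0.7]             # hypothetical weights in one group
codes, scale, zero_point = quantize_group(group)
restored = dequantize_group(codes, scale, zero_point)
print(codes, scale, zero_point)
print(max(abs(a - b) for a, b in zip(group, restored)))
```

Each weight is stored as a 4-bit code, and the per-group scale and zero point are the only extra values needed to reconstruct an approximation of the original weights.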

AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users should use GGUF models instead.

It is supported by:

[Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
[vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later for support for all model types
[Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
Transformers version 4.35.0 and later, from any code or client that supports Transformers
[AutoAWQ](https://github.com/casper-hansen/AutoAWQ) - for use from Python code

Prompt template: ChatML

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
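The template above can be filled in programmatically. The following is a minimal sketch of a helper that renders a list of role/content messages as a ChatML string and optionally appends the opening of the assistant turn. The name `build_chatml_prompt` and the example messages are hypothetical; if the tokenizer ships a chat template, `tokenizer.apply_chat_template` in Transformers can serve the same purpose.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Because the helper accepts any number of messages, earlier user and assistant turns can be included to carry a multi-turn conversation.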

Citation

@misc{Nous-Hermes-2-Mistral-7B-DPO, 
      url={https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO}, 
      title={Nous Hermes 2 Mistral 7B DPO}, 
      author={"Teknium", "theemozilla", "karan4d", "huemin_art"}
}

Acknowledgment

Thank you to FluidStack for sponsoring compute for this model.
