
Mistral NeMo Minitron 8B Base

Developed by NVIDIA
Mistral-NeMo-Minitron-8B-Base is a base text-generation model obtained by pruning and distilling Mistral-NeMo 12B, suitable for a wide range of natural language generation tasks.
Downloads 7,924
Release Time : 8/19/2024

Model Overview

This model is a large language model (LLM) obtained by pruning and distilling Mistral-NeMo 12B. It handles a variety of text-generation tasks and serves as a base model for natural language processing applications.
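A minimal usage sketch follows, assuming the checkpoint is published on the Hugging Face Hub as nvidia/Mistral-NeMo-Minitron-8B-Base and loads through the standard transformers causal-LM API; adjust the identifier, dtype, and device mapping to your environment.

```python
# Minimal sketch, assuming the Hub id "nvidia/Mistral-NeMo-Minitron-8B-Base"
# and a GPU with enough memory for an 8B model in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision weights to reduce memory use
    device_map="auto",            # let accelerate place layers automatically
)

# Base-model usage: the model simply continues the prompt text.
prompt = "Pruning and knowledge distillation are model compression techniques that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base (non-instruction-tuned) model, it is best prompted with text to continue rather than with chat-style instructions.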

Model Features

Pruning and distillation optimization
Derived from Mistral-NeMo 12B through pruning and knowledge distillation, reducing model size while preserving performance
Efficient training
After pruning, the model was continually trained with distillation on 380 billion tokens to recover performance
Advanced architecture
Uses Grouped-Query Attention (GQA) and Rotary Position Embedding (RoPE); a brief GQA sketch follows this list
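The sketch below illustrates the grouped-query attention idea in isolation: several query heads share one key/value head, which shrinks the KV cache relative to standard multi-head attention. Head counts and shapes here are illustrative, not the model's actual configuration.

```python
# Illustrative GQA sketch, not NVIDIA's implementation; head counts are hypothetical.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, seq, n_q_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    n_q_heads, n_kv_heads = q.shape[2], k.shape[2]
    group_size = n_q_heads // n_kv_heads
    # Each key/value head is shared by a group of query heads: repeat KV heads per group.
    k = k.repeat_interleave(group_size, dim=2)
    v = v.repeat_interleave(group_size, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # -> (batch, heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)  # -> (batch, seq, n_q_heads, head_dim)

# Example: 32 query heads sharing 8 key/value heads (hypothetical sizes).
b, s, hd = 1, 16, 128
q = torch.randn(b, s, 32, hd)
k = torch.randn(b, s, 8, hd)
v = torch.randn(b, s, 8, hd)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 32, 128])
```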

Model Capabilities

Text generation
Paragraph completion
Language understanding
Code generation

Use Cases

Content generation
Paragraph completion
Completes a paragraph based on a given prompt
Produces coherent continuation text
Education
Science knowledge Q&A
Answers questions in scientific domains
Provides accurate explanations of scientific concepts