Llama-3.1-Minitron-4B-Width-Base Open Source Text Model - Free Deployment to Support Various Language Generation Tasks

Llama 3.1 Minitron 4B Width Base

Developed by NVIDIA

Llama-3.1-Minitron-4B-Width-Base is a foundational text-to-text model obtained by pruning Llama-3.1-8B, suitable for various natural language generation tasks.

Large Language Model

Transformers

Language: English
Open Source License: Other
Tags: Pruned and Distilled Model, Multi-domain Text Generation, GQA Efficient Inference

Release Time: 8/13/2024

Model Overview

This model was obtained by pruning Llama-3.1-8B, reducing its embedding size and MLP intermediate dimension. The pruned model then underwent continued training with distillation on 94 billion tokens to produce the final model.

Model Features

Pruning Optimization

Pruning Llama-3.1-8B reduces the model's embedding size and MLP intermediate dimension, yielding a smaller and more efficient model.

Distillation Training

Continued training with distillation on 94 billion tokens improves the pruned model's performance.

Commercial Use

The model is ready for commercial use and suitable for various natural language generation tasks.

Model Capabilities

Text Generation

Natural Language Understanding

Code Generation

Use Cases

Natural Language Processing

Text Completion

Used for automatically completing paragraphs or sentences to enhance writing efficiency.

Works best with inputs of 8,000 characters or fewer.

Question Answering System

Used to build question-answering systems that respond to user queries.

Performs well in zero-shot evaluations.

Code Generation

Code Completion

Used for automatically completing code snippets to improve development efficiency.

Scores 32.0 on the MBPP evaluation.

🚀 Llama-3.1-Minitron-4B-Width-Base

Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model. It can be applied to a variety of natural language generation tasks and is ready for commercial use.

✨ Features

Derived from Llama-3.1: Obtained by pruning Llama-3.1-8B, specifically the model embedding size and the MLP intermediate dimension.
Continued Training with Distillation: After pruning, the model is trained further with distillation on 94 billion tokens from the continuous pre-training data corpus used for Nemotron-4 15B.
Model Architecture: Uses a model embedding size of 3072, 32 attention heads, an MLP intermediate dimension of 9216, and 32 layers in total. It also employs Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
Commercial-Ready: The model is ready for commercial use.
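
As a rough sanity check on the "4B" in the model name, the dimensions listed above can be turned into a parameter estimate. The sketch below is illustrative arithmetic, not NVIDIA's own accounting. The vocabulary size (128,256), the 8 key/value heads, the head dimension of 128, the untied output head, and the bias-free projections are assumptions carried over from the Llama-3.1-8B configuration and are not stated in this card. `count_parameters` is a hypothetical helper.

```python
def count_parameters(
    hidden_size=3072,      # model embedding size (from the card)
    num_layers=32,         # from the card
    num_heads=32,          # attention heads (from the card)
    mlp_dim=9216,          # MLP intermediate dimension (from the card)
    num_kv_heads=8,        # ASSUMPTION: GQA key/value heads, as in Llama-3.1-8B
    head_dim=128,          # ASSUMPTION: per-head dimension, as in Llama-3.1-8B
    vocab_size=128256,     # ASSUMPTION: Llama 3.1 tokenizer vocabulary
    tie_embeddings=False,  # ASSUMPTION: separate input and output embeddings
):
    """Estimate parameter counts for a Llama-style decoder without biases."""
    q_dim = num_heads * head_dim
    kv_dim = num_kv_heads * head_dim
    # Query and output projections, plus the smaller key and value projections.
    attention = 2 * hidden_size * q_dim + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * mlp_dim
    # Two normalization weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    output_head = 0 if tie_embeddings else vocab_size * hidden_size
    # The trailing hidden_size term is the final normalization layer.
    total = embeddings + output_head + num_layers * per_layer + hidden_size
    return {
        "attention_per_layer": attention,
        "mlp_per_layer": mlp,
        "per_layer": per_layer,
        "embeddings": embeddings,
        "total": total,
    }


counts = count_parameters()
print(f"estimated parameters: {counts['total'] / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 4.5 billion parameters, with the MLP blocks accounting for most of each layer.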

📦 Installation

Support for this model will be added in the upcoming transformers release. In the meantime, you can install the library from source:

pip install git+https://github.com/huggingface/transformers
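
After installing, it can help to confirm which build of the library is actually on the import path. The snippet below is a minimal sketch using only the Python standard library. `installed_version` is a hypothetical helper, and the card does not say which release first includes support, so no minimum version is checked here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("transformers")
print("transformers:", found if found is not None else "not installed")
```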

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Load the tokenizer and model
model_path = "nvidia/Llama-3.1-Minitron-4B-Width-Base"
tokenizer = AutoTokenizer.from_pretrained(model_path)

device = 'cuda'
dtype = torch.bfloat16
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=dtype, device_map=device)

# Prepare the input text
prompt = 'Complete the paragraph: our solar system is'
inputs = tokenizer.encode(prompt, return_tensors='pt').to(model.device)

# Generate the output
outputs = model.generate(inputs, max_length=20)

# Decode and print the output
output_text = tokenizer.decode(outputs[0])
print(output_text)
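
In the basic example, `max_length=20` counts the prompt tokens as well as the generated ones, so only a handful of new tokens are produced. The sketch below shows one possible variation that limits only the newly generated tokens via `max_new_tokens` and optionally enables sampling. The helper names `build_generation_kwargs` and `complete` and the default values are illustrative assumptions, not recommendations from the model card.

```python
def build_generation_kwargs(max_new_tokens=64, do_sample=False,
                            temperature=0.7, top_p=0.9):
    """Collect keyword arguments for `model.generate`."""
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        # Sampling parameters only apply when sampling is enabled.
        kwargs["temperature"] = temperature
        kwargs["top_p"] = top_p
    return kwargs


def complete(prompt, model_path="nvidia/Llama-3.1-Minitron-4B-Width-Base",
             **generation_options):
    """Load the model and return only the newly generated text."""
    # Imported lazily so the helper above stays usable without a GPU setup.
    import torch
    from transformers import AutoTokenizer, LlamaForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, device_map="cuda"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs, **build_generation_kwargs(**generation_options)
    )
    # Slice off the prompt tokens so only the continuation is decoded.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `complete("Complete the paragraph: our solar system is", max_new_tokens=40)` would then return up to 40 new tokens of continuation, assuming a CUDA device is available.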

📚 Documentation

Model Overview

Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model suitable for a variety of natural language generation tasks. It is obtained by pruning Llama-3.1-8B, followed by continued training with distillation on 94 billion tokens. This model is ready for commercial use.

Model Architecture

Property	Details
Architecture Type	Transformer Decoder (Auto-Regressive Language Model)
Network Architecture	Llama-3.1
Input Type(s)	Text
Input Format(s)	String
Input Parameters	None
Other Properties Related to Input	Works well with inputs of 8k characters or fewer.
Output Type(s)	Text
Output Format	String
Output Parameters	1D
Other Properties Related to Output	None
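
Since the table above notes that the model works well with inputs of about 8k characters, a caller may want to trim longer prompts before tokenizing them. The helper below is a minimal sketch of one way to do that. `truncate_prompt` is a hypothetical function, and keeping the tail of the text by default is an assumed design choice for completion-style prompts, where the most recent context usually matters most.

```python
def truncate_prompt(prompt, max_chars=8000, keep="tail"):
    """Trim `prompt` to at most `max_chars` characters.

    keep="tail" keeps the end of the text, keep="head" keeps the beginning.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if keep not in ("tail", "head"):
        raise ValueError("keep must be 'tail' or 'head'")
    if len(prompt) <= max_chars:
        return prompt
    if keep == "tail":
        return prompt[-max_chars:]
    return prompt[:max_chars]
```

Character-based trimming can cut a word in half. Trimming at a paragraph or sentence boundary is a possible refinement.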

Software Integration

Runtime Engine(s):

NeMo 24.05

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Blackwell
NVIDIA Hopper
NVIDIA Lovelace

Supported Operating System(s):

Linux

Dataset & Training

Property	Details
Data Collection Method by Dataset	Automated
Labeling Method by Dataset	Not Applicable
Properties	The training corpus consists of English and multilingual text, as well as code. Sources cover a variety of document types, such as webpages, dialogue, and articles. The corpus spans domains including legal, math, science, and finance. A small portion of question-answering and alignment-style data is added to the continued training set.
Data Freshness	The pretraining data has a cutoff of June 2023.

Evaluation Results

Overview

5-shot performance. Language understanding evaluated using Massive Multitask Language Understanding (MMLU):

Average
60.5

Zero-shot performance. Evaluated using select datasets from the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with additions:

HellaSwag	Winogrande	GSM8K	ARC-Challenge	XLSum
76.1	73.5	41.2	55.6	28.7

Code generation performance. Evaluated using [MBPP](https://github.com/google-research/google-research/tree/master/mbpp):

Score
32.0

Inference

Property	Details
Engine	TensorRT-LLM
Test Hardware	NVIDIA A100
DType	BFloat16
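
The dtype in the table above largely determines how much memory the weights occupy. The sketch below shows the arithmetic under stated assumptions: `weight_memory_gib` is a hypothetical helper, and the 4 billion parameter figure in the usage line is the nominal size taken from the model name rather than an exact count. The estimate covers weights only and leaves out the KV cache and activations.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_parameters, dtype="bfloat16"):
    """Return the memory needed for the weights alone, in GiB."""
    return num_parameters * BYTES_PER_ELEMENT[dtype] / 2**30


# Nominal "4B" parameter count, taken from the model name (an assumption).
print(f"{weight_memory_gib(4_000_000_000):.1f} GiB in bfloat16")
```

By this estimate, BFloat16 halves the weight footprint relative to float32.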

Limitations

The model was trained on data containing toxic language, unsafe content, and societal biases originally crawled from the internet. Thus, it may amplify those biases and return toxic responses, especially when prompted with toxic prompts. It may also generate inaccurate answers, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text even if the prompt is not offensive.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and has established policies and practices for AI development. When developers download or use this model according to the terms of service, they should work with their internal model team to ensure it meets industry requirements and addresses potential product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

📄 License

This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
