
Minitron 8B Base

Developed by NVIDIA
Minitron-8B-Base is a large language model obtained by pruning Nemotron-4 15B and then applying distillation-based continued training, requiring up to 40x fewer training tokens and yielding up to 1.8x compute savings compared to training from scratch.
Downloads: 5,725
Release Date: July 19, 2024

Model Overview

Minitron-8B-Base is an efficient large language model derived from the Nemotron-4 15B model through pruning and distillation techniques, primarily used for text generation tasks.
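As a sketch of how the model might be loaded for text generation, assuming the checkpoint is published on Hugging Face as `nvidia/Minitron-8B-Base` and that the installed `transformers` version supports the Nemotron architecture, the standard `AutoModelForCausalLM` API can be wrapped as follows (the helper name `generate_text` is illustrative, not part of any official API):

```python
def generate_text(prompt: str,
                  model_name: str = "nvidia/Minitron-8B-Base",
                  max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a text completion for `prompt`.

    Imports are deferred so the sketch stays importable even when
    `transformers`/`torch` are not installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,  # halves memory vs. fp32
        device_map="auto",           # place layers on available GPUs
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For example, `generate_text("def fibonacci(n):")` would return a code continuation; note that downloading and running the ~8B-parameter checkpoint requires substantial GPU memory.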

Model Features

Efficient Training
Requires up to 40x fewer training tokens and up to 1.8x less compute than training each model from scratch.
High Performance
Achieves up to a 16% improvement in MMLU scores over training from scratch, with performance comparable to community models such as Mistral 7B, Gemma 7B, and Llama-3 8B.
Advanced Architecture
Incorporates advanced techniques such as Grouped Query Attention (GQA) and Rotary Position Embedding (RoPE).
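To make the two named techniques concrete, here is a minimal NumPy sketch of grouped query attention (several query heads sharing one key/value head) and one common formulation of rotary position embedding. This is an illustrative reference implementation, not Minitron's actual code; shapes and the rotation layout are assumptions:

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary position embedding: rotate dimension pairs of `x`
    (shape: heads x seq x d) by position-dependent angles."""
    _, seq, d = x.shape
    half = d // 2
    freqs = base ** (-np.arange(half) / half)          # (half,)
    angles = np.arange(seq)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def grouped_query_attention(q: np.ndarray, k: np.ndarray,
                            v: np.ndarray) -> np.ndarray:
    """Causal GQA for one sequence.

    q:    (n_q_heads,  seq, d) query heads
    k, v: (n_kv_heads, seq, d) shared key/value heads,
          with n_q_heads divisible by n_kv_heads.
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads per shared KV head
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # j > i masked
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # which KV head this query head reads from
        scores = q[h] @ k[kv].T / np.sqrt(d)      # (seq, seq)
        scores = np.where(mask, -np.inf, scores)  # causal masking
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # softmax per row
        out[h] = w @ v[kv]
    return out
```

Compared with full multi-head attention, GQA shrinks the KV cache by the factor `n_q_heads / n_kv_heads`, which is the main inference-memory benefit.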

Model Capabilities

Text Generation
Language Understanding
Code Generation

Use Cases

Natural Language Processing
Text Completion
Generates fluent, semantically coherent continuations of a given prompt.
Question Answering
Answers user-provided questions with accurate, relevant responses.
Code Generation
Code Completion
Completes a given code snippet with functionally correct continuations.