Nemotron-H-8B-Base-8K
NVIDIA Nemotron-H-8B-Base-8K is a large language model designed for text completion. It uses a hybrid Mamba-Transformer architecture and supports multiple languages with a context length of 8K tokens.
Quick Start
To use the Nemotron-H-8B-Base-8K model, you can follow this simple Python example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (bfloat16 weights, custom model code from the Hub)
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Base-8K", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-H-8B-Base-8K", torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

# Base-model text completion: the prompt is a plain string, no chat template
prompt = "When was NVIDIA founded?"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
Features
- Hybrid Architecture: Primarily composed of Mamba-2 and MLP layers, combined with just four Attention layers.
- Multi-Language Support: Supports English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
- 8K Context Length: Handles text sequences with a context length of up to 8K.
- Customization: Can be customized using the NeMo Framework suite of tools, such as Parameter-Efficient Fine-Tuning and Model Alignment.
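Because the 8K context is a token budget shared by the prompt and the completion, it is useful to compute how much room remains for generation before calling the model. A minimal sketch (the helper name and the reading of "8K" as 8192 tokens are assumptions for illustration):

```python
CONTEXT_LENGTH = 8192  # the model's "8K" context window, assumed to mean 8192 tokens

def remaining_budget(prompt_token_count: int, reserve: int = 0) -> int:
    """Tokens still available for generation after the prompt (and an optional reserve)."""
    return max(CONTEXT_LENGTH - prompt_token_count - reserve, 0)

# e.g. a 7,000-token prompt leaves 1,192 tokens for the completion
print(remaining_budget(7000))  # -> 1192
```

The returned value can be passed as `max_new_tokens` to `generate` so that prompt plus completion never exceed the window.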
Documentation
Model Overview
NVIDIA Nemotron-H-8B-Base-8K is a large language model (LLM) developed by NVIDIA for text completion. For more detailed information on the model architecture, training, and evaluation, please see the project page and the technical report.
License/Terms of Use
Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License.
Use Case
This model is intended for developers and researchers building LLMs.
Release Date
4/14/2025
Model Architecture
| Property | Details |
|---|---|
| Model Type | Hybrid Mamba-Transformer |
| Network Architecture | Nemotron-H |
| Model Parameters | 8B |
Input
| Property | Details |
|---|---|
| Input Type(s) | Text |
| Input Format(s) | String |
| Input Parameters | One-Dimensional (1D): Sequences |
| Other Properties | Context length up to 8K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese, and English. |
Output
| Property | Details |
|---|---|
| Output Type(s) | Text |
| Output Format | String |
| Output Parameters | One-Dimensional (1D): Sequences |
Software Integration
| Property | Details |
|---|---|
| Runtime Engine(s) | NeMo 24.12 |
| Supported Hardware | NVIDIA H100-80GB, NVIDIA A100 |
| Operating System(s) | Linux |
Model Version
Prompt Format
As this is a base model, no explicit prompt format is recommended or required.
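In practice, base models like this one are usually steered with few-shot examples placed directly in the prompt string rather than with a chat template. A minimal sketch of assembling such a prompt (the Q/A layout shown here is an illustrative convention, not a required format):

```python
def build_few_shot_prompt(examples, question):
    """Concatenate worked Q/A examples and a final question into one completion prompt."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # the model completes the final answer
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("When was NVIDIA founded?", "1993")],
    "Where is NVIDIA headquartered?",
)
print(prompt)
```

The resulting string ends with `A:`, so the model's completion begins directly with the answer.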
Training, Testing, and Evaluation Datasets
Training & Testing Datasets
The training corpus consists of English and multilingual text, as well as code. It covers various document types and domains. Data collection and labeling are hybrid (Automated, Human, Synthetic).
Evaluation Datasets
We used multiple datasets to evaluate the model, including those for commonsense understanding, coding, math, and general knowledge.
Commonsense Understanding Evaluations
| ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
|---|---|---|---|
| 88.74 | 83.23 | 80.51 | 78.71 |
Coding Evaluations
| MBPP (sanitized) 3-shot | MBPP+ 0-shot | HumanEval 0-shot | HumanEval+ 0-shot |
|---|---|---|---|
| 65.37 | 59.52 | 58.54 | 55.49 |
Math Evaluations
| GSM8K 8-shot CoT | MATH 4-shot CoT | MATH-Lvl 5 4-shot CoT | MATH-500 4-shot CoT |
|---|---|---|---|
| 87.11 | 46.52 | 22.93 | 44.43 |
General Evaluations
| MMLU-Pro 5-shot CoT | MMLU 5-shot |
|---|---|
| 44.01 | 72.77 |
Potential Known Risks for Usage
The model was trained on data with toxic language and societal biases. It may amplify these biases and return toxic responses, especially with toxic prompts. It may also generate inaccurate, incomplete, or irrelevant text.
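One common, deliberately simple mitigation is to screen generated text against a blocklist before showing it to users. This does not remove bias from the model itself, and the terms below are placeholders; a real deployment would use a proper safety classifier. A minimal sketch:

```python
# Placeholder blocklist for illustration only.
BLOCKLIST = {"badword1", "badword2"}

def is_flagged(text: str) -> bool:
    """Return True if any blocklisted term appears in the generated text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

print(is_flagged("This contains badword1."))  # -> True
print(is_flagged("This is fine."))            # -> False
```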
Inference
| Property | Details |
|---|---|
| Engine | NeMo |
| Test Hardware | NVIDIA H100-80GB |
Ethical Considerations
NVIDIA believes in Trustworthy AI. For more detailed information on ethical considerations, please see the Responsible Use Guide at http://nvidia.com/nemotron-responsible-use. Report security vulnerabilities or NVIDIA AI Concerns here.
License
This model is licensed under the NVIDIA Internal Scientific Research and Development Model License.