Nemotron-H-8B-Base-8K
NVIDIA Nemotron-H-8B-Base-8K is a large language model designed for text completion. It uses a hybrid Mamba-Transformer architecture and supports multiple languages with a context length of 8K tokens.
Quick Start
To use the Nemotron-H-8B-Base-8K model, you can follow this simple Python example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (bfloat16 weights, custom model code from the Hub)
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Base-8K", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/Nemotron-H-8B-Base-8K", torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

# Base-model text completion: the prompt is a plain string, no chat template
prompt = "When was NVIDIA founded?"
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
Features
- Hybrid Architecture: Primarily composed of Mamba-2 and MLP layers, combined with just four Attention layers.
- Multi-Language Support: Supports English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
- 8K Context Length: Handles text sequences with a context length of up to 8K.
- Customization: Can be customized using the NeMo Framework suite of tools, such as Parameter-Efficient Fine-Tuning and Model Alignment.
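Because the 8K context is a token budget shared by the prompt and the completion, it is useful to compute how much room remains for generation before calling the model. A minimal sketch (the helper name and the reading of "8K" as 8192 tokens are assumptions for illustration):

```python
CONTEXT_LENGTH = 8192  # the model's "8K" context window, assumed to mean 8192 tokens

def remaining_budget(prompt_token_count: int, reserve: int = 0) -> int:
    """Tokens still available for generation after the prompt (and an optional reserve)."""
    return max(CONTEXT_LENGTH - prompt_token_count - reserve, 0)

# e.g. a 7,000-token prompt leaves 1,192 tokens for the completion
print(remaining_budget(7000))  # -> 1192
```

The returned value can be passed as `max_new_tokens` to `generate` so that prompt plus completion never exceed the window.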
Documentation
Model Overview
NVIDIA Nemotron-H-8B-Base-8K is a large language model (LLM) developed by NVIDIA for text completion. For more detailed information on the model architecture, training, and evaluation, please see the project page and the technical report.
License/Terms of Use
Use of this model is governed by the NVIDIA Internal Scientific Research and Development Model License.
Use Case
This model is intended for developers and researchers building LLMs.
Release Date
4/14/2025
Model Architecture
| Property | Details |
|---|---|
| Model Type | Hybrid Mamba-Transformer |
| Network Architecture | Nemotron-H |
| Model Parameters | 8B |
Input
| Property | Details |
|---|---|
| Input Type(s) | Text |
| Input Format(s) | String |
| Input Parameters | One-Dimensional (1D): Sequences |
| Other Properties | Context length up to 8K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese, and English. |
Output
| Property | Details |
|---|---|
| Output Type(s) | Text |
| Output Format | String |
| Output Parameters | One-Dimensional (1D): Sequences |
Software Integration
| Property | Details |
|---|---|
| Runtime Engine(s) | NeMo 24.12 |
| Supported Hardware | NVIDIA H100-80GB, NVIDIA A100 |
| Operating System(s) | Linux |
Model Version
Prompt Format
As this is a base model, no explicit prompt format is recommended or required.
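In practice, base models like this one are usually steered with few-shot examples placed directly in the prompt string rather than with a chat template. A minimal sketch of assembling such a prompt (the Q/A layout shown here is an illustrative convention, not a required format):

```python
def build_few_shot_prompt(examples, question):
    """Concatenate worked Q/A examples and a final question into one completion prompt."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")  # the model completes the final answer
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    [("When was NVIDIA founded?", "1993")],
    "Where is NVIDIA headquartered?",
)
print(prompt)
```

The resulting string ends with `A:`, so the model's completion begins directly with the answer.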
Training, Testing, and Evaluation Datasets
Training & Testing Datasets
The training corpus consists of English and multilingual text, as well as code. It covers various document types and domains. Data collection and labeling are hybrid (Automated, Human, Synthetic).
Evaluation Datasets
We used multiple datasets to evaluate the model, including those for commonsense understanding, coding, math, and general knowledge.
Commonsense Understanding Evaluations
| ARC Challenge 25-shot | Hellaswag 10-shot | Winogrande 5-shot | CommonsenseQA 7-shot |
|---|---|---|---|
| 88.74 | 83.23 | 80.51 | 78.71 |
Coding Evaluations
| MBPP (sanitized) 3-shot | MBPP+ 0-shot | HumanEval 0-shot | HumanEval+ 0-shot |
|---|---|---|---|
| 65.37 | 59.52 | 58.54 | 55.49 |
Math Evaluations
| GSM8K 8-shot CoT | MATH 4-shot CoT | MATH-Lvl 5 4-shot CoT | MATH-500 4-shot CoT |
|---|---|---|---|
| 87.11 | 46.52 | 22.93 | 44.43 |
General Evaluations
| MMLU-Pro 5-shot CoT | MMLU 5-shot |
|---|---|
| 44.01 | 72.77 |
Potential Known Risks for Usage
The model was trained on data with toxic language and societal biases. It may amplify these biases and return toxic responses, especially with toxic prompts. It may also generate inaccurate, incomplete, or irrelevant text.
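One common, deliberately simple mitigation is to screen generated text against a blocklist before showing it to users. This does not remove bias from the model itself, and the terms below are placeholders; a real deployment would use a proper safety classifier. A minimal sketch:

```python
# Placeholder blocklist for illustration only.
BLOCKLIST = {"badword1", "badword2"}

def is_flagged(text: str) -> bool:
    """Return True if any blocklisted term appears in the generated text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

print(is_flagged("This contains badword1."))  # -> True
print(is_flagged("This is fine."))            # -> False
```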
Inference
| Property | Details |
|---|---|
| Engine | NeMo |
| Test Hardware | NVIDIA H100-80GB |
Ethical Considerations
NVIDIA believes in Trustworthy AI. For more detailed information on ethical considerations, please see the Responsible Use Guide at http://nvidia.com/nemotron-responsible-use. Report security vulnerabilities or NVIDIA AI Concerns here.
License
This model is licensed under the NVIDIA Internal Scientific Research and Development Model License.