Llama-3.1-Nemotron-Nano-8B-v1-GGUF Open-Source Large Language Model - 8B Parameter Multi-Quantized Version Available for Free

Llama 3.1 Nemotron Nano 8B V1 GGUF

Developed by tensorblock

An 8B-parameter open-source large language model released by NVIDIA, based on the Llama 3.1 architecture and offered here in multiple GGUF quantization versions

Large Language Model | English | Open Source License: Other
#Lightweight 8B Model #Multi-turn Dialogue Optimization #GGUF Efficient Inference

Downloads 1,048

Release Time: 3/18/2025

Model Overview

This is an 8B-parameter large language model released by NVIDIA and based on the Llama 3.1 architecture. This repository, with files quantized by TensorBlock, provides multiple GGUF quantization versions of it for different hardware environments.

Model Features

Multiple Quantization Versions

Offers 12 quantization versions from Q2_K to Q8_0 to meet different hardware environments and performance needs

Efficient Inference

Distributed in GGUF format, compatible with llama.cpp, and suitable for running on resource-limited devices

Llama 3.1 Architecture

Based on the Llama 3.1 architecture of the base model, nvidia/Llama-3.1-Nemotron-Nano-8B-v1, for text generation

Model Capabilities

Text Generation

Dialogue Systems

Content Creation

Use Cases

Dialogue Systems

Intelligent Assistant

Can be used to build English intelligent dialogue assistants

Content Generation

Article Writing

Can assist in English article writing

base_model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
language:
- en
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
tags:
- nvidia
- llama-3
- pytorch
- TensorBlock
- GGUF

Feedback and support: TensorBlock's Twitter/X, Telegram Group and Discord server

nvidia/Llama-3.1-Nemotron-Nano-8B-v1 - GGUF

This repo contains GGUF format model files for nvidia/Llama-3.1-Nemotron-Nano-8B-v1.

The files were quantized using machines provided by TensorBlock, and they are compatible with llama.cpp as of commit b4882.

Our projects

Awesome MCP Servers: A comprehensive collection of Model Context Protocol (MCP) servers.

TensorBlock Studio: A lightweight, open, and extensible multi-LLM interaction studio.

## Prompt template

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
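
As a minimal sketch of how the template above might be filled in programmatically, the Python snippet below substitutes its two placeholders. The function name `build_prompt` is hypothetical (it is not part of any library), and the snippet assumes the blank lines in the template are literal newlines. The template is reproduced exactly as shown, with nothing appended after the final assistant header.

```python
# Template copied from the "Prompt template" section; the blank lines there
# are assumed to be literal newlines ("\n\n").
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
)


def build_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the two placeholders and return the full prompt string.

    Hypothetical helper for illustration. str.format only parses the
    template itself, so braces inside the user's text are left untouched.
    """
    return PROMPT_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.", "What is GGUF?"))
```

The returned string can be passed as a raw prompt to a GGUF runtime. Some runtimes may prepend `<|begin_of_text|>` on their own, so it is worth checking that the token is not duplicated.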

## Model file specification

| Filename | Quant type | File Size | Description |
| --- | --- | --- | --- |
| Llama-3.1-Nemotron-Nano-8B-v1-Q2_K.gguf | Q2_K | 3.179 GB | smallest, significant quality loss - not recommended for most purposes |
| Llama-3.1-Nemotron-Nano-8B-v1-Q3_K_S.gguf | Q3_K_S | 3.665 GB | very small, high quality loss |
| Llama-3.1-Nemotron-Nano-8B-v1-Q3_K_M.gguf | Q3_K_M | 4.019 GB | very small, high quality loss |
| Llama-3.1-Nemotron-Nano-8B-v1-Q3_K_L.gguf | Q3_K_L | 4.322 GB | small, substantial quality loss |
| Llama-3.1-Nemotron-Nano-8B-v1-Q4_0.gguf | Q4_0 | 4.661 GB | legacy; small, very high quality loss - prefer using Q3_K_M |
| Llama-3.1-Nemotron-Nano-8B-v1-Q4_K_S.gguf | Q4_K_S | 4.693 GB | small, greater quality loss |
| Llama-3.1-Nemotron-Nano-8B-v1-Q4_K_M.gguf | Q4_K_M | 4.921 GB | medium, balanced quality - recommended |
| Llama-3.1-Nemotron-Nano-8B-v1-Q5_0.gguf | Q5_0 | 5.599 GB | legacy; medium, balanced quality - prefer using Q4_K_M |
| Llama-3.1-Nemotron-Nano-8B-v1-Q5_K_S.gguf | Q5_K_S | 5.599 GB | large, low quality loss - recommended |
| Llama-3.1-Nemotron-Nano-8B-v1-Q5_K_M.gguf | Q5_K_M | 5.733 GB | large, very low quality loss - recommended |
| Llama-3.1-Nemotron-Nano-8B-v1-Q6_K.gguf | Q6_K | 6.596 GB | very large, extremely low quality loss |
| Llama-3.1-Nemotron-Nano-8B-v1-Q8_0.gguf | Q8_0 | 8.541 GB | very large, extremely low quality loss - not recommended |
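
One way to read the table is to pick the largest file that fits a given memory budget. The sketch below does this with the sizes copied from the table. The helper name `largest_quant_that_fits` and the default 1 GB of headroom are assumptions made for illustration only: the file size is not the total memory a runtime needs, and real overhead depends on context length and backend.

```python
# (quant type, file size in GB), copied from the model file table.
QUANT_SIZES_GB = [
    ("Q2_K", 3.179), ("Q3_K_S", 3.665), ("Q3_K_M", 4.019), ("Q3_K_L", 4.322),
    ("Q4_0", 4.661), ("Q4_K_S", 4.693), ("Q4_K_M", 4.921), ("Q5_0", 5.599),
    ("Q5_K_S", 5.599), ("Q5_K_M", 5.733), ("Q6_K", 6.596), ("Q8_0", 8.541),
]


def largest_quant_that_fits(budget_gb, overhead_gb=1.0):
    """Return the filename of the largest file that fits, or None.

    overhead_gb is a hypothetical allowance for runtime memory beyond the
    file itself; tune it for your own setup.
    """
    best = None
    for quant, size in QUANT_SIZES_GB:
        fits = size + overhead_gb <= budget_gb
        # ">=" lets a later entry win a size tie (Q5_K_S over legacy Q5_0).
        if fits and (best is None or size >= best[1]):
            best = (quant, size)
    if best is None:
        return None
    return f"Llama-3.1-Nemotron-Nano-8B-v1-{best[0]}.gguf"


if __name__ == "__main__":
    print(largest_quant_that_fits(6.0))
```

Size alone ignores the table's quality notes, so a smaller recommended file may still be the better choice over a larger legacy one.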

## Download instructions

### Command line

First, install the Hugging Face CLI:

pip install -U "huggingface_hub[cli]"

Then download an individual model file to a local directory:

huggingface-cli download tensorblock/Llama-3.1-Nemotron-Nano-8B-v1-GGUF --include "Llama-3.1-Nemotron-Nano-8B-v1-Q2_K.gguf" --local-dir MY_LOCAL_DIR

To download multiple model files matching a pattern (e.g., *Q4_K*gguf), run:

huggingface-cli download tensorblock/Llama-3.1-Nemotron-Nano-8B-v1-GGUF --local-dir MY_LOCAL_DIR --local-dir-use-symlinks False --include='*Q4_K*gguf'
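
The same pattern-based download can probably be done from Python with `snapshot_download` from the `huggingface_hub` package and its `allow_patterns` argument. The sketch below is an illustration under that assumption, not the repository's documented method. The helper names `preview_matches` and `download_matching` are hypothetical, and the preview only approximates the library's matching with the standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

REPO_ID = "tensorblock/Llama-3.1-Nemotron-Nano-8B-v1-GGUF"


def preview_matches(pattern, filenames):
    """Return the filenames a glob pattern would select (local check only)."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


def download_matching(pattern, local_dir):
    """Download every repo file matching the pattern into local_dir.

    Needs network access and the huggingface_hub package, which is imported
    lazily so the preview helper works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[pattern],
        local_dir=local_dir,
    )


if __name__ == "__main__":
    names = [
        "Llama-3.1-Nemotron-Nano-8B-v1-Q4_0.gguf",
        "Llama-3.1-Nemotron-Nano-8B-v1-Q4_K_S.gguf",
        "Llama-3.1-Nemotron-Nano-8B-v1-Q4_K_M.gguf",
    ]
    print(preview_matches("*Q4_K*gguf", names))
    # download_matching("*Q4_K*gguf", "MY_LOCAL_DIR")  # uncomment to download
```

Previewing the pattern against the filenames in the table first helps avoid pulling several multi-gigabyte files by accident.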
