Sana_Sprint_1.6B_1024px Open-source Image Generation Model - Ultra-fast Inference, Top-notch Results with Fewer Steps

Sana Sprint 1.6B 1024px

Developed by Efficient-Large-Model

SANA-Sprint is an ultra-efficient text-to-image diffusion model that reduces inference steps from 20 to 1-4 while maintaining top-tier performance.

Image Generation, Supports Multiple Languages #Single-step image generation #1024 HD resolution #Real-time controlled generation

Downloads 475

Release Time: 3/21/2025

Model Overview

A linear diffusion Transformer model for generating/editing images based on text prompts, featuring a fixed-parameter text encoder and a 32x spatially compressed latent feature encoder.

Model Features

Ultra-efficient inference

Reduces inference steps from 20 to 1-4, achieving 1024×1024 image latency of 0.1s (T2I) and 0.25s (ControlNet) on H100.

Continuous-time consistency distillation

A training-free approach to continuous-time consistency distillation (sCM) that avoids costly retraining while keeping image quality high.

Unified step-adaptive model

A single step-adaptive model delivers high-quality generation in 1-4 steps.

Integrated ControlNet

Integrated ControlNet enables real-time interactive image generation.

Model Capabilities

Text-to-image

Image editing

Real-time interactive image generation

Use Cases

Art creation

Artwork generation

Generate artworks based on text prompts

High-quality artwork generation

Design creation

Assist designers in creative design

Rapid design concept generation

Education

Educational tool development

Develop creative tools for education

Enhanced teaching interactivity

Research

Generative model research

For generative model research and development

Advancing generative model technology

🚀 Sana, Sana-Sprint

Sana and Sana-Sprint are ultra-efficient text-to-image diffusion models, achieving state-of-the-art performance with significantly reduced inference steps.

🚀 Quick Start

This README provides a detailed introduction to the Sana and Sana-Sprint models, including their features, efficiency, model description, uses, limitations, and biases.

✨ Features

Ultra-efficient Diffusion: Reduces inference steps from 20 to 1-4 while achieving state-of-the-art performance.
Training-free Approach: Utilizes a training-free approach for continuous-time consistency distillation (sCM), eliminating costly retraining.
Unified Step-adaptive Model: A unified step-adaptive model for high-quality generation in 1-4 steps.
ControlNet Integration: Enables real-time interactive image generation through ControlNet integration.

📦 Installation

The source code is available in the GitHub repository (https://github.com/NVlabs/Sana). Refer to the repository for installation and usage instructions.

💻 Usage Examples

No specific code examples are provided in the original README. To use the model, refer to the source code in the GitHub repository and the SANA-Sprint report on arXiv.
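As a starting point, the following is a minimal sketch of how text-to-image generation might look through the Hugging Face `diffusers` library. It is not taken from the original README. It assumes that a recent `diffusers` release provides `SanaSprintPipeline` and that the weights are published under the repo id shown in `MODEL_ID`; both should be checked against the GitHub repository. The helper names `build_generation_kwargs` and `generate` are hypothetical.

```python
from typing import Any, Dict

# Assumption: repo id of the diffusers-format weights (not stated in this README).
MODEL_ID = "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers"


def build_generation_kwargs(prompt: str, steps: int = 2, size: int = 1024) -> Dict[str, Any]:
    """Validate settings and build the keyword arguments for the pipeline call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= steps <= 4:
        raise ValueError("SANA-Sprint targets 1-4 inference steps")
    # Assumption: sizes should align with the 32x spatial compression of the latent encoder.
    if size % 32 != 0:
        raise ValueError("size should be a multiple of 32")
    return {"prompt": prompt, "num_inference_steps": steps, "height": size, "width": size}


def generate(prompt: str, steps: int = 2, out_path: str = "sana_sprint.png") -> str:
    """Generate one image and save it. Needs torch, diffusers, and a CUDA GPU."""
    # Imported lazily so the validation helper works without the heavy dependencies.
    import torch
    from diffusers import SanaSprintPipeline  # assumed to exist in recent diffusers releases

    # BF16 matches the precision listed in the model description.
    pipe = SanaSprintPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    image = pipe(**build_generation_kwargs(prompt, steps)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a tiny astronaut hatching from an egg on the moon")` would download the weights on first use and write `sana_sprint.png`. Keeping validation separate from the pipeline call makes it possible to check settings on a machine without a GPU.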

📚 Documentation

Model Description

Property	Details
Developed by	NVIDIA, Sana
Model Type	One-Step Diffusion with Continuous-Time Consistency Distillation
Model Size	1.6B parameters
Model Precision	torch.bfloat16 (BF16)
Model Resolution	This model is developed to generate 1024px-based images with multi-scale height and width.
License	NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use for Gemma-2-2B-IT, Gemma Prohibited Use Policy.
Model Description	This model generates and modifies images based on text prompts. It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it)) and one 32x spatially compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
Resources for more information	Check out our GitHub Repository and the SANA-Sprint report on arXiv.
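To make the numbers in the table concrete, here is a small back-of-the-envelope sketch. It estimates the storage needed for 1.6B parameters in BF16 (2 bytes each) and the spatial size of the latent feature map under 32x compression. The function names are illustrative. The estimate covers only the listed 1.6B parameters; whether that figure includes the text encoder and autoencoder is not stated here, so actual memory use is likely higher.

```python
from typing import Tuple


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB (10^9 bytes); BF16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9


def latent_grid(height: int, width: int, compression: int = 32) -> Tuple[int, int]:
    """Spatial size of the latent feature map for a given image size."""
    if height % compression or width % compression:
        raise ValueError("height and width must be multiples of the compression factor")
    return height // compression, width // compression


print(weight_memory_gb(1.6e9))   # 3.2
print(latent_grid(1024, 1024))   # (32, 32)
```

A 1024×1024 image therefore maps to a 32×32 latent grid, which is part of why so few computation steps are needed per image.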

Model Sources

Repository: https://github.com/NVlabs/Sana
Demo: https://nv-sana.mit.edu/sprint
Guidance: https://github.com/NVlabs/Sana/asset/docs/sana_sprint.md

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render complex legible text.
Fine details such as fingers may not be generated properly.
The autoencoding part of the model is lossy.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.


Model Efficiency

SANA-Sprint is an ultra-efficient diffusion model for text-to-image (T2I) generation, reducing inference steps from 20 to 1-4 while achieving state-of-the-art performance. Key innovations include: (1) A training-free approach for continuous-time consistency distillation (sCM), eliminating costly retraining; (2) A unified step-adaptive model for high-quality generation in 1-4 steps; and (3) ControlNet integration for real-time interactive image generation. SANA-Sprint achieves 7.59 FID and 0.74 GenEval in just 1 step, outperforming FLUX-schnell (7.94 FID / 0.71 GenEval) while being 10× faster (0.1s vs 1.1s on H100). With latencies of 0.1s (T2I) and 0.25s (ControlNet) for 1024×1024 images on H100, and 0.31s (T2I) on an RTX 4090, SANA-Sprint is ideal for AI-powered consumer applications (AIPC).
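The latency figures above can be turned into speedup and throughput estimates with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section. It assumes one image generated at a time with no batching or overlap, and the function names are illustrative.

```python
# Latencies in seconds for one 1024x1024 image, as quoted in this section.
LATENCY_S = {
    ("SANA-Sprint", "H100", "T2I"): 0.1,
    ("SANA-Sprint", "H100", "ControlNet"): 0.25,
    ("SANA-Sprint", "RTX 4090", "T2I"): 0.31,
    ("FLUX-schnell", "H100", "T2I"): 1.1,
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def images_per_second(latency_s: float) -> float:
    """Sequential throughput implied by a per-image latency."""
    return 1.0 / latency_s


ratio = speedup(
    LATENCY_S[("FLUX-schnell", "H100", "T2I")],
    LATENCY_S[("SANA-Sprint", "H100", "T2I")],
)
print(round(ratio, 1))                                                         # 11.0
print(images_per_second(LATENCY_S[("SANA-Sprint", "H100", "ControlNet")]))     # 4.0
print(round(images_per_second(LATENCY_S[("SANA-Sprint", "RTX 4090", "T2I")]), 2))  # 3.23
```

The ratio from the rounded latencies is about 11, consistent with the quoted "10× faster" once rounding is taken into account. At 0.25s per image, the ControlNet path corresponds to roughly 4 images per second on H100.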

License

The model is licensed under NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use | Google AI for Developers for Gemma-2-2B-IT, Gemma Prohibited Use Policy | Google AI for Developers.
