SANA1.5_4.8B_1024px Open-Source Image Generation Model - Free Deployment, Supports Creating High-Resolution Images up to 1024px

SANA1.5 4.8B 1024px

Developed by Efficient-Large-Model

SANA-1.5 is an efficient text-to-image generation model based on the Linear Diffusion Transformer architecture, supporting 1024px high-resolution image generation.

Text-to-Image, Supports Multiple Languages, #Efficient Model Scaling, #1024px HD Image Generation, #Linear Diffusion Transformer

Downloads 268

Release Time: 3/16/2025

Model Overview

SANA-1.5 is an efficient text-to-image model combining training-time and inference-time scaling techniques, featuring 4.8B parameters and supporting multi-scale aspect ratio image generation.

Model Features

Efficient Model Scaling

Grows the model from 1.6B to 4.8B parameters, matching or surpassing the performance of training from scratch while saving 60% of the training cost

Deep Pruning Support

Depth pruning can slim the model down to a chosen smaller size

Inference Scaling Technique

A smaller model combined with inference-time scaling can outperform a larger model

High-resolution Generation

Supports multi-scale aspect ratio image generation based on 1024px
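
As a rough illustration of what multi-scale aspect ratio generation around 1024px can look like in practice, the sketch below picks a height and width for a requested aspect ratio while keeping the pixel count close to 1024 × 1024. The function name `resolution_for_aspect` and the snapping to multiples of 32 are assumptions made for this example; the model's actual resolution buckets may differ.

```python
import math


def resolution_for_aspect(aspect: float, base: int = 1024, multiple: int = 32) -> tuple[int, int]:
    """Return (height, width) with width / height close to `aspect`
    and a pixel count close to base * base.

    Hypothetical helper: the snapping rule is an assumption, not the
    model's published bucket table.
    """
    if aspect <= 0:
        raise ValueError("aspect must be positive")

    # Real-valued sizes that keep the area equal to base * base.
    width = base * math.sqrt(aspect)
    height = base / math.sqrt(aspect)

    def snap(value: float) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(height), snap(width)
```

For a square request (`aspect=1.0`) this returns the base size of 1024 × 1024; wider or taller requests trade height against width while staying near the same pixel budget.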

Model Capabilities

Text-to-Image Generation

High-resolution Image Generation

Multi-scale Image Generation

Use Cases

Artistic Creation

Art Creation Assistance

Generate artworks based on text prompts

Produces images with artistic styles

Educational Tools

Creative Educational Tools

Develop creative tools for education

Helps students visualize learning content

Research

Generative Model Research

Study performance and limitations of generative models

Advances generative model technology

🚀 🐱 Sana Model Card

This is a scalable Linear-Diffusion-Transformer-based text-to-image generative model, which can generate and modify images based on text prompts. It uses efficient techniques to achieve better performance while saving training costs.

✨ Features

We introduce SANA-1.5, an efficient model that scales with both training-time and inference-time techniques. SANA-1.5 delivers:

Efficient model growth: Growing the 1.6B Sana-1.0 model to 4.8B achieves similar or better performance than training from scratch and saves 60% of the training cost.
Efficient model depth pruning: Slim the model to whatever size you need.
Powerful VLM-selection-based inference scaling: Smaller model + inference scaling > larger model.
Top-notch GenEval & DPGBench results.
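
The inference-scaling idea above (generate several candidates, then let a judge pick) can be sketched as a generic best-of-N loop. This is only an illustration under stated assumptions: `generate` and `score` are hypothetical callables standing in for the image generator and a VLM-based judge, and the selection procedure actually used by SANA-1.5 may differ.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], T],   # hypothetical: (prompt, seed) -> candidate image
    score: Callable[[str, T], float],    # hypothetical: (prompt, candidate) -> judge score
    n: int = 8,
) -> T:
    """Generate n candidates with different seeds and return the best-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")

    # Spend extra inference compute: one sample per seed.
    candidates = [generate(prompt, seed) for seed in range(n)]

    # Keep the candidate the judge rates highest for this prompt.
    return max(candidates, key=lambda candidate: score(prompt, candidate))
```

Raising `n` spends more inference compute per prompt in exchange for a better chance that at least one candidate satisfies the judge, which is the trade-off behind "smaller model + inference scaling".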

[Figure: teaser_page1]

[Figure: model growth performance on GenEval, 8-bit optimizer]

Source code is available at https://github.com/NVlabs/Sana.

📚 Documentation

Model Description

Property	Details
Developed by	NVIDIA, Sana
Model type	Scalable Linear-Diffusion-Transformer-based text-to-image generative model
Model size	4.8B parameters
Model precision	torch.bfloat16 (BF16)
Model resolution	This model is developed to generate 1024px-based images with multi-scale height and width.
License	NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use
Model Description	This is a model that can be used to generate and modify images based on text prompts. It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it)) and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
Resources for more information	Check out our GitHub Repository and the SANA-1.5 report on arXiv.
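
Two figures in the table lend themselves to quick arithmetic: the 32x spatial compression of the latent encoder and the BF16 precision of the 4.8B parameters. The sketch below works both out. The helper names `latent_grid` and `weight_memory_gb` are made up for this example, and the memory figure covers the diffusion model's weights only (not the text encoder, the autoencoder, or activations).

```python
def latent_grid(height: int, width: int, compression: int = 32) -> tuple[int, int]:
    """Spatial size of the latent for an image, given the compression factor."""
    if height % compression or width % compression:
        raise ValueError("image size must be divisible by the compression factor")
    return height // compression, width // compression


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB (10^9 bytes); BF16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9


# A 1024 x 1024 image maps to a 32 x 32 latent grid at 32x compression.
print(latent_grid(1024, 1024))
# 4.8B parameters at 2 bytes each is about 9.6 GB of weights.
print(weight_memory_gb(4.8e9))
```

The small latent grid is a direct consequence of the 32x compression: the transformer operates on far fewer spatial positions than it would with a lower compression factor.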

Model Sources

For research purposes, we recommend our generative-models GitHub repository (https://github.com/NVlabs/Sana), which is better suited to both training and inference and integrates advanced diffusion samplers such as Flow-DPM-Solver. [MIT Han-Lab](https://nv-sana.mit.edu/) provides free Sana inference.

Repository: https://github.com/NVlabs/Sana
Demo: https://nv-sana.mit.edu/

🧨 Diffusers

Under development.

📄 Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

🔧 Technical Details

Limitations

The model does not achieve perfect photorealism.
The model cannot render complex legible text.
Fingers and similar fine details may not be generated properly.
The autoencoding part of the model is lossy.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
