Sana_600M_512px Open-source Image Generation Framework - Free Deployment for Rapid Synthesis of High-resolution Images

Sana 600M 512px

Developed by Efficient-Large-Model

Sana is an efficient text-to-image framework capable of generating images with resolutions up to 4096×4096, featuring fast synthesis of high-resolution, high-quality images

Image Generation | Supports Multiple Languages | #Efficient High-Resolution Image Generation #Linear Diffusion Transformer #Multi-Scale Image Synthesis

Downloads 2,853

Release Time: 11/30/2024

Model Overview

A text-to-image model based on a linear diffusion transformer, using Gemma2-2B-IT as the text encoder and DC-AE as the latent feature encoder

Model Features

High-Resolution Image Generation

Supports generating high-quality images with resolutions up to 4096×4096

Efficient Inference

Can run efficiently on laptop GPUs with fast inference speed

Strong Text-Image Alignment

Generated images closely match the input text prompts

Multi-Scale Support

Supports multi-scale height and width image generation based on 512px
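
One way to read "multi-scale height and width based on 512px" is sketched below: keep the pixel count close to 512×512 and snap each side to a fixed step. The helper name `multiscale_size`, the area-preserving rule, and the step of 32 (picked so both sides divide evenly through a 32x spatial encoder) are illustrative assumptions, not the bucket table the Sana code actually uses.

```python
import math

BASE = 512   # base resolution named in the model card
STEP = 32    # assumed snapping step; not taken from the Sana code


def multiscale_size(aspect_ratio: float, base: int = BASE, step: int = STEP) -> tuple[int, int]:
    """Return (height, width) with width/height near aspect_ratio and area near base*base."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")

    def snap(value: float) -> int:
        # Round to the nearest multiple of `step`, never below one step.
        return max(step, round(value / step) * step)

    height = base / math.sqrt(aspect_ratio)
    width = base * math.sqrt(aspect_ratio)
    return snap(height), snap(width)


for name, ratio in [("square", 1.0), ("landscape 16:9", 16 / 9), ("portrait 3:4", 3 / 4)]:
    h, w = multiscale_size(ratio)
    print(f"{name}: {h}x{w} (height x width)")
```

Snapping means the requested aspect ratio is only approximated; a real pipeline may instead pick the closest entry from a predefined list of sizes.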

Model Capabilities

Text-to-Image Generation

High-Resolution Image Synthesis

Multilingual Support

Use Cases

Artistic Creation

Artwork Generation

Used for image generation in artistic creation and design processes

Generates high-quality artworks

Educational Tools

Creative Educational Tools

Used for image generation in education or creative tools

Assists in teaching and creative expression

Research

Generative Model Research

Used to explore and understand the limitations and biases of generative models

🚀 Sana Model

Sana is a text-to-image framework that can efficiently generate high-resolution (up to 4096 × 4096), high-quality images with strong text-image alignment at a remarkably fast speed, and it can be deployed on a laptop GPU.

✨ Features

Model Description

Property	Details
Developed by	NVIDIA, Sana
Model Type	Linear-Diffusion-Transformer-based text-to-image generative model
Model Size	1648M parameters
Model Resolution	This model is developed to generate 512px-based images with multi-scale height and width.
License	NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use
Model Description	This is a model that can be used to generate and modify images based on text prompts. It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it)) and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
Resources for more information	Check out our GitHub Repository and the Sana report on arXiv.
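
To make the "32x spatial-compressed" figure concrete, the arithmetic below shows how large the latent grid is for a given image size. The function name `latent_grid` is hypothetical, and the 8x factor appears only as a comparison point (it is the compression typical of earlier latent diffusion autoencoders), not something stated in this card.

```python
def latent_grid(height: int, width: int, factor: int = 32) -> tuple[int, int]:
    """Latent (height, width) for an encoder that downsamples each side by `factor`."""
    if height % factor or width % factor:
        raise ValueError(f"{height}x{width} is not divisible by {factor}")
    return height // factor, width // factor


for side in (512, 1024, 4096):
    h32, w32 = latent_grid(side, side, factor=32)
    h8, w8 = latent_grid(side, side, factor=8)  # comparison only, see the note above
    ratio = (h8 * w8) // (h32 * w32)
    print(f"{side}px: 32x -> {h32}x{w32} latent, 8x -> {h8}x{w8} latent ({ratio}x more positions)")
```

Because the number of spatial positions scales with the square of the side length, a 32x encoder leaves 16 times fewer positions to process than an 8x encoder at the same image size.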

Model Sources

For research purposes, we recommend our GitHub repository (https://github.com/NVlabs/Sana), which is better suited to both training and inference and integrates advanced diffusion samplers such as Flow-DPM-Solver. [MIT Han-Lab](https://nv-sana.mit.edu/) provides free Sana inference.

Repository: https://github.com/NVlabs/Sana
Demo: https://nv-sana.mit.edu/

🧨 Diffusers

PRs in development: Sana and DC-AE

📚 Documentation

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render complex legible text.
Fingers and similar fine details may not be generated properly.
The autoencoding part of the model is lossy.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
