Sana_600M_1024px Open-source Text-to-Image Framework - Rapidly Synthesize High-quality 4096×4096 Images

Sana 600M 1024px

Developed by Efficient-Large-Model

Sana is an efficient text-to-image framework capable of generating images with resolutions up to 4096×4096, featuring rapid synthesis of high-resolution, high-quality images.

Tags: Text-to-Image, Supports Multiple Languages, High-resolution image generation, Fast text-to-image conversion, 4096px ultra-clear output

Downloads 285

Release Time: 11/30/2024

Model Overview

Sana is a text-to-image generation model based on linear diffusion transformers, utilizing Gemma2-2B-IT as the text encoder and DC-AE as the latent feature encoder, enabling efficient high-resolution image generation.
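
To make the "linear" in linear diffusion transformer concrete, here is a minimal NumPy sketch of kernelized linear attention. The card does not spell out the attention formulation, so the ReLU feature map, the function names, and the tensor sizes below are all assumptions chosen for illustration. The point is only that reordering the matrix products avoids building the N×N score matrix, so cost grows linearly with the number of tokens.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def quadratic_attention(q, k, v, eps=1e-6):
    # Builds the full (N, N) score matrix: cost grows as N^2.
    scores = relu(q) @ relu(k).T
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

def linear_attention(q, k, v, eps=1e-6):
    # Same result, but K^T V is computed first, giving a (d, d) matrix.
    # No (N, N) intermediate is formed: cost grows as N.
    kv = relu(k).T @ v
    k_sum = relu(k).sum(axis=0)
    return (relu(q) @ kv) / ((relu(q) @ k_sum)[:, None] + eps)

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
n_tokens, dim = 64, 8
q, k, v = (rng.standard_normal((n_tokens, dim)) for _ in range(3))

same = np.allclose(quadratic_attention(q, k, v), linear_attention(q, k, v))
print("outputs match:", same)
```

Both functions compute the same normalized weighted sum; only the order of multiplication differs, which is what makes the approach attractive when high resolutions produce many tokens.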

Model Features

High-resolution image generation

Capable of generating high-quality images with resolutions up to 4096×4096.

Efficient inference

Can be efficiently deployed and run even on laptop GPUs.

Strong text-image alignment

Generated images exhibit high consistency with input text.

Model Capabilities

Text-to-image generation

High-resolution image synthesis

Fast image generation

Use Cases

Artistic creation

Artwork generation

Used for generating artworks and assisting in the creative process of design.

Produces high-quality artistic images.

Education

Educational tool

Used for image generation in educational or creative tools.

Provides intuitive visual aids for teaching.

Research

Generative model research

Used to explore and understand the limitations and biases of generative models.

Advances the development of generative model technologies.

🚀 Sana Model

Sana is a text-to-image framework. It efficiently generates high-resolution (up to 4096 × 4096), high-quality images with strong text-image alignment, and it can be deployed on laptop GPUs.

🚀 Quick Start

The source code of Sana is available at https://github.com/NVlabs/Sana. For research purposes, we recommend using this repository for both training and inference; it integrates advanced diffusion samplers such as Flow-DPM-Solver. MIT Han-Lab provides free Sana inference.

✨ Features

High-resolution Image Generation: Can generate images at resolutions up to 4096 × 4096.
Fast Generation Speed: Generates high-quality images with strong text-image alignment quickly, and can be deployed on laptop GPUs.

📚 Documentation

Model


Model Description

Property	Details
Developed by	NVIDIA, Sana
Model Type	Linear-Diffusion-Transformer-based text-to-image generative model
Model Size	1648M parameters
Model Resolution	This model is designed to generate images at a 1024px base resolution with multi-scale height and width.
License	NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use
Model Description	This is a model that can be used to generate and modify images based on text prompts. It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it)) and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
Resources for more information	Check out our GitHub Repository and the Sana report on arXiv.
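
As a rough illustration of what a 32x spatially compressed latent means for multi-scale sizes, the sketch below computes the latent grid for a few image sizes. The helper names and the non-square example size are hypothetical, and the 8x figure is included only as a comparison point, not as something the card states about any particular model.

```python
def latent_grid(height, width, compression=32):
    """Return the (rows, cols) of the latent grid for a given image size."""
    if height % compression or width % compression:
        raise ValueError("height and width must be multiples of the compression factor")
    return height // compression, width // compression

def latent_positions(height, width, compression=32):
    """Number of spatial positions in the latent grid."""
    rows, cols = latent_grid(height, width, compression)
    return rows * cols

# Example sizes are illustrative; only the 32x factor comes from the table above.
for h, w in [(1024, 1024), (768, 1344), (4096, 4096)]:
    print(f"{h}x{w} image -> latent grid {latent_grid(h, w)}")

# Comparison against a hypothetical 8x autoencoder at 1024x1024.
ratio = latent_positions(1024, 1024, 8) // latent_positions(1024, 1024, 32)
print("8x compression yields", ratio, "times more latent positions than 32x")
```

Because the factor applies to both height and width, moving from 8x to 32x compression cuts the number of latent positions by a factor of 16 at the same image size.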

Model Sources

For research purposes, we recommend the Sana GitHub repository (https://github.com/NVlabs/Sana), which is better suited to both training and inference and integrates advanced diffusion samplers such as Flow-DPM-Solver. MIT Han-Lab provides free Sana inference.

Repository: https://github.com/NVlabs/Sana
Demo: https://nv-sana.mit.edu/

🧨 Diffusers

Pull requests in development: Sana and DC-AE
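
For readers whose installed diffusers release already includes Sana support, inference might look like the sketch below. Everything specific here is an assumption rather than something this card confirms: that the release exposes a `SanaPipeline` class, that a diffusers-format checkpoint is published under the repository id shown, and that float16, 20 steps, and a guidance scale of 4.5 are reasonable settings. The `generate` helper itself is hypothetical.

```python
def generate(
    prompt,
    model_id="Efficient-Large-Model/Sana_600M_1024px_diffusers",  # assumed repo id
    height=1024,
    width=1024,
    steps=20,            # assumed default, not taken from the card
    guidance_scale=4.5,  # assumed default, not taken from the card
    seed=0,
):
    """Generate one image with a Sana checkpoint via diffusers (sketch)."""
    # Imported lazily so this module loads even without torch/diffusers installed.
    import torch
    from diffusers import SanaPipeline  # assumes a release that ships this class

    pipe = SanaPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to("cuda")

    # A seeded generator makes the sampled image reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    return result.images[0]  # a PIL image
```

A call such as `generate("a watercolor fox in a snowy forest").save("sana.png")` would download the weights on first use and requires a CUDA-capable GPU.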

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is outside the scope of its abilities.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render complex legible text.
Fingers and similar fine details may not be generated properly.
The autoencoding part of the model is lossy.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
