Sana 1600M 1024px MultiLing

Developed by Efficient-Large-Model

Sana is an efficient text-to-image framework capable of generating images with resolutions up to 4096×4096, supporting multilingual input.

Tags: Text-to-Image, Supports Multiple Languages, High-resolution image generation, Multilingual prompt support, Fast inference

Downloads 111

Release Time: 11/29/2024

Model Overview

Sana is a text-to-image model based on a Linear Diffusion Transformer. It rapidly synthesizes high-resolution, high-quality images with strong text-image alignment, and it supports prompts in English, Chinese, and emoji, including mixed combinations.

Model Features

High-resolution image generation

Supports generating high-quality images with resolutions up to 4096×4096

Multilingual support

Supports prompts in English, Chinese, and emoji combinations

Efficient inference

Can be deployed on laptop GPUs for fast image generation

Good text-image alignment

Generated images closely match the input text prompts

Model Capabilities

Text-to-image generation

High-resolution image synthesis

Multilingual prompt support

Fast inference

Use Cases

Creative arts

Artwork generation

Used for generating artworks and creative designs

High-quality artistic images

Design assistance

Quickly generates visual concepts during the design process

Accelerates the design process

Education

Educational tool development

Develops educational or creative tools

Enhances learning experience

Research

Generative model research

Used to explore and understand the limitations and biases of generative models

Advances model technology

🚀 Sana

Sana is a text-to-image framework that can efficiently generate high-resolution images up to 4096 × 4096. It can synthesize high-quality images with strong text-image alignment at a fast speed and can be deployed on laptop GPUs.

🚀 Quick Start

The source code of Sana is available on GitHub (https://github.com/NVlabs/Sana). You can try the model through the online demo at https://nv-sana.mit.edu/.

✨ Features

High-Resolution Image Generation: Generates images at resolutions up to 4096 × 4096.
Fast and High-Quality: Synthesizes high-quality images with strong text-image alignment quickly, and can be deployed on laptop GPUs.
Multi-language Support: Accepts prompts in English, Chinese, and emoji, as well as any mix of the three.

📚 Documentation

Compare with base model

Property	Details
Model Type	Linear-Diffusion-Transformer-based text-to-image generative model
Model Size	1648M parameters
Model Resolution	Developed to generate 1024px-based images with multi-scale height and width
License	NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use | Google AI for Developers for Gemma-2-2B-IT, Gemma Prohibited Use Policy | Google AI for Developers.
Model Description	A model that can be used to generate and modify images based on text prompts. It is a Linear Diffusion Transformer that uses one fixed, pretrained text encoder ([Gemma2-2B-IT](https://huggingface.co/google/gemma-2-2b-it)) and one 32x spatial-compressed latent feature encoder ([DC-AE](https://hanlab.mit.edu/projects/dc-ae)).
Special	Fine-tuned from the base model [Efficient-Large-Model/Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px); supports emoji, Chinese, English, and all mixed prompts.
Resources for more information	Check out our GitHub repository (https://github.com/NVlabs/Sana) and the Sana report on arXiv.
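
The 32x spatial compression named in the Model Description row implies a simple relationship between image size and latent grid size. The sketch below works through that arithmetic only. The function name `latent_grid` is hypothetical and not part of the Sana codebase, and the requirement that each side be a multiple of the compression factor is an assumption made for illustration.

```python
from typing import Tuple


def latent_grid(height: int, width: int, compression: int = 32) -> Tuple[int, int]:
    """Return (rows, cols) of the latent grid for an image of the given size.

    Hypothetical helper: assumes each side is a positive multiple of the
    spatial compression factor (32x, as stated for the DC-AE encoder).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    if height % compression or width % compression:
        raise ValueError(f"height and width must be multiples of {compression}")
    return height // compression, width // compression


if __name__ == "__main__":
    for side in (1024, 4096):
        rows, cols = latent_grid(side, side)
        print(f"{side}x{side} image -> {rows}x{cols} latent grid")
```

Under these assumptions a 1024 × 1024 image maps to a 32 × 32 latent grid and a 4096 × 4096 image to a 128 × 128 grid, which gives a rough sense of why the compressed latent space keeps high-resolution generation tractable.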

Model	Language
Sana_1600M_1024px	English
Sana_1600M_1024px_MultiLing	English, Chinese, Emoji
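
The language table suggests a simple routing rule: English-only prompts can go to the base model, while prompts containing Chinese or emoji need the MultiLing variant. The sketch below illustrates one way to apply that rule. The helper names (`prompt_scripts`, `pick_checkpoint`) are hypothetical, and the code point ranges are a rough approximation of "Chinese" and "emoji", not a complete Unicode classification.

```python
from typing import Set

BASE_MODEL = "Sana_1600M_1024px"
MULTILING_MODEL = "Sana_1600M_1024px_MultiLing"


def is_chinese(ch: str) -> bool:
    # Approximation: CJK Unified Ideographs plus Extension A.
    return "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"


def is_emoji(ch: str) -> bool:
    # Approximation: common pictograph blocks; not every emoji is covered.
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def prompt_scripts(prompt: str) -> Set[str]:
    """Return which of 'english', 'chinese', 'emoji' appear in the prompt."""
    found = set()
    for ch in prompt:
        if is_chinese(ch):
            found.add("chinese")
        elif is_emoji(ch):
            found.add("emoji")
        elif ch.isascii() and ch.isalpha():
            found.add("english")
    return found


def pick_checkpoint(prompt: str) -> str:
    """Choose the MultiLing model when anything beyond English is present."""
    return MULTILING_MODEL if prompt_scripts(prompt) - {"english"} else BASE_MODEL


if __name__ == "__main__":
    print(pick_checkpoint("a cat flying over a rainbow"))
    print(pick_checkpoint("金色 🌅 下的长城, traditional Chinese style"))
```

Routing by prompt content is a design choice for illustration; since the MultiLing model also accepts English, always using it is an equally valid option.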

Sample comparison between Sana_1600M_1024px and Sana_1600M_1024px_MultiLing (the generated images are not reproduced here). Prompts used:
Sample-1: 🐯 穿着 👕 吹 🎷
Sample-2: 猫 Wearing 🕶 flying on the 彩虹 with 🌹 in the ❄️
Sample-3: 🦁 teaching 🐯 to catch 🦋
Sample-4: 金色 🌅 下的长城, traditional Chinese style

Model Sources

For research purposes, we recommend our GitHub repository (https://github.com/NVlabs/Sana), which is better suited to both training and inference and which integrates advanced diffusion samplers such as Flow-DPM-Solver. [MIT Han-Lab](https://nv-sana.mit.edu/) provides free Sana inference.

Repository: https://github.com/NVlabs/Sana
Demo: https://nv-sana.mit.edu/

🧨 Diffusers

Diffusers support is under development, with pull requests in progress for Sana and DC-AE.
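
If the Diffusers integration mentioned above is available in your environment, inference might look like the sketch below. Every specific here is an assumption rather than something this card states: the `SanaPipeline` class name, the `_diffusers` repository suffix, the half-precision dtype, and the step and guidance values. `generation_kwargs` and `generate` are hypothetical helpers. The heavy imports are deferred so the argument check can run without a GPU.

```python
# Assumed Hugging Face repository id for Diffusers-format weights.
MODEL_ID = "Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers"


def generation_kwargs(prompt: str, height: int = 1024, width: int = 1024) -> dict:
    """Validate a request and build keyword arguments for the pipeline call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: sides are multiples of 32, matching the 32x latent compression.
    if height % 32 or width % 32:
        raise ValueError("height and width must be multiples of 32")
    return {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": 20,  # assumed value; tune as needed
        "guidance_scale": 4.5,      # assumed value; tune as needed
    }


def generate(prompt: str, out_path: str = "sana.png") -> str:
    """Generate one image and save it; requires diffusers, torch, and a CUDA GPU."""
    # Imported lazily because both packages are large.
    import torch
    from diffusers import SanaPipeline

    pipe = SanaPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.to("cuda")
    image = pipe(**generation_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

With the dependencies installed, calling `generate("🐯 穿着 👕 吹 🎷")` would write the result to `sana.png`. For training or for the Flow-DPM-Solver sampler, the GitHub repository remains the recommended route.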

💡 Usage

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

🔧 Technical Details

Limitations

The model does not achieve perfect photorealism.
The model cannot render complex legible text.
Fingers, etc. in general may not be generated properly.
The autoencoding part of the model is lossy.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

📄 License

The model is under NSCL v2-custom. Governing Terms: NVIDIA License. Additional Information: Gemma Terms of Use | Google AI for Developers for Gemma-2-2B-IT, Gemma Prohibited Use Policy | Google AI for Developers.
