🚀 DALL·E Mini Model Card
This model card focuses on the model associated with the DALL·E mini space on Hugging Face. It can generate images from text prompts, offering a great tool for research and personal creativity. The app incorporates both “DALL·E Mini” and “DALL·E Mega” models.
✨ Features
- Text-to-Image Generation: Generate images based on text prompts.
- Multiple Model Versions: Includes “DALL·E Mini” and “DALL·E Mega” models.
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Developed by | Boris Dayma, Suraj Patil, Pedro Cuenca, Khalid Saifullah, Tanishq Abraham, Phúc Lê, Luke Melas, Ritobrata Ghosh |
| Model Type | Transformer-based text-to-image generation model |
| Language(s) | English |
| License | Apache 2.0 |
| Model Description | This model can generate images from text prompts. As the developers said, “OpenAI had the first impressive model for generating images with DALL·E. DALL·E mini is an attempt at reproducing those results with an open-source model.” |
| Resources for more information | See OpenAI’s website for DALL·E, the project report from the model’s developers, and the DALL·E Mega training journal. |
| Cite as | See the BibTeX entry below. |
```bibtex
@misc{Dayma_DALL·E_Mini_2021,
  author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
  doi = {10.5281/zenodo.5146400},
  month = {7},
  title = {DALL·E Mini},
  url = {https://github.com/borisdayma/dalle-mini},
  year = {2021}
}
```
Uses
Direct Use
The model is intended for generating images from text prompts for research and personal use, such as supporting creativity or creating humorous content. Direct use excludes the uses described in the Misuse, Malicious Use, and Out-of-Scope Use section.
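A minimal text-to-image sketch in Python/JAX is shown below. It follows the pattern of the inference notebook in the borisdayma/dalle-mini repository; the checkpoint references, prompt, and sampling setup are illustrative assumptions rather than values taken from this card.

```python
# Minimal sketch using the dalle-mini and vqgan-jax packages
# (pip install dalle-mini vqgan-jax). Checkpoint names below are assumptions.
import jax
import jax.numpy as jnp
from dalle_mini import DalleBart, DalleBartProcessor
from vqgan_jax.modeling_flax_vqgan import VQModel

DALLE_MODEL = "dalle-mini/dalle-mini/mini-1:v0"       # assumed wandb artifact reference
VQGAN_REPO = "dalle-mini/vqgan_imagenet_f16_16384"    # assumed VQGAN checkpoint

# BART-style seq2seq model: text tokens in, discrete image tokens out.
model, params = DalleBart.from_pretrained(DALLE_MODEL, dtype=jnp.float16, _do_init=False)
processor = DalleBartProcessor.from_pretrained(DALLE_MODEL)

# VQGAN decoder: turns predicted image tokens back into pixels.
vqgan, vqgan_params = VQModel.from_pretrained(VQGAN_REPO, _do_init=False)

prompts = ["a watercolor painting of a fox in a forest"]
tokenized = processor(prompts)

# Sample image tokens autoregressively, drop the BOS token, then decode.
key = jax.random.PRNGKey(0)
output = model.generate(**tokenized, prng_key=key, params=params)
images = vqgan.decode_code(output.sequences[..., 1:], params=vqgan_params)
# `images` is an array of shape (batch, 256, 256, 3) with values roughly in [0, 1].
```

The same pattern should apply to DALL·E Mega checkpoints by swapping the model reference.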
Downstream Use
- Research: Probe and understand the limitations and biases of generative models.
- Tool Development: Develop educational or creative tools.
- Artwork Generation: Generate artwork for design and artistic processes.
- Other Uses: Such as poetry illustration, fan art, etc.
Downstream uses also exclude those in Misuse and Out-of-Scope Use.
Misuse, Malicious Use, and Out-of-Scope Use
The model should not be used to create or disseminate harmful images, including disturbing, offensive, or stereotype-propagating content.
Out-of-Scope Use
The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out-of-scope.
Misuse and Malicious Use
- Harmful Representations: Generate demeaning or harmful representations of people.
- Discriminatory Content: Promote or propagate discriminatory content or stereotypes.
- Impersonation: Impersonate individuals without consent.
- Inappropriate Content: Generate sexual content without consent, mis- and disinformation, etc.
Limitations and Bias
Limitations
- Poor Image Quality: Faces and animals are not generated well.
- Unpredictable Performance: It is hard to predict where the model excels or fails.
- Language Limitation: The model was trained only on English descriptions and performs poorly in other languages.
Bias
The model was trained on unfiltered, English-described Internet data, which may reinforce or exacerbate societal biases.
- Stereotype Inclusion: Generated images may contain negative stereotypes.
- Underrepresentation: People of color are underrepresented.
- Power Differentials: Biased content may depict power differentials between white and non-white people.
- Language Accessibility: Limited usability for non-English speakers.
Limitations and Bias Recommendations
- User Awareness: Make users aware of biases and limitations.
- Content Filtering: Filter out potentially problematic content.
- Dataset Improvement: Curate a diverse and inclusive training dataset.
Training
Training Data
The model developers used three datasets:
- Conceptual Captions, which contains 3 million image and caption pairs.
- Conceptual 12M, which contains 12 million image and caption pairs.
- The OpenAI subset of YFCC100M (about 15 million images), further sub-sampled by the developers due to storage constraints.
Training Procedure
During training:
- Image Encoding: Images are encoded by a VQGAN encoder.
- Description Encoding: Descriptions are encoded by a BART encoder.
- Decoding: The BART decoder predicts the next token.
- Loss Calculation: Loss is the [softmax cross-entropy](https://wandb.ai/sauravm/Activation-Functions/reports/Activation-Functions-Softmax--VmlldzoxNDU1Njgy#%F0%9F%93%A2-softmax-+-cross-entropy-loss-(caution:-math-alert)) between model predictions and actual image encodings.
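A minimal sketch of this objective is shown below, assuming the VQGAN has already converted each target image into a sequence of discrete codebook indices. `model`, `params`, and the batch fields are illustrative placeholders, not the actual training code from the repository.

```python
# Sketch of the seq2seq training objective described above: the BART decoder
# predicts the next VQGAN image token, and the loss is the softmax cross-entropy
# between its logits and the actual image encodings.
import jax.numpy as jnp
import optax


def loss_fn(params, batch, model):
    # batch["input_ids"] / batch["attention_mask"]: BART-encoded text description.
    # batch["labels"]: VQGAN codebook indices for the target image.
    logits = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        decoder_input_ids=batch["decoder_input_ids"],  # image tokens shifted right
        params=params,
    ).logits  # shape: (batch, image_seq_len, vqgan_vocab_size)

    # Softmax cross-entropy between predicted logits and the true image tokens.
    loss = optax.softmax_cross_entropy_with_integer_labels(logits, batch["labels"])
    return jnp.mean(loss)
```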
For DALL·E Mega:
- Hardware: 1 pod TPU v3-256.
- Optimizer: Distributed Shampoo.
- Model Partition: 8 model parallel x 32 data parallel.
- Batch: 4224 samples per update.
- Learning Rate: Warmed up to 0.0001 over 10,000 steps, then held constant.
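The warmup-then-constant schedule above can be expressed with optax schedules; the sketch below is illustrative only, and the actual dalle-mini training script may configure its optimizer differently.

```python
# Warm up to a peak learning rate of 1e-4 over 10,000 steps, then hold constant.
import optax

warmup_steps = 10_000
peak_lr = 1e-4

learning_rate = optax.join_schedules(
    schedules=[
        optax.linear_schedule(init_value=0.0, end_value=peak_lr, transition_steps=warmup_steps),
        optax.constant_schedule(peak_lr),
    ],
    boundaries=[warmup_steps],
)

# The card states Distributed Shampoo was used for training; adam here is only
# a stand-in to show how the schedule plugs into an optax optimizer.
optimizer = optax.adam(learning_rate=learning_rate)
```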
Evaluation Results
The model developers compare DALL·E Mini’s results with [DALL·E-pytorch](https://github.com/lucidrains/DALLE-pytorch), OpenAI’s [DALL·E](https://openai.com/blog/dall-e/), and other models in their [technical report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-Mini-Explained-with-Demo--Vmlldzo4NjIxODA#the-results-of-our-dall-e-experiment). For DALL·E Mega, see this [technical report](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-Generate-images-from-any-text-prompt--VmlldzoyMDE4NDAy).
Environmental Impact
DALL·E Mini Estimated Emissions
Based on the Machine Learning Impact calculator, we estimate the CO2 emissions:
| Property | Details |
|----------|---------|
| Hardware Type | TPU v3-8 |
| Hours used | 72 (3 days) |
| Cloud Provider | GCP |
| Compute Region | us-east1 |
| Carbon Emitted | 30.16 kg CO2 eq. |
DALL·E Mega Estimated Emissions
DALL·E Mega is still training. As of June 9, 2022, the model developers' repo...
📄 License
This model is licensed under the Apache 2.0 license.