🚀 Mitsua Likes: A Text-to-Image Diffusion Model Trained on Opt-In Contributors' "Likes"
Mitsua Likes is a text-to-image latent diffusion model that supports both Japanese and English. It is trained only on data with explicit opt-in permission, openly licensed data, and public domain data, ensuring compliance with licensing and ethical standards.
🚀 Quick Start
Installation
- Install the required Python packages:

```bash
pip install transformers sentencepiece diffusers
```

Verified on the following versions:

```
transformers==4.44.2
diffusers==0.31.0
sentencepiece==0.2.0
```
- Run the pipeline:

```python
from diffusers import DiffusionPipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
pipe = DiffusionPipeline.from_pretrained("Mitsua/mitsua-likes", trust_remote_code=True).to(device, dtype=dtype)

# Prompt: "Elan Mitsua in a waterfall, sensei art" (Japanese and English prompts are supported)
prompt = "滝の中の絵藍ミツア、先生アート"
negative_prompt = "elan doodle, lowres"

ret = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=5.0,
    guidance_rescale=0.7,
    width=768,
    height=768,
    num_inference_steps=40,
)

# The pipeline also reports whether the output resembles a licensed fictional character.
print("Similarity Restriction:", ret.detected_public_fictional_characters[0])
print("Similarity Measure:")
for k, v in ret.detected_public_fictional_characters_info[0].items():
    print(f"{k} : {v:.3%}")

image = ret.images[0]
image.save("output.png")
```
✨ Features
- Ethical Training: Mitsua Likes is trained solely on opt-in or openly licensed data and public domain data, without using data generated by other AI models. It is Fairly Trained certified, indicating it does not train on copyrighted works without a license.
- Independent Architecture: The entire model (CLIP text encoder, VAE, UNet) is trained from scratch, without relying on the knowledge of any pre-trained model.
- Specific Domain Generation: It struggles with most modern concepts and complex prompts, but excels at generating specific types of images, such as simple anime-style portraits and landscapes.
📚 Documentation
Model Details
Model Architecture
CLIP Text Encoder
- 12-layer masked text transformer
- Tokenizer: sentencepiece tokenizer with a 64k vocabulary
- Max length: 64 tokens
- This text encoder comes from Mitsua Japanese CLIP (see the tokenization sketch below)
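As a rough illustration, prompts are tokenized by the sentencepiece tokenizer and padded or truncated to the encoder's 64-token context window. The sketch below assumes the tokenizer can be loaded from a `tokenizer` subfolder of the model repo, as diffusers pipelines conventionally lay out; that subfolder name is an assumption, not a confirmed detail.

```python
from transformers import AutoTokenizer

# Assumption: the Mitsua Likes repo exposes its sentencepiece tokenizer
# in a "tokenizer" subfolder (the usual diffusers convention).
tokenizer = AutoTokenizer.from_pretrained(
    "Mitsua/mitsua-likes", subfolder="tokenizer", trust_remote_code=True
)

# Pad/truncate the prompt to the encoder's 64-token context window.
enc = tokenizer(
    "滝の中の絵藍ミツア、先生アート",
    max_length=64,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
print(enc.input_ids.shape)  # -> torch.Size([1, 64])
```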
VAE
- The VAE is trained with a fully formula-based Wavelet Loss, so it does not depend on ImageNet-derived models in any way (a minimal sketch of such a loss follows this list).
- The VAE decoder is finetuned to embed an invisible watermark in generated images, based on our own implementation referencing The Stable Signature.
- Number of latent channels: 8
- Note: this repo's VAE encoder weights are re-initialized to prevent misuse such as unauthorized finetuning. If you need the VAE encoder weights, please apply via My Mitsua Likes Waitlist Registration.
- Total training steps: 280k with batch size 240 at resolution 256x256, taking about 800 RTX 4090 GPU hours.
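The exact loss formulation is not published here; what follows is a minimal sketch of what a purely formula-based wavelet reconstruction loss can look like, using a single-level 2D Haar decomposition and an L1 penalty on the coefficients. The single-level design, function names, and L1 choice are illustrative assumptions, but note that nothing in it requires a pretrained network.

```python
import torch
import torch.nn.functional as F

def haar_decompose(x: torch.Tensor) -> torch.Tensor:
    """Single-level 2D Haar transform of a (B, C, H, W) image.

    Returns the four subbands (LL, LH, HL, HH) stacked on the channel axis.
    """
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

def wavelet_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance between Haar coefficients; no pretrained model involved."""
    return F.l1_loss(haar_decompose(recon), haar_decompose(target))

# Example: loss between a VAE reconstruction and its target batch.
recon = torch.rand(2, 3, 256, 256)
target = torch.rand(2, 3, 256, 256)
print(wavelet_loss(recon, target).item())
```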
UNet
- The UNet architecture references SDXL's UNet but reduces the parameter count to fit the relatively small training data.
- Training procedure: progressive resolution training and aspect bucket training.
- To speed up training, the Min-SNR loss weighting and Immiscible Diffusion techniques are applied (see the sketch after this list).
- Total training steps: 550k with batch sizes of 216 to 1920 depending on resolution.
- All training was done on a single 8xH100 node; UNet training took about 2,000 H100 GPU hours in total.
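For reference, Min-SNR weighting (Hang et al., 2023) clips each timestep's signal-to-noise ratio at a constant γ so that easy, high-SNR timesteps do not dominate the loss. Below is a minimal sketch for an ε-prediction objective, assuming a standard diffusers-style scheduler exposing `alphas_cumprod`; γ=5 follows the paper's default and is not a confirmed Mitsua Likes setting.

```python
import torch

def min_snr_weights(alphas_cumprod: torch.Tensor, timesteps: torch.Tensor,
                    gamma: float = 5.0) -> torch.Tensor:
    """Per-sample loss weights min(SNR, gamma) / SNR for epsilon-prediction."""
    alpha_bar = alphas_cumprod[timesteps]   # (B,) cumulative alphas at sampled t
    snr = alpha_bar / (1.0 - alpha_bar)     # SNR(t) = alpha_bar / (1 - alpha_bar)
    return torch.minimum(snr, torch.full_like(snr, gamma)) / snr

# Usage inside a training step (schematic):
# per_sample = F.mse_loss(eps_pred, eps, reduction="none").mean(dim=(1, 2, 3))
# loss = (min_snr_weights(scheduler.alphas_cumprod, t) * per_sample).mean()
```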
Character Similarity Determination Model
- This model is a Swin Transformer multi-label classification model finetuned from Swin Base Multi Fractal 1k.
- Its training data is a subset of the Mitsua Japanese CLIP training data. It is an additional post-processing classifier that checks generated images for similarity to licensed fictional characters (a thresholding sketch follows).
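Conceptually, a multi-label classifier like this emits one independent probability per character via a sigmoid, and a cutoff decides whether a similarity restriction is reported. The sketch below illustrates only that post-processing step; the logits, label names, and 0.5 threshold are invented for illustration and are not the model's published values.

```python
import torch

# Hypothetical logits from the Swin classifier for one generated image,
# one logit per licensed character label (placeholder names).
logits = torch.tensor([2.1, -3.0, 0.4])
labels = ["character_a", "character_b", "character_c"]

probs = torch.sigmoid(logits)  # independent per-label probabilities
threshold = 0.5                # assumed cutoff; the real value is not published

detected = {l: p.item() for l, p in zip(labels, probs) if p >= threshold}
for name, p in detected.items():
    print(f"{name} : {p:.3%}")  # mirrors the pipeline's similarity report
```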
Intended Use
- Generation of artworks for further creative endeavors
- Research or education on generative models
Out-of-Scope Use
Infringing others' rights of any kind (copyright, publicity rights, privacy, etc.) or causing harm to others is a misuse of this model. This includes, but is not limited to:
- Discriminating against, defaming, or insulting others, thereby damaging their honor or credibility.
- Infringing or potentially infringing the intellectual property rights or privacy of others.
- Disseminating information or content that unjustly harms the interests of others.
- Disseminating false information or content.
Please read the "Prohibitions" section of the Mitsua Likes BY-NC license for more details.
Trainable model waitlist
The VAE encoder weights in this repository are re-initialized to prevent misuse. Finetuning on images and image-to-image generation are therefore technically disabled, and they are also prohibited by the license terms.
For non-commercial research or personal creative purposes, you can register for the waitlist to receive full model access, including the VAE encoder weights.
The training data must be owned by you or explicitly licensed, and a summary of the training data will be publicly disclosed.
The other conditions are described in the following Google Form.
My Mitsua Likes Waitlist Registration
📄 License
The model is licensed under the Mitsua Likes Attribution-NonCommercial License. "Mitsua Likes" attribution is required when sharing generated results. Commercial use is limited to your own personal creative purposes, and using this model for machine learning is prohibited. For corporate commercial use, please reach out via the contact form.