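🚀 How far can we go with ImageNet for Text-to-Image generation?
This project focuses on text-to-image generation. It proposes a new approach that applies strategic data augmentation to small, carefully curated datasets to improve model performance and the quality of the generated images.
🚀 Quick Start
This repository contains the code and models for the paper "How far can we go with ImageNet for Text-to-Image generation?". The core idea: text-to-image generation models usually rely on huge datasets and favor quantity over quality, and the common remedy is to collect ever more data. We propose a different approach that improves the performance of these models through strategic data augmentation of small, carefully curated datasets. We show that this approach improves the quality of the generated images on several benchmarks.
Paper: arXiv
GitHub repository: https://github.com/lucasdegeorge/T2I-ImageNet
Project website: https://lucasdegeorge.github.io/projects/t2i_imagenet/
📦 Installation
First, create a virtual environment with Python 3.9 or later, clone the repository, and run the following command: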
pip install -e .
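For more details, see here.
📚 Documentation
Pretrained models
CAD-I model
This repository provides the model trained with both text augmentation and image augmentation. For the model trained with text augmentation only, see here. To use the pretrained model: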
from pipe import T2IPipeline
pipe = T2IPipeline("Lucasdegeorge/CAD-I").to("cuda")
prompt = "An adorable otter, with its sleek, brown fur and bright, curious eyes, playfully interacts with a vibrant bunch of broccoli... "
image = pipe(prompt, cfg=15)
If you only want to download the model, without the sampling pipeline, run:
from pipe import CAD
model = CAD.from_pretrained("Lucasdegeorge/CAD-I")
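DiT-I model
Coming soon...
Prompts
Our models are specifically trained to handle very long, detailed prompts. For the best results, use rich, detailed prompts; short or vague prompts may not make full use of the model's capabilities.
Example prompts: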
A majestic elephant stands tall and proud in the heart of the African savannah, its wrinkled, gray skin glistening under the intense afternoon sun. The elephant's large, flapping ears and long, sweeping trunk create a sense of grace and power as it gently sways, surveying the vast, golden grasslands stretching out before it. In the distance, a herd of zebras grazes peacefully, their stripes blending with the tall, dry grass. The scene is completed by a lone acacia tree silhouetted against the setting sun, casting long, dramatic shadows across the landscape.
A classic film camera rests on a tripod, its worn leather strap and scratched metal body telling the story of countless adventures and captured moments. The camera is positioned in a scenic landscape, with rolling hills, a winding river, and a distant mountain range bathed in the soft, golden light of sunset. In the foreground, a wildflower meadow sways gently in the breeze, while the camera's lens captures the beauty and tranquility of the scene, preserving it for eternity.
A graceful flamingo stands elegantly in the shallow waters of a tranquil lagoon, its vibrant pink feathers reflecting beautifully in the still water. The flamingo's long, slender legs and curved neck create a picture of poise and balance as it dips its beak into the water, searching for food. Behind the flamingo, a lush mangrove forest stretches out, its dense foliage providing a rich habitat for various wildlife. The scene is completed by a clear blue sky and the gentle rustling of leaves in the breeze.
A hearty, overstuffed sandwich sits on a wooden cutting board, its layers of fresh, crisp lettuce, juicy tomatoes, and thinly sliced deli meats peeking out from between two slices of golden-brown bread. The sandwich's tantalizing aroma fills the air, mingling with the scent of freshly baked bread and tangy mustard. In the background, a bustling deli comes to life, with shelves lined with jars of pickles, a gleaming meat slicer, and a chalkboard menu listing the day's specials. The scene is completed by the lively chatter of customers and the clinking of glasses.
A stunning oil painting of a majestic tiger hangs on the wall of a dimly-lit art gallery, its vibrant colors and intricate details drawing the viewer in. The tiger's powerful, muscular body is depicted in mid-stride, its stripes blending seamlessly with the lush jungle foliage surrounding it. The painting captures the tiger's intense, amber eyes and the subtle play of light and shadow on its fur, creating a sense of depth and movement. The background features a dense canopy of trees and a cascading waterfall, adding to the wild, untamed atmosphere of the scene.
A clever magpie perched on a rustic wooden fence post, its iridescent black and white feathers shimmering in the sunlight. The bird tilts its head, holding a shiny trinket in its beak, with a backdrop of a golden wheat field swaying gently in the breeze. A few more curios and found objects are scattered along the fence, hinting at the magpie's treasure trove hidden nearby. A clear blue sky with puffy white clouds completes the scenic countryside atmosphere.
A playful dolphin leaps gracefully out of the sparkling turquoise waters, its sleek, gray body arcing through the air before diving back into the waves with a splash. Nearby, a classic wooden sailboat glides smoothly across the ocean, its white sails billowing in the breeze. The dolphin swims alongside the boat, its joyful antics mirrored by the shimmering sunlight dancing on the water's surface. The scene is completed by a clear blue sky and the distant horizon, where the sea meets the sky.
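Using the pipeline
The T2IPipeline class provides a complete interface for generating images from text prompts. Here is a detailed guide to using it:
💻 Basic usage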
from pipe import T2IPipeline
# Initialize the pipeline
pipe = T2IPipeline("Lucasdegeorge/CAD-I").to("cuda")
# Generate an image from a prompt
prompt = "An adorable otter, with its sleek, brown fur and bright, curious eyes, playfully interacts with a vibrant bunch of broccoli... "
image = pipe(prompt, cfg=15)
Advanced configuration
The pipeline can be initialized with several customization options:
from pipe import T2IPipeline
pipe = T2IPipeline(
    model_path="Lucasdegeorge/CAD-I",
    sampler="ddim",  # Options: "ddim", "ddpm", "dpm", "dpm_2S", "dpm_2M"
    scheduler="sigmoid",  # Options: "sigmoid", "cosine", "linear"
    postprocessing="sd_1_5_vae",
    scheduler_start=-3,
    scheduler_end=3,
    scheduler_tau=1.1,
    device="cuda"
)
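The scheduler, scheduler_start, scheduler_end and scheduler_tau options match the parameterization of the noise schedules in Chen (2023), "On the Importance of Noise Scheduling for Diffusion Models", whose sigmoid schedule also defaults to start=-3 and end=3. The sketch below implements that family under the assumption that the repository follows it; the function names are hypothetical and the repository's own code may differ. Each function maps a time t in [0, 1] to a signal level gamma(t), about 1 at t = 0 (clean data) and about 0 at t = 1 (pure noise).

```python
import math

# Hypothetical sketch of noise schedules; not the repository's implementation.


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_schedule(t, start=-3.0, end=3.0, tau=1.1, clip_min=1e-9):
    """Signal level gamma(t): ~1 at t=0 (clean data), ~0 at t=1 (pure noise)."""
    v_start = _sigmoid(start / tau)
    v_end = _sigmoid(end / tau)
    out = _sigmoid((t * (end - start) + start) / tau)
    out = (v_end - out) / (v_end - v_start)  # rescale so gamma(0)=1, gamma(1)=0
    return min(max(out, clip_min), 1.0)


def cosine_schedule(t, start=0.0, end=1.0, tau=1.0, clip_min=1e-9):
    v_start = math.cos(start * math.pi / 2) ** (2 * tau)
    v_end = math.cos(end * math.pi / 2) ** (2 * tau)
    out = math.cos((t * (end - start) + start) * math.pi / 2) ** (2 * tau)
    out = (v_end - out) / (v_end - v_start)
    return min(max(out, clip_min), 1.0)


def linear_schedule(t, clip_min=1e-9):
    return min(max(1.0 - t, clip_min), 1.0)


SCHEDULES = {"sigmoid": sigmoid_schedule, "cosine": cosine_schedule, "linear": linear_schedule}

if __name__ == "__main__":
    for name, fn in SCHEDULES.items():
        print(name, [round(fn(t), 3) for t in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

In this family, tau controls how sharply the sigmoid curve drops around t = 0.5, and start/end select which part of the sigmoid is used; values are clipped away from 0 so that later divisions by sqrt(gamma) stay finite.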
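Generation parameters
The pipeline's __call__ method accepts various parameters that control the generation process:
image = pipe(
    cond="A beautiful landscape",  # Text prompt or list of prompts
    num_samples=4,  # Number of images to generate
    cfg=15,  # Classifier-free guidance scale
    guidance_type="constant",  # Guidance type: "constant", "linear"
    guidance_start_step=0,  # Step at which guidance starts
    coherence_value=1.0,  # Coherence value for sampling
    uncoherence_value=0.0,  # Uncoherence value for sampling
    thresholding_type="clamp",  # Thresholding type: "clamp", "dynamic_thresholding", "per_channel_dynamic_thresholding"
    clamp_value=1.0,  # Clamp value for thresholding
    thresholding_percentile=0.995  # Percentile for thresholding
)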
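The cfg argument is the classifier-free guidance scale. As a minimal sketch, with plain Python lists standing in for tensors and a hypothetical function name, classifier-free guidance mixes the conditional and unconditional predictions of the same network. Two conventions are common, and which one this pipeline uses is an assumption here: the sketch uses guided = cond + w * (cond - uncond), where w = 0 leaves the conditional prediction unchanged.

```python
from typing import List

# Hypothetical sketch of classifier-free guidance; not the repository's implementation.


def classifier_free_guidance(cond: List[float], uncond: List[float], w: float) -> List[float]:
    """Combine conditional and unconditional model outputs with guidance scale w.

    Convention assumed here: guided = cond + w * (cond - uncond).
    The other common form, uncond + w' * (cond - uncond), corresponds to w' = w + 1.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [c + w * (c - u) for c, u in zip(cond, uncond)]


# Toy example: the guided output is pushed away from the unconditional prediction.
print(classifier_free_guidance([1.0, 2.0], [0.5, 2.0], w=2.0))  # [2.0, 2.0]
```

Large scales such as cfg=15 push predictions far outside the training range, which is one reason a thresholding step usually follows.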
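Guidance types
- constant: applies uniform guidance throughout the sampling process
- linear: guidance strength increases linearly from start to end
- exponential: guidance strength increases exponentially from start to end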
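A hedged sketch of how the three guidance types could turn cfg into a per-step scale. The exact formulas the pipeline uses are not documented here, so the linear ramp from 0, the exponential curve with rate k, and treating steps before guidance_start_step as unguided (scale 0) are all assumptions; guidance_scale and k are hypothetical names.

```python
import math

# Hypothetical per-step guidance schedule; not the repository's implementation.


def guidance_scale(step, num_steps, cfg, guidance_type="constant",
                   guidance_start_step=0, k=5.0):
    """Return the guidance scale to use at sampling step `step` (0-based).

    'linear' and 'exponential' ramp from 0 at guidance_start_step
    up to cfg at the final step.
    """
    if step < guidance_start_step:
        return 0.0  # guidance not yet active
    if guidance_type == "constant":
        return float(cfg)
    span = max(num_steps - 1 - guidance_start_step, 1)
    progress = (step - guidance_start_step) / span  # 0.0 -> 1.0
    if guidance_type == "linear":
        return cfg * progress
    if guidance_type == "exponential":
        # (e^(k*p) - 1) / (e^k - 1) rises slowly at first, then steeply; equals 1 at p = 1.
        return cfg * (math.expm1(k * progress) / math.expm1(k))
    raise ValueError(f"unknown guidance_type: {guidance_type!r}")


scales = [guidance_scale(s, 5, 15, "linear") for s in range(5)]
print(scales)  # [0.0, 3.75, 7.5, 11.25, 15.0]
```

With the exponential variant, most of the guidance is concentrated in the last steps; a larger k makes that concentration sharper.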
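Thresholding types
- clamp: clamps values to a fixed range using clamp_value
- dynamic: adjusts the threshold dynamically based on batch statistics
- percentile: uses percentile-based thresholding, with the percentile set by thresholding_percentile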
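Advanced parameters
For finer control over the generation process, you can also specify the following parameters:
- x_N: initial noise tensor
- latents: previous latents to continue generation from
- num_steps: custom number of sampling steps
- sampler: custom sampler function
- scheduler: custom scheduler function
- guidance_start_step: step at which guidance starts
- generator: random number generator, for reproducibility
- unconfident_prompt: custom text for the unconfident prompt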
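Several of these parameters (x_N, num_steps, sampler, generator) meet in the sampling loop. The sketch below shows one way they could fit together for a deterministic DDIM sampler (eta = 0), assuming a model that predicts noise and a signal level gamma(t) that is 1 at t = 0 and near 0 at t = 1. Everything here is hypothetical: the real pipeline works on torch tensors and most likely expects a torch.Generator, whereas this stdlib version uses random.Random so that it runs without dependencies. With a fixed seed, the initial noise x_N, and therefore the deterministic sample, is reproducible.

```python
import math
import random
from typing import Callable, List

# Hypothetical stdlib sketch of seeded DDIM sampling; not the repository's sampler.


def initial_noise(n: int, seed: int) -> List[float]:
    """Draw x_N ~ N(0, 1) from a seeded generator so runs are reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def ddim_sample(denoise: Callable[[List[float], float], List[float]],
                gamma: Callable[[float], float],
                x_N: List[float],
                num_steps: int = 50) -> List[float]:
    """Deterministic DDIM (eta = 0) from t = 1 down to t = 0.

    denoise(x_t, t) returns the predicted noise; gamma(t) is the signal level.
    """
    x = list(x_N)
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0, ..., 0.0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a_cur, a_next = gamma(t_cur), gamma(t_next)
        eps = denoise(x, t_cur)
        # Estimate the clean sample, then re-noise it to the next, lower noise level.
        x0 = [(xi - math.sqrt(1.0 - a_cur) * ei) / math.sqrt(a_cur)
              for xi, ei in zip(x, eps)]
        x = [math.sqrt(a_next) * x0i + math.sqrt(1.0 - a_next) * ei
             for x0i, ei in zip(x0, eps)]
    return x


def gamma(t: float) -> float:
    # Cosine signal level, clipped away from 0 to avoid dividing by zero at t = 1.
    return max(math.cos(t * math.pi / 2) ** 2, 1e-9)


# Toy "oracle" model that knows the target, so DDIM should recover it.
target = [0.7, -0.3, 0.1]


def oracle_denoise(x_t: List[float], t: float) -> List[float]:
    a = gamma(t)
    return [(xi - math.sqrt(a) * x0i) / math.sqrt(1.0 - a) for xi, x0i in zip(x_t, target)]


sample = ddim_sample(oracle_denoise, gamma, initial_noise(3, seed=0), num_steps=20)
print([round(v, 6) for v in sample])  # approximately [0.7, -0.3, 0.1]
```

Because DDIM with eta = 0 adds no fresh noise after x_N, fixing the seed fixes the whole trajectory; stochastic samplers such as DDPM draw new noise at every step, so the generator matters throughout the loop.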
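📄 License
This project is licensed under the MIT License.
📚 Citation
If you use this repository in your experiments, please cite the following paper: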
@article{degeorge2025farimagenettexttoimagegeneration,
title ={How far can we go with ImageNet for Text-to-Image generation?},
author ={Lucas Degeorge and Arijit Ghosh and Nicolas Dufour and David Picard and Vicky Kalogeiton},
year ={2025},
journal ={arXiv},
}









