mar_test2: Open-Source Image Generation Model - High-Quality Image Creation Without Vector Quantization

Mar Test2

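Developed by V3nator

An autoregressive image generation method that produces high-quality images without requiring vector quantization.

Image Generation · License: MIT · #continuous-valued-generation #no-vector-quantization #autoregressive-diffusion

Downloads: 39

Released: 1/22/2025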

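Model Overview

The model operates in a continuous-valued space and uses a diffusion process to model the probability distribution of each token instead of relying on discrete tokens. This simplifies the generation pipeline and broadens the range of applications.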

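Model Features

No vector quantization

Removes the dependence of conventional autoregressive models on vector quantization and operates directly in a continuous-valued space.

Diffusion loss

Introduces a diffusion loss function to model the per-token probability distribution, improving generation quality while keeping the speed advantage of autoregressive models.

Multiple pretrained sizes

Provides three pretrained model sizes (base, large, huge) to fit different compute budgets.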

Model Capabilities

Unconditional image generation

High-quality image synthesis

Continuous-valued space modeling

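Use Cases

Creative design

Concept art generation

Quickly generate images for creative concepts

High-quality, diverse visual output

Data augmentation

Training data expansion

Generate supplementary data for training vision models

Improve model generalization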

🚀 Autoregressive Image Generation without Vector Quantization

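This model (MAR) introduces a novel autoregressive approach to image generation that requires no vector quantization. Instead of relying on discrete tokens, it models the probability distribution of each token in a continuous-valued space using a diffusion process. By training with a diffusion loss function, the model generates high-quality images efficiently while keeping the speed advantage of autoregressive sequence modeling. This simplifies the generation process and makes the approach applicable beyond image synthesis to other continuous-valued domains. It is based on the paper "Autoregressive Image Generation without Vector Quantization" (Li et al., 2024).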
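As a rough sketch of the diffusion loss described above, following the formulation in the cited paper: let z be the vector the autoregressive model produces for a token position, x the ground-truth continuous-valued token at that position, and ε_θ a small noise-prediction network conditioned on z. The symbol names are notation chosen for this sketch.

```latex
\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon, t}\left[ \left\| \varepsilon - \varepsilon_\theta(x_t \mid t, z) \right\|^2 \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x + \sqrt{1 - \bar{\alpha}_t}\, \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, I)
```

Here \bar{\alpha}_t is the noise schedule and t a sampled diffusion time step. At generation time, a token is drawn by running the reverse diffusion process conditioned on z, which takes the place of sampling from a categorical distribution over a discrete codebook.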

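🚀 Quick Start

You can load the model with Hugging Face's DiffusionPipeline and optionally customize parameters such as the model type, the number of steps, and the class labels.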

from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained("jadechoghari/mar", trust_remote_code=True, custom_pipeline="jadechoghari/mar")

# generate an image with the model
generated_image = pipeline(
    model_type="mar_huge",  # choose from 'mar_base', 'mar_large', or 'mar_huge'
    seed=42,                # set a seed for reproducibility
    num_ar_steps=64,        # number of autoregressive steps
    class_labels=[207, 360, 388],  # provide valid ImageNet class labels
    cfg_scale=4,            # classifier-free guidance scale
    output_dir="./images",  # directory to save generated images
    cfg_schedule="constant",  # choose between 'constant' (suggested) and 'linear'
)

# display the generated image
generated_image.show()

This code loads the model, configures it for image generation, and saves the output to the specified directory.
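Because the pipeline expects valid ImageNet class labels, it can help to check them before starting a long generation run. The sketch below assumes the checkpoints use the 1,000-class ImageNet-1k label space (indices 0 to 999); `validate_class_labels` is a hypothetical helper, not part of the pipeline.

```python
from typing import Iterable, List

# Assumption: ImageNet-1k label space, class indices 0..999
NUM_IMAGENET_CLASSES = 1000


def validate_class_labels(labels: Iterable[int]) -> List[int]:
    """Return the labels as a list, raising ValueError on any invalid entry."""
    checked = []
    for label in labels:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(label, bool) or not isinstance(label, int):
            raise ValueError(f"class label must be an int, got {label!r}")
        if not 0 <= label < NUM_IMAGENET_CLASSES:
            raise ValueError(
                f"class label {label} is outside 0..{NUM_IMAGENET_CLASSES - 1}"
            )
        checked.append(label)
    if not checked:
        raise ValueError("at least one class label is required")
    return checked


labels = validate_class_labels([207, 360, 388])
print(labels)
```

The validated list can then be passed as `class_labels` in the pipeline call.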

We provide three pretrained MAR models in safetensors format:

mar-base.safetensors
mar-large.safetensors
mar-huge.safetensors
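For scripts that switch between model sizes, a small lookup table keeps the `model_type` strings and the checkpoint file names in one place. This is only a sketch: the pairing of each `model_type` value with the file of the same size is an assumption based on the names, and `checkpoint_for` is a hypothetical helper.

```python
# Assumed pairing of pipeline model_type values with the checkpoint files above
MAR_CHECKPOINTS = {
    "mar_base": "mar-base.safetensors",
    "mar_large": "mar-large.safetensors",
    "mar_huge": "mar-huge.safetensors",
}


def checkpoint_for(model_type: str) -> str:
    """Return the checkpoint file name for a model type, or raise ValueError."""
    try:
        return MAR_CHECKPOINTS[model_type]
    except KeyError:
        valid = ", ".join(sorted(MAR_CHECKPOINTS))
        raise ValueError(
            f"unknown model_type {model_type!r}; expected one of: {valid}"
        ) from None


print(checkpoint_for("mar_huge"))  # mar-huge.safetensors
```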

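This is a Hugging Face Diffusers/GPU implementation of the paper "Autoregressive Image Generation without Vector Quantization".

The official PyTorch implementation is released in this repository.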

@article{li2024autoregressive,
  title={Autoregressive Image Generation without Vector Quantization},
  author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
  journal={arXiv preprint arXiv:2406.11838},
  year={2024}
}

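✨ Key Features

Introduces a novel autoregressive image generation method that requires no vector quantization.
Models the probability distribution of each token in a continuous-valued space using a diffusion process.
Uses a diffusion loss function to generate high-quality images efficiently while keeping the speed advantage of autoregressive sequence modeling.
Simplifies the generation process and applies to a wider range of continuous-valued domains.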

💻 Usage Examples

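Advanced usage

You can tune the parameters further to get different generation results. For example, change num_ar_steps to set the number of autoregressive steps, or cfg_scale to set the classifier-free guidance scale.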

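# Advanced example: a different model size, seed, step count, and guidance setup
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("jadechoghari/mar", trust_remote_code=True, custom_pipeline="jadechoghari/mar")

# adjust the parameters to get different results
generated_image = pipeline(
    model_type="mar_large",        # pick a different model size
    seed=123,                      # a different seed gives a different random result
    num_ar_steps=128,              # more autoregressive steps (slower; may improve quality)
    class_labels=[100, 200, 300],  # different ImageNet class labels
    cfg_scale=6,                   # classifier-free guidance scale
    output_dir="./new_images",     # save to a different directory
    cfg_schedule="linear",         # 'linear' instead of the suggested 'constant'
)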

generated_image.show()
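To make the difference between the two cfg_schedule values concrete, here is a minimal sketch of how a per-step guidance scale could be derived. It assumes that 'constant' applies cfg_scale at every autoregressive step, while 'linear' ramps from 1 (no guidance) up to cfg_scale as more tokens have been generated. The function name, its arguments, and the exact ramp are illustrative assumptions, not the pipeline's confirmed internals.

```python
def guidance_scale_at(cfg_scale: float, cfg_schedule: str,
                      tokens_generated: int, total_tokens: int) -> float:
    """Guidance scale to use at one autoregressive step (hypothetical helper)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= tokens_generated <= total_tokens:
        raise ValueError("tokens_generated must lie in 0..total_tokens")
    if cfg_schedule == "constant":
        # same guidance strength at every step
        return float(cfg_scale)
    if cfg_schedule == "linear":
        # weak guidance early, full guidance once all tokens are generated
        progress = tokens_generated / total_tokens
        return 1.0 + (cfg_scale - 1.0) * progress
    raise ValueError(f"unknown cfg_schedule {cfg_schedule!r}")


# Compare both schedules at the start, midpoint, and end of a 256-token image
for done in (0, 128, 256):
    print(done,
          guidance_scale_at(6, "constant", done, 256),
          guidance_scale_at(6, "linear", done, 256))
```

Under these assumptions, a linear schedule applies little guidance to the first tokens and the full cfg_scale only to the last ones, which is one reason the two settings can give visibly different results at the same cfg_scale.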