DanbooruCLIP Open-Source Model - Free Tag Classification for Anime Images

Danbooruclip

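Developed by OysterQAQ

A CLIP model fine-tuned on the Danbooru2021 dataset, built for tag classification of anime images

Text-to-Image

Transformers

#anime-image-tagging #multi-label-classification #character-and-work-recognition

Downloads: 502

Release date: 5/18/2023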

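Model Overview

This model is a fine-tuned version of CLIP (ViT-L/14), optimized for understanding anime image content. It can recognize characters, works, and general tags.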

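Model Features

Optimized for anime content

Tuned specifically for anime images, so it can recognize anime-specific elements such as characters and works

Multi-label handling

Handles multi-label classification tasks that span character, work, and general tags

Tag preprocessing

Includes a tag preprocessing pipeline that extracts character and work tags first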

Model Capabilities

Anime image recognition

Multi-label classification

Character recognition

Work recognition

General tag recognition

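Use Cases

Anime content management

Automatic anime image tagging

Automatically generates tags for anime images

Recognizes characters, works, and general features

Anime content retrieval

Searches for similar anime works based on image content

Improves retrieval efficiency for anime databases

Anime community applications

Content recommendation

Recommends similar anime works to users based on image content

Improves user experience and engagement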
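
The retrieval and recommendation use cases above both come down to comparing image embeddings. The source does not describe a retrieval implementation, so the following is only a minimal sketch. It assumes the embeddings have already been computed (for example with `CLIPModel.get_image_features` from transformers) and stored as plain lists of floats. The names `cosine_similarity`, `most_similar`, and the toy `index` are illustrative, not part of the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=5):
    """Return the k (name, score) pairs in `index` closest to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors stand in for real image embeddings (assumption).
index = {
    "image_a": [1.0, 0.0],
    "image_b": [0.0, 1.0],
    "image_c": [1.0, 1.0],
}
results = most_similar([1.0, 0.1], index, k=2)
print([name for name, _ in results])  # ['image_a', 'image_c']
```

For a large collection, a vector index library would likely replace the linear scan, but the ranking criterion stays the same.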

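🚀 DanbooruCLIP Model

DanbooruCLIP is a fine-tuned version of CLIP (ViT-L/14), trained on the Danbooru2021 and pixiv datasets. It can be used for image-text matching and works well in scenarios such as anime-related image recognition.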

🚀 Quick Start

Model usage example

from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("OysterQAQ/DanbooruCLIP")
processor = CLIPProcessor.from_pretrained("OysterQAQ/DanbooruCLIP")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the candidate texts gives label probabilities

✨ Key Features

Update of July 17, 2023: the pixiv dataset was added to training to improve the model's generalization.
The CLIP (ViT-L/14) model was fine-tuned on the Danbooru2021 dataset.

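📦 Installation Guide

The source documentation does not give installation steps. Install the dependencies by following the installation instructions for the transformers library.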

🔧 Technical Details

Training parameters

Epochs 0-3: learning rate 4e-6, weight decay 1e-3.
Epochs 4-8: learning rate 1e-6, weight decay 1e-3.
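
The two-stage schedule above can be written as a small lookup. This is a sketch only: the source does not say which optimizer or training loop was used, and it assumes the epoch ranges are 0-indexed and inclusive as written. The names `learning_rate_for_epoch` and `WEIGHT_DECAY` are illustrative.

```python
WEIGHT_DECAY = 1e-3  # same value in both stages, per the list above


def learning_rate_for_epoch(epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch (assumed convention)."""
    if 0 <= epoch <= 3:
        return 4e-6
    if 4 <= epoch <= 8:
        return 1e-6
    raise ValueError(f"epoch {epoch} is outside the documented range 0-8")


for epoch in range(9):
    print(epoch, learning_rate_for_epoch(epoch), WEIGHT_DECAY)
```

In a PyTorch loop, one way to apply this would be to assign the returned value to `group["lr"]` for each entry of `optimizer.param_groups` at the start of every epoch.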

Tag preprocessing procedure

            for i in range(length):
                # Load and resize the image
                if not is_image(data_from_db.path[i]):
                    continue

                try:
                    img = self.preprocess(
                        Image.open(data_from_db.path[i].replace("./", "/mnt/lvm/danbooru2021/danbooru2021/")))
                except Exception as e:
                    #print(e)
                    continue
                # Parse the tags
                tags = json.loads(data_from_db.tags[i])
                # Prioritize character and work tags
                category_group = {}
                for tag in tags:
                    category_group.setdefault(tag["category"], []).append(tag)

                # category_group=groupby(tags, key=lambda x: (x["category"]))
                character_list = category_group[4] if 4 in category_group else []
                # Filter the work tags (the filter below drops the "original" tag)

                work_list = list(filter(
                    lambda e:
                               e["name"] != "original"
                            , category_group[3])) if 3 in category_group else []
                # work_list=  category_group[5] if 5 in category_group else []
                general_list = category_group[0] if 0 in category_group else []
                caption = ""
                caption_2 = None
                for character in character_list:
                    if len(work_list) != 0:
                        # Strip the parenthesized work name from the character tag
                        character["name"] = re.sub(r"\(.*?\)", "", character["name"])
                    caption += character["name"].replace("_", " ")
                    caption += ","
                caption = caption[:-1]
                caption += " "
                if len(work_list) != 0:
                    caption += "from "
                for work in work_list:
                    caption += work["name"].replace("_", " ")
                    caption += " "
                # General tags
                if len(general_list) != 0:
                    caption += "with "
                if len(general_list) > 20:
                    general_list_1 = general_list[:int(len(general_list) / 2)]
                    general_list_2 = general_list[int(len(general_list) / 2):]
                    caption_2 = caption
                    for general in general_list_1:
                        if general["name"].find("girl") == -1 and general["name"].find("boy") == -1 and len(
                                re.findall(is_contain, general["name"])) != 0:
                            caption_2 += general["name"].replace("_", " ")
                            caption_2 += ","
                    caption_2 = caption_2[:-1]
                    for general in general_list_2:
                        if general["name"].find("girl") == -1 and general["name"].find("boy") == -1 and len(
                                re.findall(is_contain, general["name"])) != 0:
                            caption += general["name"].replace("_", " ")
                            caption += ","
                    caption = caption[:-1]
                else:
                    for general in general_list:
                        # More than 20 general tags are split into two captions (handled in the branch above)
                        if general["name"].find("girl") == -1 and general["name"].find("boy") == -1 and len(
                                re.findall(is_contain, general["name"])) != 0:
                            caption += general["name"].replace("_", " ")
                            caption += ","
                    caption = caption[:-1]

                # Join the tags into a sentence
                # Tokenize the sentence
                # Return
                # Truncate if too long; if that does not work, use the Hugging Face tokenizer
                text_1 = clip.tokenize(texts=caption, truncate=True)
                text_2= None
                if caption_2 is not None:
                    text_2 = clip.tokenize(texts=caption_2, truncate=True)
                # Processing logic

                # print(img)
                yield img, text_1[0]
                if text_2 is not None:
                    yield img, text_2[0]
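
The fragment above depends on names that are not shown (`data_from_db`, `is_image`, `is_contain`, `clip`). To make the caption format easier to see, here is a self-contained sketch of the caption-building step only, with image loading and tokenization left out. It is a re-expression for illustration, not the author's code. `build_captions` is a hypothetical name, and `IS_CONTAIN` is a stand-in pattern, because the original `is_contain` regex is not given in the source.

```python
import re

# Tag category ids as used in the fragment above.
GENERAL, WORK, CHARACTER = 0, 3, 4

# Assumption: stand-in for the undisclosed `is_contain` pattern; it accepts
# any tag name that contains at least one ASCII letter or digit.
IS_CONTAIN = re.compile(r"[A-Za-z0-9]")


def _keep(name):
    """Mirror the general-tag filter: drop girl/boy tags and non-matching names."""
    return "girl" not in name and "boy" not in name and IS_CONTAIN.search(name) is not None


def build_captions(tags):
    """Return one caption, or two when there are more than 20 general tags."""
    groups = {}
    for tag in tags:
        groups.setdefault(tag["category"], []).append(tag["name"])
    characters = groups.get(CHARACTER, [])
    works = [name for name in groups.get(WORK, []) if name != "original"]
    general = groups.get(GENERAL, [])

    names = []
    for name in characters:
        if works:
            # Drop the parenthesized work name, e.g. "name_(work)" -> "name_".
            name = re.sub(r"\(.*?\)", "", name)
        names.append(name.replace("_", " "))
    prefix = ",".join(names) + " "
    if works:
        prefix += "from " + "".join(w.replace("_", " ") + " " for w in works)
    if general:
        prefix += "with "

    def finish(tag_names):
        body = "".join(n.replace("_", " ") + "," for n in tag_names if _keep(n))
        # Like the fragment, always drop the final character (usually a comma).
        return (prefix + body)[:-1]

    if len(general) > 20:
        half = len(general) // 2
        # Same order as the fragment's yields: second half first, then first half.
        return [finish(general[half:]), finish(general[:half])]
    return [finish(general)]


example = [
    {"name": "hatsune_miku", "category": 4},
    {"name": "vocaloid", "category": 3},
    {"name": "1girl", "category": 0},
    {"name": "long_hair", "category": 0},
    {"name": "twintails", "category": 0},
]
print(build_captions(example))
# ['hatsune miku from vocaloid with long hair,twintails']
```

Candidate labels written in the same "character from work with tags" shape are likely to match the training captions more closely than free-form sentences, although the source does not state this.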

📚 Detailed Documentation

Feedback

If you have questions or suggestions about this model, please submit them through this Google Form.

Examples

Example title	Image link	Candidate labels
Azur Lane	Example image	Azur Lane, 3 girl with sword, cat ear, a dog
cirno & daiyousei	Example image	1 girl with black hair, rabbit ear, big breasts, minato aqua, fate/extra, k-on!, daiyousei, cirno
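
The candidate labels in the table above can be scored against an image with the same `CLIPProcessor` and `CLIPModel` calls used in the quick start. The sketch below assumes a local image file and uses the hypothetical helper names `score_labels` and `top_labels`. The heavy imports are deferred into `score_labels` so that the ranking helper can be used on its own.

```python
CANDIDATE_LABELS = ["Azur Lane", "3 girl with sword", "cat ear", "a dog"]


def top_labels(labels, probs, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def score_labels(image_path, labels, model_id="OysterQAQ/DanbooruCLIP"):
    """Return one probability per label for the image at `image_path`."""
    # Imported here so that top_labels works without these packages installed.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over the candidate labels for the single input image.
    return outputs.logits_per_image.softmax(dim=1)[0].tolist()


# Usage (assumes a local file named example.png, which is hypothetical):
# probs = score_labels("example.png", CANDIDATE_LABELS)
# print(top_labels(CANDIDATE_LABELS, probs))
```

Because the softmax normalizes across the candidate list, the probabilities are relative to the other candidates. For multi-label tagging, thresholding the raw similarity scores may be a better fit.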