Open-Source UIClip Model: Quantifying the Design Quality and Relevance of UI Screenshots Against Text Descriptions

uiclip_jitteredwebsites-2-224-paraphrased_webpairs_humanpairs

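Developed by biglab

UIClip is a model that quantifies the design quality and relevance of a user interface (UI) screenshot with respect to a given text description.

Multimodal Fusion

Transformers

License: MIT #UI design evaluation #Multimodal scoring #Design suggestion generation

Downloads: 232

Released: 3/31/2024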

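Model Overview

UIClip is a CLIP-style multimodal dual-encoder Transformer model that assesses the design quality and visual relevance of a UI from its screenshot and a natural-language description. The model can also be used to generate natural-language design suggestions.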
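
In a CLIP-style dual encoder, a score is derived from the cosine similarity between an image embedding and a text embedding. The sketch below illustrates that arithmetic with made-up three-dimensional vectors, contrasting a "well-designed" prompt with a "poor design" prompt through a softmax, in the same spirit as the normalized scoring in the quick-start code further down. The function names and the toy embeddings are assumptions for illustration; they are not part of the model or its API.

```python
import math

LOGIT_SCALE = 100  # same scale factor as the quick-start code


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_quality_score(img_emb, good_text_emb, poor_text_emb):
    """Softmax over the two scaled similarities; returns the 'well-designed' share."""
    good_logit = LOGIT_SCALE * cosine(img_emb, good_text_emb)
    poor_logit = LOGIT_SCALE * cosine(img_emb, poor_text_emb)
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(good_logit, poor_logit)
    good_exp = math.exp(good_logit - m)
    poor_exp = math.exp(poor_logit - m)
    return good_exp / (good_exp + poor_exp)


# Toy embeddings (hypothetical values, not model output).
image = [0.9, 0.1, 0.2]
good_prompt = [0.8, 0.2, 0.1]
poor_prompt = [0.1, 0.9, 0.3]
score = normalized_quality_score(image, good_prompt, poor_prompt)
print(f"{score:.3f}")
```

Because the similarities are multiplied by 100 before the softmax, even a modest gap between the two cosine similarities pushes the score close to 0 or 1.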

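Model Features

Design quality assessment

Quantifies the design quality and relevance of a UI screenshot with respect to a given text description.

Natural-language design suggestions

Can generate natural-language design suggestions to help improve a UI design.

Large-scale training dataset

Trained on a large-scale UI dataset built by combining automated crawling, synthetic augmentation, and human ratings.

Multimodal learning

Processes image (UI screenshot) and text (description) inputs together and learns the association between them.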

Model Capabilities

UI design quality scoring

UI design relevance assessment

Natural-language design suggestion generation

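Use Cases

UI design assistance

UI code generation

Use UIClip to assess the design quality of UIs rendered from generated code.

Improves the design quality and usability of generated UIs.

UI design tips generation

Generate tips for improving a UI design from UIClip's assessment results.

Helps designers quickly identify and fix design problems.

Quality-aware UI example search

Use UIClip scores to filter for high-quality UI design examples.

Surfaces more relevant, higher-quality UI design references.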
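
As a rough illustration of quality-aware filtering, the sketch below keeps only examples whose score clears a threshold and returns the best few. The `ScoredExample` type, the `top_quality_examples` helper, the threshold, and the scores are all hypothetical; in practice the scores would come from a scoring function such as `compute_quality_scores` in the quick-start code.

```python
from typing import NamedTuple


class ScoredExample(NamedTuple):
    name: str     # identifier of the UI example (hypothetical)
    score: float  # quality score in [0, 1]


def top_quality_examples(examples, min_score=0.5, k=3):
    """Drop examples below min_score, then return the k best by score."""
    kept = [ex for ex in examples if ex.score >= min_score]
    return sorted(kept, key=lambda ex: ex.score, reverse=True)[:k]


# Placeholder scores for illustration only; real scores would come from the model.
candidates = [
    ScoredExample("login_a", 0.91),
    ScoredExample("login_b", 0.34),
    ScoredExample("login_c", 0.77),
    ScoredExample("login_d", 0.58),
]
best = top_quality_examples(candidates, min_score=0.5, k=2)
print([ex.name for ex in best])  # ['login_a', 'login_c']
```

A threshold followed by a top-k cut is only one possible policy; a search system could equally combine the quality score with a separate relevance ranking.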

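🚀 UIClip Model

UIClip is a model designed to quantify the design quality and relevance of a user interface (UI) screenshot with respect to a text description. It can also generate natural-language design suggestions, which makes it useful for UI design evaluation.

🚀 Quick Start

The example code below loads the model and computes design quality scores for pairs of text descriptions and UI screenshots: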

import torch
from transformers import CLIPProcessor, CLIPModel

IMG_SIZE = 224
DEVICE = "cpu" # can also be "cuda" or "mps"
LOGIT_SCALE = 100 # based on OpenAI's CLIP example code
NORMALIZE_SCORING = True

model_path="uiclip_jitteredwebsites-2-224-paraphrased_webpairs_humanpairs" # can also be regular or web pairs variants
processor_path="openai/clip-vit-base-patch32"

model = CLIPModel.from_pretrained(model_path)
model = model.eval()
model = model.to(DEVICE)

processor = CLIPProcessor.from_pretrained(processor_path)

def compute_quality_scores(input_list):
    # input_list is a list of tuples: (description string, PIL image)
    description_list = ["ui screenshot. well-designed. " + input_item[0] for input_item in input_list]
    img_list = [input_item[1] for input_item in input_list]
    text_embeddings_tensor = compute_description_embeddings(description_list) # B x H
    img_embeddings_tensor = compute_image_embeddings(img_list) # B x H

    # normalize tensors
    text_embeddings_tensor /= text_embeddings_tensor.norm(dim=-1, keepdim=True)
    img_embeddings_tensor /= img_embeddings_tensor.norm(dim=-1, keepdim=True)

    if NORMALIZE_SCORING:
        text_embeddings_tensor_poor = compute_description_embeddings([d.replace("well-designed. ", "poor design. ") for d in description_list]) # B x H
        text_embeddings_tensor_poor /= text_embeddings_tensor_poor.norm(dim=-1, keepdim=True)
        text_embeddings_tensor_all = torch.stack((text_embeddings_tensor, text_embeddings_tensor_poor), dim=1) # B x 2 x H
    else:
        text_embeddings_tensor_all = text_embeddings_tensor.unsqueeze(1)

    img_embeddings_tensor = img_embeddings_tensor.unsqueeze(1) # B x 1 x H

    scores = (LOGIT_SCALE * img_embeddings_tensor @ text_embeddings_tensor_all.permute(0, 2, 1)).squeeze(1)

    if NORMALIZE_SCORING:
        scores = scores.softmax(dim=-1)

    return scores[:, 0]

def compute_description_embeddings(descriptions):
    inputs = processor(text=descriptions, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(DEVICE)
    inputs['attention_mask'] = inputs['attention_mask'].to(DEVICE)
    # inference only: skip gradient tracking, as compute_image_embeddings does
    with torch.no_grad():
        text_embedding = model.get_text_features(**inputs)
    return text_embedding

def compute_image_embeddings(image_list):
    windowed_batch = [slide_window_over_image(img, IMG_SIZE) for img in image_list]
    inds = []
    for imgi in range(len(windowed_batch)):
        inds.append([imgi for _ in windowed_batch[imgi]])

    processed_batch = [item for sublist in windowed_batch for item in sublist]
    inputs = processor(images=processed_batch, return_tensors="pt")
    # run all sub windows of all images in batch through the model
    inputs['pixel_values'] = inputs['pixel_values'].to(DEVICE)
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)

    # output contains all subwindows, need to mask for each image
    processed_batch_inds = torch.tensor([item for sublist in inds for item in sublist]).long().to(image_features.device)
    embed_list = []
    for i in range(len(image_list)):
        mask = processed_batch_inds == i
        embed_list.append(image_features[mask].mean(dim=0))
    image_embedding = torch.stack(embed_list, dim=0)
    return image_embedding

def preresize_image(image, image_size):
    aspect_ratio = image.width / image.height
    if aspect_ratio > 1:
        image = image.resize((int(aspect_ratio * image_size), image_size))
    else:
        image = image.resize((image_size, int(image_size / aspect_ratio)))
    return image

def slide_window_over_image(input_image, img_size):
    input_image = preresize_image(input_image, img_size)
    width, height = input_image.size
    square_size = min(width, height)
    longer_dimension = max(width, height)
    num_steps = (longer_dimension + square_size - 1) // square_size

    if num_steps > 1:
        step_size = (longer_dimension - square_size) // (num_steps - 1)
    else:
        step_size = square_size

    cropped_images = []

    for y in range(0, height - square_size + 1, step_size if height > width else square_size):
        for x in range(0, width - square_size + 1, step_size if width > height else square_size):
            left = x
            upper = y
            right = x + square_size
            lower = y + square_size
            cropped_image = input_image.crop((left, upper, right, lower))
            cropped_images.append(cropped_image)

    return cropped_images


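# compute the quality scores for a list of descriptions (strings) and images (PIL images)
# test_descriptions and test_images are placeholders the caller must define, one description per image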
prediction_scores = compute_quality_scores(list(zip(test_descriptions, test_images)))
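
To make the sliding-window step easier to inspect, the sketch below reproduces the loop bounds of `slide_window_over_image` on plain integers and returns the crop boxes instead of cropped images. `window_boxes` is a hypothetical helper written for this illustration; it assumes the image has already been resized so that its shorter side equals the window size.

```python
def window_boxes(width, height):
    """Return (left, upper, right, lower) square crop boxes along the longer side."""
    square = min(width, height)
    longer = max(width, height)
    num_steps = (longer + square - 1) // square  # ceil(longer / square)
    step = (longer - square) // (num_steps - 1) if num_steps > 1 else square

    boxes = []
    # Only the longer axis is stepped; the shorter axis yields a single offset of 0.
    for y in range(0, height - square + 1, step if height > width else square):
        for x in range(0, width - square + 1, step if width > height else square):
            boxes.append((x, y, x + square, y + square))
    return boxes


# A portrait screenshot whose shorter side is already 224 px.
for box in window_boxes(224, 500):
    print(box)
# (0, 0, 224, 224)
# (0, 138, 224, 362)
# (0, 276, 224, 500)
```

The windows overlap so that the whole long side is covered; the quick-start code then averages the embeddings of all windows belonging to one screenshot.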

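✨ Key Features

Multimodal assessment: combines a text description with a UI screenshot to quantify design quality and relevance.
Design suggestion generation: produces natural-language design suggestions based on the assessment.
Large-scale training data: trained on a large dataset built through automated crawling, synthetic augmentation, and human ratings.
High agreement with designers: in a comparison against UIs rated by 12 human designers, UIClip agreed with the ground-truth rankings more closely than the baseline models did.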

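📚 Detailed Documentation

Model Description

UIClip is a model designed to quantify the design quality and relevance of a user interface (UI) screenshot given a text description. The model can also be used to generate natural-language design suggestions (see the paper). It is the model described in the paper "UIClip: A Data-driven Model for Assessing User Interface Design" (https://arxiv.org/abs/2404.12500), published at UIST 2024.

User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In the paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural-language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on this dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baseline models to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with the ground-truth rankings. Finally, we present three example applications that show how UIClip can support downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.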
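
This card does not say how agreement with the designers' rankings was measured. One common way to quantify such agreement is pairwise accuracy: the fraction of human preference pairs in which the model also scores the preferred UI higher. The sketch below shows that calculation under this assumption; the `pairwise_agreement` helper, the UI names, and the scores are hypothetical and are not taken from the paper.

```python
def pairwise_agreement(scores, preferred_pairs):
    """Fraction of (better, worse) pairs where the model scores `better` higher."""
    if not preferred_pairs:
        raise ValueError("need at least one preference pair")
    hits = sum(1 for better, worse in preferred_pairs if scores[better] > scores[worse])
    return hits / len(preferred_pairs)


# Hypothetical model scores and human preferences, for illustration only.
model_scores = {"ui_1": 0.82, "ui_2": 0.40, "ui_3": 0.65}
human_pairs = [("ui_1", "ui_2"), ("ui_3", "ui_2"), ("ui_3", "ui_1")]
agreement = pairwise_agreement(model_scores, human_pairs)
print(f"{agreement:.2f}")  # 0.67
```

Here the model agrees with two of the three human judgments; it ranks `ui_1` above `ui_3`, while the hypothetical designers preferred `ui_3`.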