🚀 UIClip Model
UIClip is a model designed to quantify the design quality and relevance of a user interface (UI) screenshot given a textual description. It can also generate natural-language design suggestions, making it a useful tool for UI design assessment.
🚀 Quick Start
UIClip can be used to assess the design quality and relevance of UI screenshots with respect to textual descriptions, and to generate natural-language design suggestions. Example code for using the model:
```python
import torch
from transformers import CLIPProcessor, CLIPModel

IMG_SIZE = 224
DEVICE = "cpu"  # can also be "cuda" or "mps"
LOGIT_SCALE = 100  # based on OpenAI's CLIP example code
NORMALIZE_SCORING = True

model_path = "uiclip_jitteredwebsites-2-224-paraphrased_webpairs_humanpairs"  # can also be regular or webpairs variants
processor_path = "openai/clip-vit-base-patch32"

model = CLIPModel.from_pretrained(model_path)
model = model.eval()
model = model.to(DEVICE)

processor = CLIPProcessor.from_pretrained(processor_path)


def compute_quality_scores(input_list):
    # input_list is a list of tuples where the first element is a description and the second is a PIL image
    description_list = ["ui screenshot. well-designed. " + input_item[0] for input_item in input_list]
    img_list = [input_item[1] for input_item in input_list]
    text_embeddings_tensor = compute_description_embeddings(description_list)  # B x H
    img_embeddings_tensor = compute_image_embeddings(img_list)  # B x H

    # normalize tensors
    text_embeddings_tensor /= text_embeddings_tensor.norm(dim=-1, keepdim=True)
    img_embeddings_tensor /= img_embeddings_tensor.norm(dim=-1, keepdim=True)

    if NORMALIZE_SCORING:
        text_embeddings_tensor_poor = compute_description_embeddings([d.replace("well-designed. ", "poor design. ") for d in description_list])  # B x H
        text_embeddings_tensor_poor /= text_embeddings_tensor_poor.norm(dim=-1, keepdim=True)
        text_embeddings_tensor_all = torch.stack((text_embeddings_tensor, text_embeddings_tensor_poor), dim=1)  # B x 2 x H
    else:
        text_embeddings_tensor_all = text_embeddings_tensor.unsqueeze(1)

    img_embeddings_tensor = img_embeddings_tensor.unsqueeze(1)  # B x 1 x H

    scores = (LOGIT_SCALE * img_embeddings_tensor @ text_embeddings_tensor_all.permute(0, 2, 1)).squeeze(1)

    if NORMALIZE_SCORING:
        scores = scores.softmax(dim=-1)

    return scores[:, 0]


def compute_description_embeddings(descriptions):
    inputs = processor(text=descriptions, return_tensors="pt", padding=True)
    inputs['input_ids'] = inputs['input_ids'].to(DEVICE)
    inputs['attention_mask'] = inputs['attention_mask'].to(DEVICE)
    text_embedding = model.get_text_features(**inputs)
    return text_embedding


def compute_image_embeddings(image_list):
    windowed_batch = [slide_window_over_image(img, IMG_SIZE) for img in image_list]
    inds = []
    for imgi in range(len(windowed_batch)):
        inds.append([imgi for _ in windowed_batch[imgi]])

    processed_batch = [item for sublist in windowed_batch for item in sublist]
    inputs = processor(images=processed_batch, return_tensors="pt")
    # run all sub windows of all images in batch through the model
    inputs['pixel_values'] = inputs['pixel_values'].to(DEVICE)
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)

    # output contains all subwindows, need to mask for each image
    processed_batch_inds = torch.tensor([item for sublist in inds for item in sublist]).long().to(image_features.device)
    embed_list = []
    for i in range(len(image_list)):
        mask = processed_batch_inds == i
        embed_list.append(image_features[mask].mean(dim=0))
    image_embedding = torch.stack(embed_list, dim=0)
    return image_embedding


def preresize_image(image, image_size):
    aspect_ratio = image.width / image.height
    if aspect_ratio > 1:
        image = image.resize((int(aspect_ratio * image_size), image_size))
    else:
        image = image.resize((image_size, int(image_size / aspect_ratio)))
    return image


def slide_window_over_image(input_image, img_size):
    input_image = preresize_image(input_image, img_size)
    width, height = input_image.size
    square_size = min(width, height)
    longer_dimension = max(width, height)
    num_steps = (longer_dimension + square_size - 1) // square_size

    if num_steps > 1:
        step_size = (longer_dimension - square_size) // (num_steps - 1)
    else:
        step_size = square_size

    cropped_images = []

    for y in range(0, height - square_size + 1, step_size if height > width else square_size):
        for x in range(0, width - square_size + 1, step_size if width > height else square_size):
            left = x
            upper = y
            right = x + square_size
            lower = y + square_size
            cropped_image = input_image.crop((left, upper, right, lower))
            cropped_images.append(cropped_image)

    return cropped_images


# compute the quality scores for a list of descriptions (strings) and images (PIL images)
prediction_scores = compute_quality_scores(list(zip(test_descriptions, test_images)))
```
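In the snippet above, `test_descriptions` and `test_images` are left undefined. Below is a minimal sketch of how they might be prepared; the screenshot file names are hypothetical placeholders.

```python
# Minimal usage sketch: the screenshot file names are hypothetical placeholders.
from PIL import Image

test_descriptions = [
    "a login screen for a music streaming app",
    "a checkout page for an online store",
]
test_images = [
    Image.open("login_screenshot.png").convert("RGB"),
    Image.open("checkout_screenshot.png").convert("RGB"),
]

prediction_scores = compute_quality_scores(list(zip(test_descriptions, test_images)))
for description, score in zip(test_descriptions, prediction_scores.tolist()):
    print(f"{score:.3f}  {description}")
```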
✨ Key Features
- Multimodal assessment: combines a textual description with a UI screenshot to quantify design quality and relevance.
- Design suggestion generation: produces natural-language design suggestions based on the assessment (a sketch follows this list).
- Large-scale training data: trained on a large dataset built from automated crawling, synthetic augmentation, and human ratings.
- High agreement: achieved the highest agreement with ground-truth rankings when compared against evaluations by human designers.
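The paper covers design suggestion generation in detail; one illustrative way to surface suggestions with a CLIP-style scorer is to score a screenshot against a set of candidate critique phrases and keep the best-matching ones. The sketch below reuses the quick-start helper functions; the critique phrases and prompt format are made up for illustration and are not necessarily the paper's exact procedure.

```python
# Illustrative sketch only: the critique phrases and prompt format are made up,
# and this is not necessarily the procedure used in the UIClip paper.
CANDIDATE_CRITIQUES = [
    "text is too small",
    "low contrast between text and background",
    "cluttered layout",
    "inconsistent alignment",
]

def suggest_design_issues(description, image, top_k=2):
    prompts = ["ui screenshot. " + critique + ". " + description for critique in CANDIDATE_CRITIQUES]
    text_emb = compute_description_embeddings(prompts)
    img_emb = compute_image_embeddings([image])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    sims = (img_emb @ text_emb.T).squeeze(0)  # one similarity per candidate critique
    top = sims.topk(k=min(top_k, len(CANDIDATE_CRITIQUES)))
    return [CANDIDATE_CRITIQUES[i] for i in top.indices.tolist()]
```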
📦 Installation
The original documentation does not provide specific installation steps, so this section is omitted.
📚 Documentation
Model Description
UIClip is a model designed to quantify the design quality and relevance of a user interface (UI) screenshot given a textual description. It can also be used to generate natural-language design suggestions (see the paper for details). This is the model described in the UIST 2024 paper "UIClip: A Data-driven Model for Assessing User Interface Design" (https://arxiv.org/abs/2404.12500).
User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic quality of applications. In the paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI from its screenshot and a natural-language description. To train UIClip, we combined automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. Through training on this dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines against UIs rated by 12 human designers, UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that show how UIClip can facilitate downstream tasks that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
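As an illustration of the third application, quality-aware UI example search, the sketch below ranks a gallery of screenshots against a query description using the quick-start helpers. The gallery is assumed to be a list of PIL images supplied by the caller; this is a sketch, not the implementation used in the paper.

```python
# Quality-aware example search sketch (hypothetical gallery): rank stored UI
# screenshots by their UIClip quality/relevance score for a query description.
def search_examples(query_description, gallery_images, top_k=3):
    pairs = [(query_description, img) for img in gallery_images]
    scores = compute_quality_scores(pairs)            # one score per screenshot
    ranked = scores.argsort(descending=True)[:top_k]  # indices of the best matches
    return [(int(i), float(scores[i])) for i in ranked]

# results = search_examples("a settings page for a fitness tracker", gallery_images)
```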
Model Information

| Property | Details |
|---|---|
| Developed by | BigLab |
| Model type | CLIP-style multimodal dual-encoder Transformer |
| Language (NLP) | English |
| License | MIT |
Datasets
- biglab/jitteredwebsites-merged-224-paraphrased
- biglab/jitteredwebsites-merged-224-paraphrased-paired
- biglab/uiclip_human_data_hf
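Assuming these datasets are hosted on the Hugging Face Hub under the names above, they can presumably be loaded with the `datasets` library. The split name below is an assumption; inspect the dataset before relying on any field names.

```python
# Sketch of loading the human-rated pairs; the split name "train" is an assumption.
from datasets import load_dataset

human_pairs = load_dataset("biglab/uiclip_human_data_hf", split="train")
print(human_pairs)  # inspect the available columns before relying on any field names
```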
Base Models
- openai/clip-vit-base-patch32
- biglab/uiclip_jitteredwebsites-2-224-paraphrased_webpairs
🔧 Technical Details
UIClip was trained on a large-scale UI dataset built through a combination of automated crawling, synthetic augmentation, and human ratings. Through this training, the model implicitly learns properties of good and bad designs, assessing a UI by assigning a numerical score and providing design suggestions. In the evaluation, UIClip achieved the highest agreement with human designers' assessments compared to the other baseline models.
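As an illustration of how the numerical score can drive a downstream loop such as UI code generation, the sketch below reranks several candidate renderings of the same target description and keeps the highest-scoring one. The candidate screenshots are assumed to come from rendering generated UI code; the generation and rendering steps are out of scope here.

```python
# Best-of-N reranking sketch: candidate_screenshots would come from rendering
# N generated UI code candidates (generation and rendering not shown here).
def pick_best_candidate(target_description, candidate_screenshots):
    scores = compute_quality_scores([(target_description, img) for img in candidate_screenshots])
    best_index = int(scores.argmax())
    return best_index, float(scores[best_index])
```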
📄 License
This model is released under the MIT license.