🚀 Ferret-UI (Gemma-2B Version)
Ferret-UI is the first multimodal large language model (MLLM) centered on user interfaces (UI), designed for referring, grounding, and reasoning tasks. Built on Gemma-2B and Llama-3-8B, it can carry out complex UI tasks. This is the Gemma-2B version of Ferret-UI, inspired by this paper from Apple.
🚀 Quick Start
📦 Installation
First, download `builder.py`, `conversation.py`, `inference.py`, `model_UI.py`, and `mm_utils.py` to your local machine:
```bash
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/conversation.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/builder.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/inference.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/model_UI.py
wget https://huggingface.co/jadechoghari/Ferret-UI-Gemma2b/raw/main/mm_utils.py
```
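Alternatively, a minimal sketch of downloading the same files with the `huggingface_hub` library (assumed to be installed, e.g. `pip install huggingface_hub`):

```python
# Sketch: fetch the helper files with huggingface_hub instead of wget.
# Assumes `pip install huggingface_hub` has been run.
from huggingface_hub import hf_hub_download

repo_id = "jadechoghari/Ferret-UI-Gemma2b"
files = ["conversation.py", "builder.py", "inference.py", "model_UI.py", "mm_utils.py"]

for filename in files:
    # hf_hub_download returns the local path of the downloaded file
    local_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=".")
    print(f"Downloaded {filename} -> {local_path}")
```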
💻 Usage Examples
Basic Usage
```python
from inference import inference_and_run

# Path to the UI screenshot and the question to ask about it
image_path = "appstore_reminders.png"
prompt = "Describe the image in details"

inference_text = inference_and_run(image_path, prompt, conv_mode="ferret_gemma_instruct", model_path="jadechoghari/Ferret-UI-Gemma2b")
print("Inference Text:", inference_text)
```
Advanced Usage
image_path = "appstore_reminders.png"
prompt = "What's inside the selected region?"
box = [189, 906, 404, 970]
inference_text = inference_and_run(
image_path=image_path,
prompt=prompt,
conv_mode="ferret_gemma_instruct",
model_path="jadechoghari/Ferret-UI-Gemma2b",
box=box
)
print("Inference Text:", inference_text)
Grounding Prompts
```python
GROUNDING_TEMPLATES = [
    '\nProvide the bounding boxes of the mentioned objects.',
    '\nInclude the coordinates for each mentioned object.',
    '\nLocate the objects with their coordinates.',
    '\nAnswer in [x1, y1, x2, y2] format.',
    '\nMention the objects and their locations using the format [x1, y1, x2, y2].',
    '\nDraw boxes around the mentioned objects.',
    '\nUse boxes to show where each thing is.',
    '\nTell me where the objects are with coordinates.',
    '\nList where each object is with boxes.',
    '\nShow me the regions with boxes.'
]
```
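A minimal sketch of how these templates might be used: append one to a regular prompt before calling `inference_and_run`. The example prompt text below is hypothetical.

```python
# Sketch: append a grounding template so the answer includes [x1, y1, x2, y2] boxes.
# Assumes GROUNDING_TEMPLATES from above is defined in the same session.
import random
from inference import inference_and_run

prompt = "Where is the 'Get' button?" + random.choice(GROUNDING_TEMPLATES)  # hypothetical prompt
inference_text = inference_and_run(
    image_path="appstore_reminders.png",
    prompt=prompt,
    conv_mode="ferret_gemma_instruct",
    model_path="jadechoghari/Ferret-UI-Gemma2b"
)
print("Inference Text:", inference_text)
```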