SESAME
SESAME is an open-source multimodal model built on LLaVA and fine-tuned on a variety of instruction-based image localization (segmentation) datasets.
Downloads: 37
Release Time: 4/25/2025
Model Overview
SESAME is intended primarily for research on large multimodal models and chatbots. It consists of an autoregressive language model paired with a segmentation model, and it supports instruction-driven image localization and segmentation tasks.
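The two-part design described above (an autoregressive language model feeding a segmentation model) can be sketched as a minimal pipeline. This is an illustrative toy, not SESAME's actual API: all class and function names here (`ToyLanguageModel`, `ToyMaskDecoder`, `segment`) are hypothetical stand-ins, and the "embedding" and "decoding" steps are trivial placeholders for the real transformer and mask-decoder computations.

```python
# Hypothetical sketch of a SESAME-style two-stage design: an autoregressive
# language model reads the instruction and emits an embedding, and a mask
# decoder projects that embedding onto image features to produce a binary
# mask. Names and logic are illustrative only, not SESAME's real API.

class ToyLanguageModel:
    """Stand-in for the LLaVA-based autoregressive language model."""

    def encode_instruction(self, instruction: str) -> list:
        # Real model: transformer forward pass producing a segmentation-token
        # embedding; here, a fixed-size bag-of-characters vector just to keep
        # the sketch runnable.
        emb = [0.0] * 4
        for i, ch in enumerate(instruction):
            emb[i % 4] += ord(ch) / 1000.0
        return emb


class ToyMaskDecoder:
    """Stand-in for the segmentation head that turns the instruction
    embedding into a per-pixel mask over image features."""

    def decode(self, seg_embedding: list, image_features: list) -> list:
        # Real model: cross-attention between embedding and image features;
        # here, a thresholded product per "pixel" cell.
        score = sum(seg_embedding)
        return [[1 if cell * score > 0.5 else 0 for cell in row]
                for row in image_features]


def segment(instruction: str, image_features: list) -> list:
    """Run the toy instruction-driven segmentation pipeline."""
    seg_emb = ToyLanguageModel().encode_instruction(instruction)
    return ToyMaskDecoder().decode(seg_emb, image_features)


# Dummy 2x2 grid of "image features".
features = [[0.9, 0.1], [0.2, 0.8]]
mask = segment("segment the dog on the left", features)
print(mask)
```

The point of the sketch is the division of labor: the language model handles the natural-language instruction, while a separate decoder produces the pixel-level mask, which is the structure the overview attributes to SESAME.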
Model Features
Multimodal Capability
Combines a language model with a visual segmentation model to support multimodal interaction between images and text.
Open-Source Model
Released under the MIT license, enabling open research and derivative development.
Instruction-Driven Image Segmentation
Capable of performing image localization and segmentation tasks based on natural language instructions.
Model Capabilities
Image Segmentation
Natural Language Understanding
Multimodal Interaction
Use Cases
Computer Vision Research
Image Segmentation Research
Supports research on image segmentation techniques driven by natural language instructions.
Multimodal Model Development
Chatbot Enhancement
Adds image understanding and segmentation capabilities to chatbots.