SESAME

Developed by tsunghanwu
SESAME is an open-source multimodal model, built on the LLaVA model and fine-tuned on various instruction-based image localization (segmentation) datasets.
Downloads: 37
Release date: 4/25/2025

Model Overview

SESAME is primarily intended for research on large multimodal models and chatbots. It consists of an autoregressive language model paired with a segmentation model, and supports image localization and segmentation tasks.

Model Features

Multimodal Capability
Combines a language model with a visual segmentation model, supporting multimodal interaction between images and text.
Open-Source Model
Released under the MIT license, facilitating research and secondary development.
Instruction-Driven Image Segmentation
Performs image localization and segmentation based on natural language instructions.
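To make the instruction-driven segmentation workflow concrete, here is a minimal sketch of how a mask returned by such a model might be applied to an image. The model call itself is not shown (SESAME's actual inference API is not documented here); the mask is hard-coded purely for illustration, standing in for what a real pipeline would produce from an instruction like "segment the dog".

```python
def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into `image` wherever `mask` is 1.

    `image` is an H x W grid (list of lists) of (r, g, b) tuples;
    `mask` is an H x W grid of 0/1 ints with the same shape.
    """
    out = []
    for img_row, mask_row in zip(image, mask):
        row = []
        for pixel, m in zip(img_row, mask_row):
            if m:
                # Alpha-blend the highlight color into masked pixels.
                row.append(tuple(
                    round((1 - alpha) * p + alpha * c)
                    for p, c in zip(pixel, color)
                ))
            else:
                row.append(pixel)
        out.append(row)
    return out

# A 2x2 grey "image"; the mask (which a real model would predict from
# a natural-language instruction) covers only the top-left pixel.
image = [[(100, 100, 100), (100, 100, 100)],
         [(100, 100, 100), (100, 100, 100)]]
mask = [[1, 0],
        [0, 0]]

result = overlay_mask(image, mask)
print(result[0][0])  # masked pixel, blended toward red
print(result[0][1])  # unmasked pixel, unchanged
```

This post-processing step is model-agnostic: any binary mask produced by an instruction-based segmenter can be visualized the same way.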

Model Capabilities

Image Segmentation
Natural Language Understanding
Multimodal Interaction

Use Cases

Computer Vision Research
Image Segmentation Research
Used for researching natural language instruction-based image segmentation techniques.
Multimodal Model Development
Chatbot Enhancement
Adds image understanding and segmentation capabilities to chatbots.