SESAME
SESAME is an open-source multimodal model built on LLaVA and fine-tuned on a variety of instruction-based image localization (segmentation) datasets.
Downloads: 37
Release Time: 4/25/2025
Model Overview
SESAME is intended primarily for research on large multimodal models and chatbots. It consists of an autoregressive language model paired with a segmentation model, and it supports instruction-driven image localization and segmentation tasks.
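The two-part design described above (an autoregressive language model feeding a segmentation model) can be sketched as a minimal pipeline. This is an illustrative toy, not SESAME's actual API: all class and function names here (`ToyLanguageModel`, `ToyMaskDecoder`, `segment`) are hypothetical stand-ins, and the "embedding" and "decoding" steps are trivial placeholders for the real transformer and mask-decoder computations.

```python
# Hypothetical sketch of a SESAME-style two-stage design: an autoregressive
# language model reads the instruction and emits an embedding, and a mask
# decoder projects that embedding onto image features to produce a binary
# mask. Names and logic are illustrative only, not SESAME's real API.

class ToyLanguageModel:
    """Stand-in for the LLaVA-based autoregressive language model."""

    def encode_instruction(self, instruction: str) -> list:
        # Real model: transformer forward pass producing a segmentation-token
        # embedding; here, a fixed-size bag-of-characters vector just to keep
        # the sketch runnable.
        emb = [0.0] * 4
        for i, ch in enumerate(instruction):
            emb[i % 4] += ord(ch) / 1000.0
        return emb


class ToyMaskDecoder:
    """Stand-in for the segmentation head that turns the instruction
    embedding into a per-pixel mask over image features."""

    def decode(self, seg_embedding: list, image_features: list) -> list:
        # Real model: cross-attention between embedding and image features;
        # here, a thresholded product per "pixel" cell.
        score = sum(seg_embedding)
        return [[1 if cell * score > 0.5 else 0 for cell in row]
                for row in image_features]


def segment(instruction: str, image_features: list) -> list:
    """Run the toy instruction-driven segmentation pipeline."""
    seg_emb = ToyLanguageModel().encode_instruction(instruction)
    return ToyMaskDecoder().decode(seg_emb, image_features)


# Dummy 2x2 grid of "image features".
features = [[0.9, 0.1], [0.2, 0.8]]
mask = segment("segment the dog on the left", features)
print(mask)
```

The point of the sketch is the division of labor: the language model handles the natural-language instruction, while a separate decoder produces the pixel-level mask, which is the structure the overview attributes to SESAME.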
Model Features
Multimodal Capability
Combines a language model with a visual segmentation model to support multimodal interaction between images and text.
Open-Source Model
Released under the MIT license, enabling open research and derivative development.
Instruction-Driven Image Segmentation
Capable of performing image localization and segmentation tasks based on natural language instructions.
Model Capabilities
Image Segmentation
Natural Language Understanding
Multimodal Interaction
Use Cases
Computer Vision Research
Image Segmentation Research
Supports research on image segmentation techniques driven by natural language instructions.
Multimodal Model Development
Chatbot Enhancement
Adds image understanding and segmentation capabilities to chatbots.