ByteDance UI-TARS-72B-SFT-GGUF Open-Source Model - Achieve Easy Image-Text to Text Conversion Function

Bytedance Research.ui TARS 72B SFT GGUF

Developed by DevQuasar

A 72B-parameter multimodal foundation model released by ByteDance Research, specializing in image-text-to-text tasks

Image-to-Text #Multimodal understanding #Large-scale parameters #Image-text generation

Downloads 81

Release Time : 3/6/2025

Model Overview

This model is a large-scale multimodal model fine-tuned with supervision, capable of handling conversion tasks between images and text, with strong cross-modal understanding capabilities

Model Features

Large-scale parameters

72B parameters provide powerful model capacity and expressiveness

Multimodal capability

Capable of processing both visual and textual information for cross-modal understanding

Supervised fine-tuning

Optimized for specific tasks through specialized supervised fine-tuning (SFT)

Model Capabilities

Image understanding

Text generation

Cross-modal conversion

Visual question answering

Use Cases

Content generation

Image caption generation

Generate detailed textual descriptions based on input images

Can produce accurate and rich image descriptions

Assistive tools

Visual assistance

Provide image content descriptions for visually impaired users

Enhances accessibility capabilities

Property	Details
Base Model	bytedance-research/UI-TARS-72B-SFT
Pipeline Tag	image-text-to-text

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Bytedance Research.ui TARS 72B SFT GGUF

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 Quantized UI-TARS-72B-SFT Model

Model Information

Model Version

Support the Project