ByteDance Research UI-TARS 72B SFT GGUF
A 72B-parameter multimodal foundation model released by ByteDance Research, specializing in image-text-to-text tasks
Downloads 81
Release Time: 3/6/2025
Model Overview
This is a large-scale multimodal model trained with supervised fine-tuning (SFT). It takes images and text as input and generates text, with an emphasis on cross-modal understanding.
Model Features
Large-scale parameters
72B parameters provide powerful model capacity and expressiveness
Multimodal capability
Capable of processing both visual and textual information for cross-modal understanding
Supervised fine-tuning
Optimized for specific tasks through specialized supervised fine-tuning (SFT)
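To give a rough sense of what 72B parameters means in practice, the sketch below estimates the storage needed for the weights alone at a few bit-widths. The bit-widths (16, 8, 4) are illustrative assumptions rather than the quantization levels actually published for this model, and the helper name `weight_size_gb` is hypothetical. Real GGUF quantization formats also store per-block scale metadata, so actual files are somewhat larger than these figures.

```python
# Back-of-the-envelope weight storage estimate for a 72B-parameter model.
# Assumption: every weight uses the same number of bits; quantization
# metadata, KV cache, and activations are NOT included.

PARAMS = 72e9  # parameter count stated on the model card


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Illustrative bit-widths only, not the model's published variants.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_size_gb(PARAMS, bits):.0f} GB")
```

At 16 bits per weight this works out to about 144 GB, and at 4 bits about 36 GB, which is why quantized variants are the usual choice for running a model of this size locally.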
Model Capabilities
Image understanding
Text generation
Cross-modal conversion
Visual question answering
Use Cases
Content generation
Image caption generation
Generate detailed textual descriptions based on input images
Produces accurate, detailed image descriptions
Assistive tools
Visual assistance
Provide image content descriptions for visually impaired users
Improves accessibility