
ByteDance Research UI-TARS-72B-SFT GGUF

Published by DevQuasar
A 72B-parameter multimodal foundation model released by ByteDance Research, specializing in image-text-to-text tasks
Downloads: 81
Release date: 3/6/2025

Model Overview

This is a large-scale multimodal model trained with supervised fine-tuning. It takes images and text as input and generates text, and it shows strong cross-modal understanding.

Model Features

Large parameter count
72 billion parameters give the model high capacity and expressiveness
Multimodal capability
Processes visual and textual input together for cross-modal understanding
Supervised fine-tuning
Adapted to its target tasks through supervised fine-tuning (SFT)
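To put the 72B parameter count in practical terms, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need in a few GGUF formats. It assumes every weight uses the same format. Real GGUF files mix formats across tensors, add metadata, and may keep vision-encoder weights in a separate file, so the files in this repository can differ. The format names (F16, Q8_0, Q4_0) are llama.cpp storage types chosen for illustration; the card does not say which quantizations are provided here.

```python
# Rough weight-storage estimate for a 72B-parameter model in a few GGUF formats.
# Assumption: every weight uses the same format. Real GGUF files mix formats
# per tensor and add metadata, so actual file sizes will differ.

PARAMS = 72e9  # parameter count taken from the model name (72B)

# Bits per weight for llama.cpp block formats:
#   F16  : 16 bits per weight
#   Q8_0 : blocks of 32 int8 weights + one fp16 scale -> 34 bytes / 32 = 8.5 bits
#   Q4_0 : blocks of 32 4-bit weights + one fp16 scale -> 18 bytes / 32 = 4.5 bits
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def estimate_gb(params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for fmt, bpw in BITS_PER_WEIGHT.items():
        print(f"{fmt:>5}: ~{estimate_gb(PARAMS, bpw):.1f} GB")
```

This gives about 144 GB for F16, 76.5 GB for Q8_0 and 40.5 GB for Q4_0. Treat these as ballpark figures only: running the model also needs memory for the context (KV cache) and runtime buffers on top of the weights.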

Model Capabilities

Image understanding
Text generation
Cross-modal conversion
Visual question answering

Use Cases

Content generation
Image caption generation
Generate detailed textual descriptions based on input images
Can produce accurate and rich image descriptions
Assistive tools
Visual assistance
Provide image content descriptions for visually impaired users
Improves accessibility