Tar-1.5B Open-Source Model - Unified Application of Visual Understanding and Generation through Text Alignment

Tar 1.5B

Developed by csuhan

A unified model for visual understanding and generation through text-aligned representations

Open Source License: Apache-2.0

Tags: #Text-aligned Vision #Multimodal Unified Model #Integrated Visual Understanding and Generation

Downloads 253

Release Time: 6/11/2025

Model Overview

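Tar is a multimodal model that handles both visual understanding (for example, describing an image) and visual generation (for example, producing an image from a prompt) within a single model. It does this by representing images in a form aligned with text, so both kinds of task share one representation. Tar 1.5B is built on the Qwen/Qwen2.5-1.5B-Instruct base model.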

Model Features

Text-aligned Representations

Unifies visual understanding and generation tasks by representing images in a text-aligned form
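
One way to picture a text-aligned representation is as a lookup that maps image patch features onto entries of a codebook living in the same space as text token embeddings. The sketch below is only a toy illustration of that idea, not Tar's actual tokenizer: the codebook entries, the three-dimensional vectors, and the function names `cosine` and `quantize_patches` are all hypothetical and chosen for readability.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical codebook: each entry stands in for a text token embedding.
# A real model would use a far larger, learned codebook.
TEXT_CODEBOOK = {
    "cat": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}

def quantize_patches(patch_features, codebook=TEXT_CODEBOOK):
    """Assign each patch feature to its most similar codebook entry."""
    tokens = []
    for feat in patch_features:
        best = max(codebook, key=lambda tok: cosine(feat, codebook[tok]))
        tokens.append(best)
    return tokens

# Made-up patch features for a three-patch "image".
patches = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.9]]
image_tokens = quantize_patches(patches)
print(image_tokens)  # ['cat', 'grass', 'sky']
```

Once an image is expressed as discrete tokens of this kind, a language model can in principle read them (understanding) or emit them (generation) in the same way it handles text tokens.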

Multitask Unification

Supports both visual understanding and visual generation tasks in a single model

Open-source License

Released under the Apache 2.0 license, which permits both commercial and research use

Model Capabilities

Visual Understanding

Image Generation

Vision-Language Alignment

Multimodal Task Processing

Use Cases

Computer Vision

Image Caption Generation

Generate text descriptions for input images

Text-to-Image Generation

Generate corresponding images based on text descriptions

Education

Visual-assisted Learning

Supports learning by combining images and text, for example by describing a diagram in words or generating an illustration from a written description

Property	Details
Base Model	Qwen/Qwen2.5-1.5B-Instruct
Pipeline Tag	any-to-any

