Hunyuan TurboS Vision

Hunyuan TurboS Vision is a lightweight multimodal model from the Tencent Hunyuan team, positioned as the new-generation flagship vision-language model built on the latest Hunyuan TurboS. It targets image-text understanding tasks, including image-based entity recognition, knowledge Q&A, copywriting, and solving problems from photos, and is described as improved across the board over the previous generation. It also supports video clip input and real-time, parameterized API interaction, with optimized long-text processing and cross-modal reasoning.

Intelligence: Medium

Speed: Slow

Input Supported Modalities: Text, Image
Is Reasoning Model: Yes
Context Window: 8,000 tokens
Maximum Output Tokens: 2,000
Knowledge Cutoff: 2024-10-31
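The listing gives an 8,000-token context window and a 2,000-token output cap. The minimal Python sketch below budgets a request against those two limits. It assumes, as is common but not stated here, that prompt tokens (including any image tokens) and generated tokens share the same window; the function name `max_output_for_prompt` is hypothetical and not part of any Hunyuan SDK.

```python
# Limits taken from the listing; sharing one window between prompt and
# output is an assumption, not documented behavior.
CONTEXT_WINDOW = 8_000
MAX_OUTPUT_TOKENS = 2_000


def max_output_for_prompt(prompt_tokens: int) -> int:
    """Return the largest output budget that still fits in the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # The output can never exceed the model's own cap, even with room to spare.
    return min(MAX_OUTPUT_TOKENS, remaining)


print(max_output_for_prompt(5_000))  # 2000: capped by the output limit
print(max_output_for_prompt(7_500))  # 500: capped by the remaining window
```

Under these assumptions, a prompt longer than 6,000 tokens leaves less than the full 2,000-token output budget.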

Pricing

Input: ￥3 / M tokens
Output: ￥9 / M tokens
Blended Price: ￥8 / M tokens
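As a worked example of the listed rates (￥3 per million input tokens, ￥9 per million output tokens), the sketch below estimates the cost of a single call. It assumes simple linear per-token billing with no minimum charge or discounts, and it takes token counts as given; how images are converted into input tokens is not specified in this listing. `request_cost_cny` is a hypothetical helper.

```python
# Listed rates in CNY per million tokens; linear billing is an assumption.
INPUT_CNY_PER_M = 3.0
OUTPUT_CNY_PER_M = 9.0


def request_cost_cny(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in CNY from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_CNY_PER_M + output_tokens * OUTPUT_CNY_PER_M) / 1_000_000


# A call with 6,000 prompt tokens plus the full 2,000-token output.
print(f"{request_cost_cny(6_000, 2_000):.3f}")  # 0.036
```

Because output tokens cost three times as much as input tokens here, in this example the 2,000 output tokens account for half the cost despite being a quarter of the tokens.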

Quick Simple Comparison

Input

Output

Hunyuan-T1-20250403: ￥0.14
Hunyuan-Vision: ￥2.5
HunYuan-TurboS: ￥0.11

Basic Parameters

Hunyuan-TurboS-Vision Technical Parameters

Parameter Count: Not Announced
Context Length: 8,000 tokens
Training Data Cutoff: 2024-10-31
Open Source Category: Proprietary
Multimodal Support: Text, Image
Throughput: 850
Release Date: 2025-04-07
Response Speed: 18.6 tokens/s
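The listed response speed of 18.6 tokens/s gives a rough sense of how long a long answer takes to stream. The sketch below assumes a constant decoding rate and ignores time-to-first-token and network overhead; the helper name `estimated_generation_seconds` is hypothetical, and the Throughput figure (850, unit not stated) is not used.

```python
RESPONSE_SPEED_TOKENS_PER_S = 18.6  # from the listing


def estimated_generation_seconds(
    output_tokens: int, tokens_per_s: float = RESPONSE_SPEED_TOKENS_PER_S
) -> float:
    """Estimate decoding time only, assuming a steady tokens-per-second rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return output_tokens / tokens_per_s


# Generating the full 2,000-token maximum output:
print(f"{estimated_generation_seconds(2_000):.1f} s")  # 107.5 s
```

At this rate, a maximum-length response takes close to two minutes to generate, which is consistent with the "Speed: Slow" rating.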

Benchmark Scores

Hunyuan-TurboS-Vision's scores on standard benchmark indices are listed below. Each index measures the model's capability in a different task domain.

Intelligence Index: 45.3
Overall intelligence level as a large language model.

Coding Index: 68.4
Performance on coding tasks.

Math Index: 62.8
Capability in solving mathematical problems, mathematical reasoning, and other math-related tasks.