# High-Precision Image Understanding
## LLaDA-V

LLaDA-V is a diffusion-based vision-language model that outperforms other diffusion multimodal large language models.

Image-to-Text · Safetensors · GSAI-ML
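The listing doesn't show how to load the checkpoint. As a rough sketch only: the related GSAI-ML LLaDA text checkpoints load through `transformers` with `trust_remote_code=True`, so the snippet below assumes LLaDA-V follows the same pattern; the repo id `GSAI-ML/LLaDA-V` is inferred from the listing, and the actual sampling loop comes from the project's own code, not `generate()`.

```python
# Hedged loading sketch for LLaDA-V. Assumptions: repo id
# "GSAI-ML/LLaDA-V" and that the checkpoint ships custom modeling
# code, as the related GSAI-ML LLaDA text checkpoints do.
import torch
from transformers import AutoModel, AutoTokenizer

repo = "GSAI-ML/LLaDA-V"  # assumed repo id, for illustration

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,    # diffusion LMs rely on custom modeling code
    torch_dtype=torch.bfloat16,
).eval()

# Note: diffusion language models decode by iteratively denoising masked
# tokens rather than autoregressive generate(); see the model card for
# the project's own sampling loop.
```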
## InternVL3-8B-bf16

InternVL3-8B-bf16 is an MLX-format conversion of the InternVL3-8B vision-language model, supporting multilingual image-to-text tasks.

Image-to-Text · Transformers · License: Other · mlx-community
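MLX conversions from mlx-community typically run through the `mlx-vlm` package on Apple silicon. The sketch below assumes that standard `load`/`generate` flow applies here; the repo id `mlx-community/InternVL3-8B-bf16` is inferred from the listing, so check the model card for the exact usage.

```python
# Sketch of image-to-text inference via the mlx-vlm package
# (assumption: this MLX conversion loads through mlx-vlm's
# standard load/generate API).
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

repo = "mlx-community/InternVL3-8B-bf16"  # repo id inferred from the listing

model, processor = load(repo)
config = load_config(repo)

images = ["example.jpg"]  # local path or URL to the input image
prompt = apply_chat_template(
    processor, config, "Describe this image.", num_images=len(images)
)

text = generate(model, processor, prompt, images, verbose=False)
print(text)
```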
## Sarashina2-Vision-14B

Sarashina2-Vision-14B is a large Japanese vision-language model developed by SB Intuitions. It combines Sarashina2-13B with the image encoder of Qwen2-VL-7B and achieves strong results on multiple benchmarks.

Image-to-Text · Transformers · Multilingual · License: MIT · sbintuitions
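Since the card tags the model with Transformers, a minimal sketch follows, assuming the checkpoint uses the common `AutoProcessor`/`AutoModelForCausalLM` pattern with `trust_remote_code`; the repo id `sbintuitions/sarashina2-vision-14b` is inferred from the listing.

```python
# Hedged sketch of image-to-text with Sarashina2-Vision-14B
# (assumption: standard transformers AutoProcessor /
# AutoModelForCausalLM loading with trust_remote_code).
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

repo = "sbintuitions/sarashina2-vision-14b"  # repo id inferred from the listing

processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

image = Image.open("example.jpg")
messages = [{"role": "user", "content": "Describe this image in Japanese."}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens after the prompt.
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```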