CogVLM2 Llama3 Chinese Chat 19B

Developed by THUDM
CogVLM2 is a large multimodal model built on Meta-Llama-3-8B-Instruct. It supports image understanding and dialogue in both Chinese and English.
Downloads 118
Release Time: 5/16/2024

Model Overview

The CogVLM2 series supports a context length of 8K and image input at resolutions up to 1344 × 1344. Its reported scores on the TextVQA, DocVQA, and ChartQA benchmarks are listed under Use Cases.

Model Features

Multimodal Capability
Jointly understands images and text, and generates text responses
High-Resolution Support
Supports image input at resolutions up to 1344 × 1344
Long-Context Processing
Supports a context length of 8K
Bilingual Support
Supports dialogue and understanding in both Chinese and English
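
As an illustration of the resolution limit above, the following is a minimal sketch of how a caller might compute a target size that fits within 1344 × 1344 while preserving aspect ratio. The function name `fit_within` and the constant `MAX_SIDE` are hypothetical, not part of any CogVLM2 API, and the model's own preprocessing may resize images differently or handle this step itself.

```python
from typing import Tuple

# Maximum side length, taken from the 1344 x 1344 limit stated above.
MAX_SIDE = 1344


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) down to fit in max_side x max_side.

    Aspect ratio is preserved; images that already fit are returned unchanged.
    This is an illustrative helper, not the model's actual preprocessing.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: cap the scale factor at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size could then be passed to an image library's resize call (for example Pillow's `Image.resize`) before the image is handed to the model.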

Model Capabilities

Image Understanding
Text Generation
Multimodal Dialogue
Document Analysis
Chart Understanding

Use Cases

Visual Question Answering
Image Content Q&A
Answers questions about image content
Scored 85.0 on the TextVQA benchmark
Document Processing
Document Understanding and Q&A
Parses document content and answers related questions
Scored 88.4 on the DocVQA benchmark
Chart Analysis
Chart Data Interpretation
Interprets chart content and extracts key information
Scored 74.7 on the ChartQA benchmark