
CogVLM2 Llama3 Chat 19B Int4

Developed by THUDM
CogVLM2 is a multimodal dialogue model based on Meta-Llama-3-8B-Instruct. It supports both Chinese and English, a context length of up to 8K, and image inputs at resolutions up to 1344×1344.
Downloads: 467
Release date: 5/24/2024

Model Overview

CogVLM2 is the new generation of open-source CogVLM models. It achieves strong results on multiple benchmarks and supports high-resolution image understanding and long-context dialogue.

Model Features

High-Performance Multimodal Understanding
Achieves strong results on benchmarks such as TextVQA and DocVQA, surpassing previous-generation models
Long Context Support
Supports dialogue with a context length of up to 8K tokens
High-Resolution Image Processing
Supports image inputs at resolutions up to 1344×1344
Bilingual Support
Supports multimodal dialogue in both Chinese and English
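The 1344×1344 limit means a larger photo or scanned page must be reduced before the model sees it. The sketch below is a hypothetical client-side helper, not part of CogVLM2. It computes the largest size that fits within 1344×1344 while keeping the aspect ratio. The model card does not say whether CogVLM2's own preprocessor resizes, pads, or crops, so treat `fit_within` and `MAX_SIDE` as illustrative assumptions.

```python
from typing import Tuple

# Maximum side length taken from the model card (1344×1344 resolution).
# Hypothetical constant; CogVLM2's real preprocessing is not reproduced here.
MAX_SIDE = 1344


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, so that neither
    side exceeds max_side. Images that already fit are returned unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    # Integer arithmetic avoids float rounding leaving a side at max_side - 1;
    # max(1, ...) keeps extremely thin images at least one pixel wide.
    new_w = max(1, width * max_side // longest)
    new_h = max(1, height * max_side // longest)
    return new_w, new_h
```

For example, a 4000×3000 scan maps to `fit_within(4000, 3000) == (1344, 1008)`, while a 1000×800 image is left as-is. Integer floor division keeps the longest side at exactly 1344 instead of relying on floating-point scaling.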

Model Capabilities

Multimodal dialogue
Image content understanding
Long-text generation
Document QA
Chart understanding
OCR capabilities

Use Cases

Document Processing
Document QA
Understand and answer questions about uploaded documents
Achieved 92.3 points on the DocVQA benchmark
Image Understanding
Image Content QA
Describe and answer questions about image content
Achieved 85.0 points on the TextVQA benchmark
Chart Analysis
Chart Understanding
Parse chart content and answer questions
Achieved 81.0 points on the ChartQA benchmark
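Document QA and chart analysis often run over several turns. The 8K context window then bounds how much earlier conversation can be resent with each new question. Below is a minimal sketch of one common strategy: keep the most recent turns that fit a token budget. It assumes 8K means 8192 tokens. `Turn`, `trim_history`, and the whitespace-based `whitespace_tokens` counter are hypothetical stand-ins. Real counts come from the model's tokenizer, and the tokens an image consumes are not counted here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed token count for the model card's "8K" context length.
CONTEXT_TOKENS = 8192


@dataclass
class Turn:
    """One hypothetical question/answer exchange in a dialogue."""
    question: str
    answer: str


def whitespace_tokens(text: str) -> int:
    # Stand-in counter only; it does not match CogVLM2's tokenizer.
    return len(text.split())


def trim_history(
    history: List[Turn],
    new_question: str,
    reserve_for_answer: int = 512,
    budget: int = CONTEXT_TOKENS,
    count: Callable[[str], int] = whitespace_tokens,
) -> List[Turn]:
    """Keep the newest contiguous turns such that history, the new question,
    and a reserve for the answer together fit within `budget` tokens."""
    used = count(new_question) + reserve_for_answer
    if used > budget:
        raise ValueError("question plus answer reserve exceeds the budget")
    kept: List[Turn] = []
    # Walk from newest to oldest; stop at the first turn that does not fit
    # so the kept history stays contiguous with the new question.
    for turn in reversed(history):
        cost = count(turn.question) + count(turn.answer)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest turns first is a simple design choice. It favors recent context, which usually matters most for follow-up questions about the same document. Summarizing older turns is a common alternative when early context must be retained.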