CogVLM2 Llama3 Chat 19B

Developed by THUDM
CogVLM2 is a multimodal large language model built on Meta-Llama-3-8B-Instruct. It supports image understanding and dialogue tasks, with an 8K context length and image inputs up to 1344x1344 pixels.
Downloads 7,805
Release Date: 5/16/2024

Model Overview

A vision-language model that performs strongly on benchmarks such as TextVQA, DocVQA, and ChartQA, with Chinese-English multimodal interaction available through a bilingual variant.

Model Features

High-Performance Multimodal Understanding
Significantly improves on the previous generation of CogVLM models on benchmarks such as TextVQA and DocVQA
Long Context Support
Supports a context length of 8K
High-Resolution Image Processing
Supports image inputs up to 1344x1344 pixels
Bilingual Support
Offers a Chinese-English bilingual version (cogvlm2-llama3-chinese-chat-19B)
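
As a rough illustration of the 1344x1344 input limit listed above, the sketch below computes the dimensions an image would need in order to fit inside that square while keeping its aspect ratio. The function name fit_within and its max_side parameter are hypothetical and not part of the model's API; the model's own preprocessing may already resize inputs, so this is only a way to pre-check or downscale images before sending them.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1344) -> Tuple[int, int]:
    """Return (width, height) scaled down to fit inside max_side x max_side.

    Hypothetical helper for illustration only. The default of 1344 is the
    maximum input resolution stated in the feature list. Images that already
    fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Largest scale factor that keeps both sides within max_side, capped at 1.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(2688, 1344))
    print(fit_within(800, 600))
```

The resulting size can be passed to any image library's resize call before the image is handed to the model.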

Model Capabilities

Image content understanding
Document QA
Chart analysis
Multi-turn dialogue
Cross-modal reasoning
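
For multi-turn dialogue, the conversation history has to stay within the 8K context length. The sketch below shows one possible way to keep only the most recent turns that fit a token budget. Everything here is an assumption for illustration: trim_history is a hypothetical helper, the default budget of 8192 is an assumed reading of "8K", and the default whitespace word count is a crude stand-in for the model's real tokenizer. In practice the budget would also need to leave room for the image and the model's reply.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user query, model response)


def trim_history(
    history: List[Turn],
    budget: int = 8192,  # assumed token budget for "8K"; adjust as needed
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[Turn]:
    """Keep the newest turns whose combined token count fits the budget.

    Hypothetical helper: count_tokens defaults to a whitespace word count,
    which only approximates a real tokenizer.
    """
    kept: List[Turn] = []
    used = 0
    # Walk from the newest turn backwards and stop at the first turn
    # that would exceed the budget.
    for query, response in reversed(history):
        cost = count_tokens(query) + count_tokens(response)
        if used + cost > budget:
            break
        kept.append((query, response))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    turns = [
        ("What is in the image?", "A bar chart of sales."),
        ("Which bar is tallest?", "The one for March."),
    ]
    print(trim_history(turns, budget=10))
```

Dropping whole turns rather than truncating text mid-turn keeps each remaining question paired with its answer.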

Use Cases

Document Processing
Document Content QA
Parse PDF/image documents and answer related questions
Scores 92.3 on the DocVQA benchmark
Visual Question Answering
Image Content QA
Answer complex questions about image content
Scores 84.2 on the TextVQA benchmark
Educational Assistance
Chart Analysis
Interpret and analyze various data charts
Scores 81.0 on the ChartQA benchmark