
Eagle2 9B

Developed by NVIDIA
Eagle2-9B is the latest Vision-Language Model (VLM) released by NVIDIA, striking a strong balance between performance and inference speed. It is built on the Qwen2.5-7B-Instruct language model and a SigLIP+ConvNeXt vision encoder, and supports multilingual and multimodal tasks.
Downloads 944
Release Date: 1/10/2025

Model Overview

Eagle2-9B is a high-performance open-source vision-language model that focuses on optimizing VLM post-training from a data-centric perspective. Through a combination of a robust training recipe and careful model design, it performs strongly across multiple benchmark tests.

Model Features

High-performance Balance
Strikes a strong balance between performance and inference speed at a scale of 8.9B parameters
Multimodal Support
Supports text, image, and video inputs, handling information from multiple modalities
Long Context Processing
Supports a context length of up to 16K tokens
Leading in Benchmark Tests
Outperforms comparably sized models on multiple vision-language benchmarks
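Multimodal input of the kind described above is typically supplied to a VLM as a chat-style message list combining image and text parts. A minimal sketch, assuming the common Hugging Face chat-template field names ("role", "content", "type") rather than Eagle2-9B's exact API, which the model card should be consulted for:

```python
# Hedged sketch: building a multimodal chat payload for a VLM such as
# Eagle2-9B. The field names below follow common Hugging Face
# chat-template conventions and are an assumption, not the model's
# confirmed API.

def build_messages(image_path: str, question: str) -> list[dict]:
    """Combine one image and one text question into a single user turn."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

# In real use, the resulting list would be handed to the model's
# processor, e.g. (not executed here; requires the model weights):
#   processor.apply_chat_template(messages, add_generation_prompt=True)

messages = build_messages("invoice.png", "What is the total amount due?")
```

Video input would follow the same pattern, with additional image (frame) parts in the same content list.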

Model Capabilities

Image Understanding
Text Generation
Multimodal Dialogue
Document Question Answering
Chart Understanding
Video Analysis

Use Cases

Document Processing
DocVQA Document Question Answering
Extract information from document images and answer questions
Achieved 92.6 points on the DocVQA test set
Visual Question Answering
TextVQA Text Visual Question Answering
Answer questions about the text content in images
Achieved 83.0 points on the TextVQA validation set
Chart Understanding
ChartQA Chart Question Answering
Understand and answer questions based on chart data
Achieved 86.4 points on the ChartQA test set
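For context on the DocVQA figure above: DocVQA is conventionally scored with ANLS (Average Normalized Levenshtein Similarity), which gives partial credit for near-miss answers. A minimal self-contained sketch of the per-question score, assuming the standard 0.5 threshold:

```python
# Hedged sketch of the ANLS metric used by DocVQA-style benchmarks.
# Per question: normalized edit distance to each gold answer; scores
# below the threshold tau are zeroed, and the best gold answer counts.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(prediction: str, gold_answers: list[str], tau: float = 0.5) -> float:
    """ANLS for one question: max over gold answers of 1 - NL distance,
    zeroed when the normalized distance is >= tau."""
    scores = []
    for gold in gold_answers:
        p, g = prediction.strip().lower(), gold.strip().lower()
        denom = max(len(p), len(g))
        nl = levenshtein(p, g) / denom if denom else 0.0
        scores.append(1.0 - nl if nl < tau else 0.0)
    return max(scores) if scores else 0.0
```

The benchmark score is the mean of this value over all questions; an exact match scores 1.0, a one-character slip in a five-character answer scores 0.8, and anything too far off scores 0.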