
Llama-3.2V-11B-cot

Developed by Xkev
Llama-3.2V-11B-cot is a vision-language model capable of spontaneous, systematic reasoning, built on the LLaVA-CoT framework.
Downloads: 5,089
Release Date: 11/19/2024

Model Overview

This model is the first release of LLaVA-CoT, focused on step-by-step reasoning in vision-language tasks, supporting image understanding and image-to-text generation.

Model Features

Step-by-Step Reasoning
Performs systematic, step-by-step vision-language reasoning and can handle complex multimodal tasks.
High-Performance Benchmarking
Performs strongly across multiple vision-language benchmarks, achieving an average score of 63.5.
Long-Text Generation
Generates up to 2048 new tokens, suitable for tasks requiring long-form output.

Model Capabilities

Image Understanding
Text Generation
Multimodal Reasoning
Visual Question Answering
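
These capabilities can be exercised through the Hugging Face transformers library. Below is a minimal sketch, assuming the model is hosted on the Hub as Xkev/Llama-3.2V-11B-cot and is loadable with the Llama-3.2-Vision (Mllama) classes; the `answer` helper and its prompt format are illustrative, not an official API:

```python
MODEL_ID = "Xkev/Llama-3.2V-11B-cot"  # assumed Hub repo id

def answer(image_path: str, question: str) -> str:
    """Run one round of visual question answering.

    Heavy dependencies are imported lazily so they load only
    when inference is actually requested.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    # Load model and processor (downloads weights on first call).
    model = MllamaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    # Build a chat-style prompt pairing the image with the question.
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ]}]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(
        Image.open(image_path), prompt, return_tensors="pt"
    ).to(model.device)

    # The model supports up to 2048 new tokens, enough for its
    # full step-by-step reasoning output.
    output = model.generate(**inputs, max_new_tokens=2048)
    return processor.decode(output[0], skip_special_tokens=True)
```

Running `answer("chart.png", "What trend does this chart show?")` would return the model's full reasoning trace followed by its conclusion; on limited hardware, quantized loading or a smaller `max_new_tokens` may be preferable.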

Use Cases

Education
Visual Math Problem Solving
Solves math problems that include diagrams and formulas
Achieved a score of 54.8 on the MathVista benchmark
General AI Assistant
Multimodal Dialogue
Conversational assistance grounded in combined image and text input
Achieved a score of 75.0 on the MMBench benchmark