A

Aya Vision 32b

Developed by CohereLabs
Aya Vision 32B is an open-weight 32B parameter multimodal model developed by Cohere Labs, supporting vision-language tasks in 23 languages.
Downloads 387
Release Time : 3/2/2025

Model Overview

A multilingual model optimized for various vision-language tasks, including OCR, image captioning, visual reasoning, summarization, Q&A, code generation, etc.

Model Features

Multilingual support
Supports vision-language task processing in 23 languages
High-resolution image processing
Supports 364x364 pixel resolution with up to 2197 image tokens
Long context support
16K context length suitable for complex tasks
Multimodal adapter
Innovative architecture combining advanced text models with visual encoders

Model Capabilities

Image caption generation
Visual question answering
Multilingual OCR
Visual reasoning
Text summarization
Code generation
Cross-modal understanding

Use Cases

Content understanding
Multilingual image captioning
Generate descriptive text for images in different languages
Accurate descriptions in 23 languages
Document OCR
Extract multilingual text content from images
High-precision text recognition
Intelligent interaction
Visual question answering
Answer complex questions about image content
Supports multilingual Q&A
Educational assistance
Explain educational content in images
Multilingual teaching support
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase