# Joint Image-Text Reasoning

## Llama 4 Scout 17B 16E Linearized Bnb Nf4 Bf16
License: Other · Publisher: axolotl-quants · 6,861 downloads · 3 likes
Tags: Multimodal Fusion · Transformers · Supports Multiple Languages

Llama 4 Scout is a 17-billion-active-parameter Mixture-of-Experts (MoE) model released by Meta, supporting multilingual text and image understanding. This build linearizes the expert modules for PEFT/LoRA compatibility.
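The "Bnb Nf4 Bf16" suffix suggests this checkpoint targets bitsandbytes 4-bit NormalFloat (NF4) weights with bfloat16 compute. A minimal sketch of the corresponding `transformers` quantization config, assuming that reading of the name (the exact settings, such as double quantization, are an assumption, not stated in the listing):

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 4-bit weights with bf16 compute, matching the "Bnb Nf4 Bf16" naming
# (settings inferred from the checkpoint name; verify against the repo's config).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # assumption: common default for NF4 builds
)
# Pass quantization_config=bnb_config to from_pretrained(...) when loading the model.
```

With the experts linearized, such a 4-bit base is the usual starting point for attaching LoRA adapters via PEFT.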
## Llama 4 Scout 17B 16E Unsloth Bnb 8bit
License: Other · Publisher: unsloth · 855 downloads · 2 likes
Tags: Text-to-Image · Transformers · Supports Multiple Languages

Llama 4 Scout is a multimodal large language model developed by Meta, built on a mixture-of-experts architecture that supports both text and image understanding, with 17 billion active parameters out of 109 billion total.
## Llama 4 Scout 17B 16E Instruct
License: Other · Publisher: meta-llama · 817.62k downloads · 844 likes
Tags: Multimodal Fusion · Transformers · Supports Multiple Languages

Llama 4 Scout is a multimodal AI model developed by Meta featuring a mixture-of-experts architecture. It supports text and image interaction in 12 languages, with 17B active parameters and 109B total parameters.
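The active-vs-total split quoted above (17B active, 109B total, 16 experts) can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming top-1 expert routing and uniform expert size (both are illustrative assumptions, not stated in the listing):

```python
# Back-of-the-envelope MoE parameter arithmetic. Assumptions (not from the
# listing): top-1 routing, all 16 experts the same size, and "shared" covers
# everything outside the routed experts (attention, embeddings, router).
NUM_EXPERTS = 16
ACTIVE_B = 17.0   # parameters used per token, in billions
TOTAL_B = 109.0   # parameters stored, in billions

# active = shared + 1 * expert_size
# total  = shared + NUM_EXPERTS * expert_size
expert_size_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - 1)
shared_b = ACTIVE_B - expert_size_b

print(f"expert size ~ {expert_size_b:.2f}B, shared ~ {shared_b:.2f}B")
```

Under these assumptions each expert holds roughly 6B parameters and the shared backbone roughly 11B, which is why only about one sixth of the stored weights are exercised per token.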
## Turkish LLaVA V0.1
License: MIT · Publisher: ytu-ce-cosmos · 86 downloads · 10 likes
Tags: Image-to-Text · Other

A Turkish vision-language model designed for multimodal visual instruction-following, able to process both image and text inputs to understand and execute instructions given in Turkish.
## AA Chameleon 7b Base
Publisher: PKU-Alignment · 105 downloads · 8 likes
Tags: Text-to-Image · Transformers · English

A multimodal model supporting interleaved text-image input and output, based on the Chameleon 7B model, with image-generation capabilities enhanced through the Align-Anything framework.
© 2025 AIbase