# Joint Image-Text Reasoning

## Llama 4 Scout 17B 16E Linearized Bnb Nf4 Bf16
License: Other · Publisher: axolotl-quants · 6,861 downloads · 3 likes
Tags: Multimodal Fusion · Transformers · Supports Multiple Languages

Llama 4 Scout is a 17-billion-active-parameter Mixture-of-Experts (MoE) model released by Meta, supporting multilingual text and image understanding. This build linearizes the expert modules for PEFT/LoRA compatibility.
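The "Bnb Nf4 Bf16" suffix suggests this checkpoint targets bitsandbytes 4-bit NormalFloat (NF4) weights with bfloat16 compute. A minimal sketch of the corresponding `transformers` quantization config, assuming that reading of the name (the exact settings, such as double quantization, are an assumption, not stated in the listing):

```python
import torch
from transformers import BitsAndBytesConfig

# NF4 4-bit weights with bf16 compute, matching the "Bnb Nf4 Bf16" naming
# (settings inferred from the checkpoint name; verify against the repo's config).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # assumption: common default for NF4 builds
)
# Pass quantization_config=bnb_config to from_pretrained(...) when loading the model.
```

With the experts linearized, such a 4-bit base is the usual starting point for attaching LoRA adapters via PEFT.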
## Llama 4 Scout 17B 16E Unsloth Bnb 8bit
License: Other · Publisher: unsloth · 855 downloads · 2 likes
Tags: Text-to-Image · Transformers · Supports Multiple Languages

Llama 4 Scout is a multimodal large language model developed by Meta, built on a mixture-of-experts architecture that supports both text and image understanding, with 17 billion active parameters out of 109 billion total.
## Llama 4 Scout 17B 16E Instruct
License: Other · Publisher: meta-llama · 817.62k downloads · 844 likes
Tags: Multimodal Fusion · Transformers · Supports Multiple Languages

Llama 4 Scout is a multimodal AI model developed by Meta featuring a mixture-of-experts architecture. It supports text and image interaction in 12 languages, with 17B active parameters and 109B total parameters.
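The active-vs-total split quoted above (17B active, 109B total, 16 experts) can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch, assuming top-1 expert routing and uniform expert size (both are illustrative assumptions, not stated in the listing):

```python
# Back-of-the-envelope MoE parameter arithmetic. Assumptions (not from the
# listing): top-1 routing, all 16 experts the same size, and "shared" covers
# everything outside the routed experts (attention, embeddings, router).
NUM_EXPERTS = 16
ACTIVE_B = 17.0   # parameters used per token, in billions
TOTAL_B = 109.0   # parameters stored, in billions

# active = shared + 1 * expert_size
# total  = shared + NUM_EXPERTS * expert_size
expert_size_b = (TOTAL_B - ACTIVE_B) / (NUM_EXPERTS - 1)
shared_b = ACTIVE_B - expert_size_b

print(f"expert size ~ {expert_size_b:.2f}B, shared ~ {shared_b:.2f}B")
```

Under these assumptions each expert holds roughly 6B parameters and the shared backbone roughly 11B, which is why only about one sixth of the stored weights are exercised per token.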
## Turkish LLaVA V0.1
License: MIT · Publisher: ytu-ce-cosmos · 86 downloads · 10 likes
Tags: Image-to-Text · Other

A Turkish vision-language model designed for multimodal visual instruction-following, able to process both image and text inputs to understand and execute instructions given in Turkish.
## AA Chameleon 7b Base
Publisher: PKU-Alignment · 105 downloads · 8 likes
Tags: Text-to-Image · Transformers · English

A multimodal model supporting interleaved text-image input and output, based on the Chameleon 7B model, with image-generation capabilities enhanced through the Align-Anything framework.
© 2025 AIbase