L

Llama 3.2 11B Vision Invoices Mini

Developed by atulSethi
A multimodal large language model fine-tuned based on unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit, supporting visual instruction understanding tasks, with Unsloth optimization doubling training speed.
Downloads 46
Release Time : 3/10/2025

Model Overview

This is a multimodal large language model that supports both visual and text instructions, suitable for multimodal understanding and generation tasks.

Model Features

Efficient Training Optimization
Training with Unsloth and Huggingface TRL library, achieving 2x speed improvement
Multimodal Capability
Supports understanding and generation of both visual and text instructions
Quantization Compression
Utilizes 4bit quantization technology to reduce model storage and computational requirements

Model Capabilities

Text generation
Visual instruction understanding
Multimodal reasoning
Instruction following

Use Cases

Multimodal Interaction
Visual Question Answering
Answer questions based on image content
Image Caption Generation
Generate natural language descriptions for input images
Content Generation
Multimodal Content Creation
Generate creative content combining visual and text inputs
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase