Llama 3.2 11B Vision Invoices Mini
L
Llama 3.2 11B Vision Invoices Mini
Developed by atulSethi
A multimodal large language model fine-tuned based on unsloth/llama-3.2-11b-vision-instruct-unsloth-bnb-4bit, supporting visual instruction understanding tasks, with Unsloth optimization doubling training speed.
Downloads 46
Release Time : 3/10/2025
Model Overview
This is a multimodal large language model that supports both visual and text instructions, suitable for multimodal understanding and generation tasks.
Model Features
Efficient Training Optimization
Training with Unsloth and Huggingface TRL library, achieving 2x speed improvement
Multimodal Capability
Supports understanding and generation of both visual and text instructions
Quantization Compression
Utilizes 4bit quantization technology to reduce model storage and computational requirements
Model Capabilities
Text generation
Visual instruction understanding
Multimodal reasoning
Instruction following
Use Cases
Multimodal Interaction
Visual Question Answering
Answer questions based on image content
Image Caption Generation
Generate natural language descriptions for input images
Content Generation
Multimodal Content Creation
Generate creative content combining visual and text inputs
Featured Recommended AI Models
Š 2025AIbase