Llama JoyCaption Beta One HF LLaVA GGUF

Developed by Mungert
A free and open image-captioning vision-language model (VLM) for the community that can be used to caption training data for diffusion models and supports a wide range of image styles and content.
Downloads: 2,968
Release date: 6/8/2025

Model Overview

This is a vision-language model built on Llama-3.1-8B-Instruct and SigLIP2 that focuses on generating high-quality, diverse image captions across a wide range of image styles and content.
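The quantized GGUF can be run locally. Below is a minimal sketch using llama-cpp-python with its LLaVA-style chat handler (JoyCaption is an HF-LLaVA model, so a LLaVA handler is a reasonable assumption); the model and mmproj file names are illustrative placeholders, not the repo's actual file names.

```python
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

def image_to_data_uri(path: str) -> str:
    """Encode a local image as a base64 data URI the chat handler accepts."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

# The mmproj file carries the vision-encoder weights; both file names here
# are placeholders -- substitute the files you actually downloaded.
chat_handler = Llava15ChatHandler(clip_model_path="joycaption-mmproj-f16.gguf")
llm = Llama(
    model_path="llama-joycaption-beta-one-q4_k_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for the image tokens plus a long caption
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_to_data_uri("photo.jpg")}},
            {"type": "text", "text": "Write a long descriptive caption for this image in a formal tone."},
        ],
    }],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```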

Model Features

Free and open source
The model weights are openly released with no usage restrictions, and come with training scripts and detailed information on how the model was built.
Uncensored
Coverage of safe-for-work (SFW) and not-safe-for-work (NSFW) content is balanced, and images are described directly rather than with euphemisms.
Diversity
It covers a wide range of image styles, content, races, genders, and orientations, making it suitable for all users.
Minimal filtering
It is trained on a large, lightly filtered set of images so it can describe most aspects of the real world, while illegal content is excluded from training.

Model Capabilities

Image caption generation
Vision-language understanding
Diverse content generation

Use Cases

Image caption generation
Generate formal descriptive captions
Produce detailed, formally worded captions for images, with high-quality and diverse output.
Train diffusion models
Caption image datasets used to train diffusion models, improving how faithfully those models follow prompts and the quality of their output; see the batch-captioning sketch below.
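For the diffusion-training use case, a common workflow is to write one caption per image as a sidecar .txt file, a layout many diffusion trainers read. A minimal sketch, reusing the `llm` object and `image_to_data_uri` helper from the example above; the folder name and sidecar convention are assumptions:

```python
from pathlib import Path

def caption_image(image_path: Path) -> str:
    """Generate one descriptive caption for a single image."""
    response = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_uri(str(image_path))}},
                {"type": "text", "text": "Write a descriptive caption for this image."},
            ],
        }],
        max_tokens=256,
    )
    return response["choices"][0]["message"]["content"].strip()

dataset_dir = Path("training_images")  # hypothetical dataset folder
for image_path in sorted(dataset_dir.glob("*.jpg")):
    caption = caption_image(image_path)
    # One sidecar .txt per image, next to the image it describes.
    image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
```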