
Molmo 7B D 0924 NF4

Developed by Scoolar
A 4-bit quantized version of Molmo-7B-D-0924 that reduces VRAM usage via the NF4 quantization strategy, making it suitable for environments with limited VRAM.
Downloads 1,259
Release Time: 1/31/2025

Model Overview

This model is a 4-bit quantized version of Molmo-7B-D-0924 using the NF4 quantization strategy. It reduces model size and VRAM usage while preserving model performance as much as possible, making it well suited to VRAM-constrained scenarios.

Model Features

NF4 Quantization Strategy
Uses NF4 quantization while keeping key modules in FP16 to avoid significant performance degradation.
VRAM Optimization
The model occupies about 7GB of VRAM at load time and peaks at about 10GB during inference (with 4K image input), a significant reduction compared to the original model.
Fast Loading Speed
The model loads significantly faster than the original, making it well suited for serverless hosting.
Good Adaptability
It runs on a GPU with 12GB of VRAM and supports batch processing on a T4 (16GB).
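
To illustrate the NF4 (NormalFloat 4-bit) strategy named above, here is a minimal pure-Python sketch of blockwise absmax NF4 quantization. The 16 codebook values are the published NF4 levels (rounded here for brevity); the block size and helper names are illustrative assumptions, not this model's actual implementation.

```python
# The 16 NF4 levels (rounded): quantiles of a standard normal, so that
# normally distributed weights use all codes roughly equally often.
NF4_CODES = [
    -1.0, -0.6962, -0.5251, -0.3949,
    -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379,
    0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_nf4(values, block_size=64):
    """Blockwise absmax quantization: each block keeps one floating-point
    scale (its absolute maximum) plus a 4-bit code index per value."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero
        codes = [
            min(range(16), key=lambda i: abs(NF4_CODES[i] - v / scale))
            for v in block
        ]
        blocks.append((scale, codes))
    return blocks

def dequantize_nf4(blocks):
    """Reconstruct approximate values: code -> NF4 level * block scale."""
    out = []
    for scale, codes in blocks:
        out.extend(NF4_CODES[c] * scale for c in codes)
    return out

weights = [0.03, -0.51, 0.22, 0.0, -0.12, 0.87, -0.95, 0.40]
restored = dequantize_nf4(quantize_nf4(weights))
```

Because each value shrinks from 16 bits to 4 (plus one shared scale per block), weight storage drops to roughly a quarter, which is the source of the VRAM savings described above; keeping key modules in FP16 limits the accuracy cost.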

Model Capabilities

Image Caption Generation
Vision-Language Understanding
Multimodal Inference

Use Cases

Image Understanding
Image Caption Generation
Generates natural language descriptions of an input image.
Produces fluent and accurate image captions.
Serverless Hosting
Deployment in Low-VRAM Environments
Deploy a vision-language model where VRAM is limited.
Runs successfully on a 12GB GPU.