EraX-VL-2B-V1.5-GGUF Open-Source Multimodal Model - Supports Multilingual Images and Image-Text to Text Conversion

EraX-VL-2B-V1.5 GGUF

Developed by mradermacher

EraX-VL-2B-V1.5 is a multimodal model supporting Vietnamese, English, and Chinese, with capabilities for image-to-text and image-text-to-text conversion.

Image-to-Text, Supports Multiple Languages
Open Source License: Apache-2.0
#Multimodal Image-to-Text #Vietnamese OCR Processing #Insurance Document Analysis

Downloads 95

Release Time: 12/29/2024

Model Overview

EraX-VL-2B-V1.5 is a multimodal model based on the transformers library, primarily used for image-to-text and image-text-to-text tasks, supporting Vietnamese, English, and Chinese.

Model Features

Multimodal Support

Supports joint processing of images and text, capable of converting image content into textual descriptions.

Multilingual Support

Supports processing in three languages: Vietnamese, English, and Chinese.

Diverse Quantization Versions

Offers multiple quantized versions suitable for different hardware and performance needs.

Model Capabilities

Image-to-text

Image-text-to-text

Multilingual Processing

Optical Character Recognition

Use Cases

Insurance

Insurance Document Processing

Automatically recognizes the image and text content of insurance documents and converts it into text.

Optical Character Recognition

Document OCR

Converts images in scanned documents into editable text.

🚀 EraX-VL-2B-V1.5 Quantized Model

This project provides static quantizations of the EraX-VL-2B-V1.5 model, offering various quantized versions for different usage scenarios.

🚀 Quick Start

To use a quantized model, pick a file from the Provided Quants table below and see Usage Examples for pointers on working with GGUF files.

✨ Features

Multilingual Support: Supports Vietnamese (vi), English (en), and Chinese (zh).
Multimodal Capabilities: Suitable for tasks such as image-to-text and image-text-to-text.
Quantized Variants: Offers a variety of quantized versions with different sizes and qualities.

📦 Installation

This README does not provide specific installation steps.

💻 Usage Examples

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including how to concatenate multi-part files.
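As a rough illustration of what concatenating multi-part files involves, here is a minimal Python sketch. It assumes the parts are plain byte-wise splits named `<file>.part1of2`, `<file>.part2of2`, and so on; that naming scheme and the helper name `join_parts` are assumptions made for this example and are not taken from this README. The files listed for this model are small, so they may not be split at all.

```python
import re
import shutil
from pathlib import Path

# Assumed (hypothetical) part naming: "<file>.part<index>of<total>"
PART_SUFFIX = re.compile(r"\.part(?P<idx>\d+)of(?P<total>\d+)$")


def join_parts(target):
    """Concatenate '<target>.partNofM' files, in numeric order, into <target>."""
    target = Path(target)
    parts = []
    for path in target.parent.iterdir():
        if not path.name.startswith(target.name):
            continue
        match = PART_SUFFIX.fullmatch(path.name[len(target.name):])
        if match:
            parts.append((int(match["idx"]), int(match["total"]), path))
    if not parts:
        raise FileNotFoundError(f"no part files found for {target}")

    # Sort numerically so that part10 comes after part2.
    parts.sort(key=lambda item: item[0])
    total = parts[0][1]
    if [idx for idx, _, _ in parts] != list(range(1, total + 1)):
        raise ValueError("part files are missing or inconsistent")

    # Stream each part into the output so large files never sit fully in memory.
    with open(target, "wb") as dst:
        for _, _, path in parts:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
    return target
```

Calling `join_parts("some-model.gguf")` would write `some-model.gguf` next to its parts. Always check the README of the repository you downloaded from, since split files produced by other tools may need a different procedure.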

📚 Documentation

About

This is a static quantization of https://huggingface.co/erax-ai/EraX-VL-2B-V1.5. Weighted/imatrix quants are available at https://huggingface.co/mradermacher/EraX-VL-2B-V1.5-i1-GGUF.

Provided Quants

(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similar-sized non-IQ quants.)

Link	Type	Size/GB	Notes
GGUF	Q2_K	0.8
GGUF	Q3_K_S	0.9
GGUF	Q3_K_M	0.9	lower quality
GGUF	Q3_K_L	1.0
GGUF	IQ4_XS	1.0
GGUF	Q4_K_S	1.0	fast, recommended
GGUF	Q4_K_M	1.1	fast, recommended
GGUF	Q5_K_S	1.2
GGUF	Q5_K_M	1.2
GGUF	Q6_K	1.4	very good quality
GGUF	mmproj-fp16	1.4	vision supplement
GGUF	Q8_0	1.7	fast, best quality
GGUF	f16	3.2	16 bpw, overkill
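To make the size trade-off concrete, the following Python sketch picks the largest quant from the table that fits a given size budget. The sizes are copied from the table above. The helper name `pick_quant`, the `with_vision` flag, and the idea of budgeting by file size are assumptions made for illustration. File size is only a rough proxy for memory use at run time, and the sketch assumes that image input needs the `mmproj-fp16` file in addition to the chosen quant.

```python
# (type, size in GB) in table order; mmproj-fp16 is kept separate because the
# table lists it as a vision supplement rather than a quant of the model.
QUANTS = [
    ("Q2_K", 0.8), ("Q3_K_S", 0.9), ("Q3_K_M", 0.9), ("Q3_K_L", 1.0),
    ("IQ4_XS", 1.0), ("Q4_K_S", 1.0), ("Q4_K_M", 1.1), ("Q5_K_S", 1.2),
    ("Q5_K_M", 1.2), ("Q6_K", 1.4), ("Q8_0", 1.7), ("f16", 3.2),
]
MMPROJ_GB = 1.4


def pick_quant(budget_gb, with_vision=False):
    """Return the last (largest) table entry that fits the budget, or None."""
    extra = MMPROJ_GB if with_vision else 0.0
    choice = None
    for name, size_gb in QUANTS:
        # The table is sorted by size, so later fitting entries replace earlier ones.
        if size_gb + extra <= budget_gb:
            choice = name
    return choice


if __name__ == "__main__":
    for budget in (0.5, 1.0, 1.5, 3.0):
        print(budget, pick_quant(budget), pick_quant(budget, with_vision=True))
```

Because the table is sorted by size rather than quality, the largest file that fits is only a starting point; the notes column (for example "fast, recommended" on Q4_K_S and Q4_K_M) is a better guide when several quants fit.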

ikawrakow has published a handy graph comparing some lower-quality quant types (lower is better); the image is not reproduced here.

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

FAQ / Model Request

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.

📄 License

This project is licensed under the Apache-2.0 license.

👏 Thanks

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time. Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more imatrix quants, at much higher quality, than I would otherwise be able to.
