BGE M3 Mindspore GGUF
GGUF quantized version of BGE_M3_Mindspore, offering multiple quantization options to suit different needs.
Release Time: 9/23/2024
Model Overview
This is a GGUF quantized version of the PhilipGAQ/BGE_M3_Mindspore model, offered at multiple quantization levels to suit different hardware environments and performance requirements.
Model Features
Multiple Quantization Options
Offers quantization levels from Q2_K to f16 to meet different performance and precision needs.
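The main trade-off between these levels is file size versus precision. A rough sketch of how the quantization level affects on-disk size, using approximate llama.cpp bits-per-weight figures and an assumed parameter count for illustration only:

```python
# Approximate bits per weight for a few common GGUF quantization levels.
# These are rough llama.cpp figures, listed for orientation only.
QUANT_BPW = {
    "Q2_K": 2.6,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
    "f16": 16.0,
}

def approx_size_mb(n_params_millions: float, quant: str) -> float:
    """Estimate GGUF file size in MB from parameter count and quant level."""
    bits = QUANT_BPW[quant]
    return n_params_millions * 1e6 * bits / 8 / 1e6

# Assumed parameter count for illustration; check the actual model card.
for q in QUANT_BPW:
    print(f"{q:>7}: ~{approx_size_mb(560, q):.0f} MB")
```

Lower levels such as Q2_K minimize size at some cost in embedding quality; f16 preserves full precision at the largest size.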
Efficient Inference
Quantized versions significantly reduce model size and improve inference speed, making them suitable for resource-limited environments.
Compatibility
GGUF format is compatible with various inference tools and frameworks, facilitating deployment and usage.
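For example, GGUF files can be loaded with llama.cpp-based tooling. A minimal sketch using llama-cpp-python, where the model filename is a hypothetical placeholder for whichever quantized variant you downloaded:

```python
# Sketch: loading a GGUF embedding model with llama-cpp-python.
# The filename below is an assumption; substitute the actual
# quantized file (Q2_K ... f16) you downloaded.
from pathlib import Path

MODEL_PATH = "bge-m3-mindspore-Q4_K_M.gguf"  # hypothetical filename

def load_embedder(path: str):
    """Return a llama.cpp embedding model, or None if the file is absent."""
    if not Path(path).exists():
        return None
    from llama_cpp import Llama
    # embedding=True puts llama.cpp in embedding mode.
    return Llama(model_path=path, embedding=True, verbose=False)

llm = load_embedder(MODEL_PATH)
if llm is not None:
    vec = llm.embed("GGUF models are easy to deploy.")
    print("embedding dimension:", len(vec))
```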
Model Capabilities
Text Embedding
Efficient Inference
Quantized Model Deployment
Use Cases
Natural Language Processing
Text Similarity Calculation
Use the quantized model to compute text similarity quickly.
Quantization speeds up similarity computation while largely preserving embedding quality.
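Similarity between two embeddings is typically measured with cosine similarity. A minimal helper, with toy vectors standing in for real BGE-M3 embeddings:

```python
# Cosine similarity between two embedding vectors.
# The vectors below are toy stand-ins for real BGE-M3 embeddings.
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = [0.2, 0.7, 0.1]
v2 = [0.25, 0.65, 0.15]
print(f"similarity: {cosine_similarity(v1, v2):.3f}")
```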
Information Retrieval
Deploy lightweight embedding models for document retrieval systems.
Reduces resource consumption while maintaining retrieval quality.
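A retrieval system of this kind ranks documents by the similarity of their embeddings to a query embedding. A toy sketch, where the 3-d vectors are illustrative stand-ins for vectors the quantized BGE-M3 model would produce:

```python
# Toy retrieval sketch: rank documents by cosine similarity between a
# query embedding and precomputed document embeddings. The 3-d vectors
# here are illustrative stand-ins for real model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.5, 0.5, 0.2],
}
query_vector = [0.8, 0.2, 0.0]

ranked = sorted(doc_vectors,
                key=lambda d: cosine(doc_vectors[d], query_vector),
                reverse=True)
print("ranking:", ranked)
```

In a real deployment, document embeddings would be computed once offline and stored, so each query costs only one embedding pass plus the similarity scan.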