# ColBERTv2.0 Quantized Model
This project provides static quantizations of the ColBERTv2.0 model, offering various quantization types for different use cases.
## Quick Start
If you are new to using this quantized model, the following sections will guide you through its details and usage.
## Features
- Multiple Quantization Types: Offers a range of quantization types such as Q2_K, Q3_K_S, and IQ4_XS, sorted by size.
- Sentence Transformer: Suitable for tasks such as sentence similarity and feature extraction.
- Based on ColBERT: Built on the ColBERT late-interaction architecture (see the scoring sketch after this list).
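
For context, ColBERT scores a query against a document by late interaction (MaxSim): each query token embedding is matched against its best document token embedding, and the per-token maxima are summed. The sketch below only illustrates that scoring rule with random, L2-normalized toy embeddings; it is not tied to these GGUF files, and the shapes and 128-dim size are illustrative.

```python
import numpy as np

def maxsim_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token embedding, take the
    maximum similarity over all document token embeddings, then sum.
    Rows are assumed L2-normalized, so dot product equals cosine similarity."""
    sim = q_emb @ d_emb.T               # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

# Toy example: random token embeddings (shapes and dimension are illustrative).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 128))
d = rng.normal(size=(12, 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```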
## Installation
No specific installation steps are provided in the original README. If you need to use the GGUF files, refer to TheBloke's READMEs for more details, including how to concatenate multi-part files.
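
At these sizes (0.2–0.3 GB) the files here are unlikely to be split, but if you ever encounter a multi-part GGUF, concatenation is a plain byte-wise append in part order. A minimal Python sketch; the part names below are hypothetical placeholders, so substitute the actual names from the repo's file listing:

```python
import shutil

# Hypothetical part names; use the actual file names from the repo listing.
parts = ["model.Q8_0.gguf.part1of2", "model.Q8_0.gguf.part2of2"]

with open("model.Q8_0.gguf", "wb") as out:
    for name in parts:                    # order matters: part1, then part2, ...
        with open(name, "rb") as src:
            shutil.copyfileobj(src, out)  # byte-for-byte append
```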
## Documentation
### About
This is a static quantization of https://huggingface.co/lightonai/colbertv2.0. Weighted/imatrix quants do not appear to be available (from me) at this time. If they do not show up within a week or so after the static ones, I have probably not planned them. Feel free to request them by opening a Community Discussion.
### Usage
If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including how to concatenate multi-part files.
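
As one concrete (hedged) route, llama-cpp-python can load a GGUF in embedding mode, assuming your llama.cpp build supports this model's BERT-family architecture; the file name below is a placeholder for whichever quant you downloaded:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder file name; point this at the quant you actually downloaded.
llm = Llama(model_path="colbertv2.0.Q4_K_M.gguf", embedding=True)

# Returns the model's embedding(s) for the input text; for token-level
# models the result may be a list of per-token vectors.
emb = llm.embed("ColBERT encodes text into token-level vectors.")
print(len(emb))
```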
### Provided Quants
The provided quantizations are sorted by size, not necessarily quality. IQ-quants are often preferable over similarly sized non-IQ quants.

| Link | Type | Size/GB | Notes |
|------|------|---------|-------|
| GGUF | Q2_K | 0.2 | |
| GGUF | Q3_K_S | 0.2 | |
| GGUF | Q3_K_M | 0.2 | lower quality |
| GGUF | IQ4_XS | 0.2 | |
| GGUF | Q3_K_L | 0.2 | |
| GGUF | Q4_K_S | 0.2 | fast, recommended |
| GGUF | Q4_K_M | 0.2 | fast, recommended |
| GGUF | Q5_K_S | 0.2 | |
| GGUF | Q5_K_M | 0.2 | |
| GGUF | Q6_K | 0.2 | very good quality |
| GGUF | Q8_0 | 0.2 | fast, best quality |
| GGUF | f16 | 0.3 | 16 bpw, overkill |
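To fetch a single quant programmatically, huggingface_hub's hf_hub_download works; note that the repo_id and filename below are assumptions, so match them to this repo's actual file listing (the Link column above):

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Assumed repo_id and filename; replace with the actual values from the
# Link column of the table above.
path = hf_hub_download(
    repo_id="mradermacher/colbertv2.0-GGUF",
    filename="colbertv2.0.Q4_K_M.gguf",
)
print(path)  # local path of the cached download
```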
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

And here are Artefact2's thoughts on the matter:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
### FAQ / Model Request
See https://huggingface.co/mradermacher/model_requests for answers to common questions, or if you want another model quantized.
### Thanks
I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time.
## License
This project is licensed under the MIT license.