gte-large-gguf
This repository provides GGUF format files for the gte-large embedding model, for use in downstream text embedding tasks.
Quick Start
This repo offers GGUF format files for the gte-large embedding model. You can use these files with llama.cpp or LM Studio to compute text embeddings.
⨠Features
- Wide Application: Trained on a large - scale corpus of relevance text pairs, suitable for various text embedding downstream tasks such as information retrieval, semantic textual similarity, and text reranking.
- Multiple Quantization Options: Provides multiple quantization methods, allowing users to choose according to their own needs and hardware conditions.
- High Compatibility: Compatible with llama.cpp and LM Studio.
Installation
Prerequisites
- Install llama.cpp as of commit 4524290e8.
- For LM Studio, download version 0.2.19 from [here](https://releases.lmstudio.ai/windows/0.2.19/beta/LM-Studio-0.2.19-Setup-Preview-1.exe) (Windows), here (macOS), or here (Linux).
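The llama.cpp prerequisite can be sketched as a small shell function. The repository URL and the plain `make` build are assumptions, not taken from this README, and the function name `build_llama_cpp` is illustrative. GPU offloading may need extra build options that are not shown here.

```shell
# Sketch: fetch llama.cpp and build it at the commit named above.
# Assumptions: the upstream repository URL and a plain `make` build.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_llama_cpp() (
  git clone https://github.com/ggerganov/llama.cpp &&
    cd llama.cpp &&
    git checkout 4524290e8 &&
    make
)
```

Run `build_llama_cpp` once from the directory where you want the checkout to live. The function is only defined here, not called, so sourcing the snippet has no side effects.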
Usage Examples
Basic Usage with llama.cpp
To compute a single embedding, build llama.cpp and run:
./embedding -ngl 99 -m [filepath-to-gguf].gguf -p 'search_query: What is TSNE?'
You can also submit a batch of texts to embed:
texts.txt:
search_query: What is TSNE?
search_query: Who is Laurens Van der Maaten?
Compute multiple embeddings:
./embedding -ngl 99 -m [filepath-to-gguf].gguf -f texts.txt
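The batch workflow above can be put into one script. This is a minimal sketch: the `MODEL` variable and its default path are hypothetical, and the guard around the command is only there so the script does nothing harmful when the binary or model file is missing.

```shell
# Write one input text per line, matching the texts.txt example above.
cat > texts.txt <<'EOF'
search_query: What is TSNE?
search_query: Who is Laurens Van der Maaten?
EOF

# Hypothetical model path; point this at the GGUF file you downloaded.
MODEL="${MODEL:-./gte-large.Q8_0.gguf}"

# Run the batch only when both the binary and the model file are present.
if [ -x ./embedding ] && [ -f "$MODEL" ]; then
  ./embedding -ngl 99 -m "$MODEL" -f texts.txt
else
  echo "skipping: ./embedding or $MODEL not found" >&2
fi
```

Override the model path when calling the script, for example `MODEL=/path/to/file.gguf sh embed_batch.sh`, where the script name is also just an example.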
Usage with LM Studio
- Download the 0.2.19 beta build from the links provided above.
- Open the app. Search for "ChristianAzinn" in the main search bar or in the "Search" tab on the left menu.
- Select your model and the desired quantization method. It is recommended to choose Q8_0 for this model.
- Wait for the model to download.
- Navigate to the "Local Server" tab on the left menu and open the loader for text embedding models.
- Select the downloaded model from the dropdown. Adjust configurations if necessary.
- Click the "Start Server" button.
Example curl request to the API endpoint:
curl http://localhost:1234/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "Your text string goes here",
"model": "model-identifier-here"
}'
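When the input text comes from a variable, quotes inside it can break the hand-written JSON body. The sketch below builds the same request body with basic escaping. The helper names `json_escape` and `embedding_payload`, and the `ENDPOINT`, `MODEL_ID`, and `SEND_REQUEST` variables, are illustrative assumptions. Only backslashes and double quotes are escaped; control characters such as newlines are not handled.

```shell
# Escape backslashes and double quotes so the text is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body: $1 is the input text, $2 is the model identifier.
embedding_payload() {
  printf '{"input": "%s", "model": "%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Defaults mirror the curl example above; adjust them to your local server.
ENDPOINT="${ENDPOINT:-http://localhost:1234/v1/embeddings}"
MODEL_ID="${MODEL_ID:-model-identifier-here}"

# Send the request only when explicitly asked to, so sourcing this file is safe.
if [ "${SEND_REQUEST:-0}" = "1" ]; then
  curl "$ENDPOINT" \
    -H "Content-Type: application/json" \
    -d "$(embedding_payload 'search_query: What is TSNE?' "$MODEL_ID")"
fi
```

Set `SEND_REQUEST=1` once the local server is running to issue the request.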
Documentation
Original Model
The GTE models are trained by Alibaba DAMO Academy on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios.
This Repo
This repo contains GGUF format files for the gte-large embedding model. These files were converted and quantized with llama.cpp PR 5500, commit 34aa045de, on a consumer RTX 4090. This model supports up to 512 tokens of context.
Technical Details
Compatibility
These files are compatible with llama.cpp as of commit 4524290e8, as well as LM Studio as of version 0.2.19.
Explanation of Quantisation Methods
The methods available are:
* GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
* GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
* GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
Refer to the Provided Files table below to see what files use which methods, and how.
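The bits-per-weight figures for most of these methods can be reproduced with a little arithmetic. The sketch below assumes that, on top of the per-block scales described above, each super-block stores one 16-bit scale for "type-0" formats and a 16-bit scale plus a 16-bit minimum for "type-1" formats. That overhead is an assumption and is not stated in this README. The helper name `bpw` and its argument order are illustrative. Q2_K is left out because this simple accounting gives 2.625 rather than the figure quoted for it.

```shell
# bpw BITS BLOCKS WEIGHTS_PER_BLOCK SCALE_BITS KINDS
#   KINDS = 1 for "type-0" (scale only), 2 for "type-1" (scale and min).
# Assumption: each super-block also stores KINDS 16-bit values of its own.
bpw() {
  awk -v bits="$1" -v blocks="$2" -v per_block="$3" -v scale_bits="$4" -v kinds="$5" 'BEGIN {
    weights = blocks * per_block                 # weights per super-block
    total = weights * bits                       # quantized weights
    total += blocks * scale_bits * kinds         # per-block scales (and mins)
    total += kinds * 16                          # assumed super-block overhead
    printf "%.4f\n", total / weights
  }'
}

bpw 3 16 16 6 1   # Q3_K -> 3.4375
bpw 4 8 32 6 2    # Q4_K -> 4.5000
bpw 5 8 32 6 2    # Q5_K -> 5.5000
bpw 6 16 16 8 1   # Q6_K -> 6.5625
```

For Q4_K, for example, a super-block holds 8 × 32 = 256 weights, so the total is 1024 + 96 + 32 = 1152 bits, or 4.5 bits per weight.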
Provided Files
| Property | Details |
| --- | --- |
| Model Type | BERT |
| Training Data | Large-scale corpus of relevance text pairs |

| Name | Quant method | Bits | Size | Use case |
| --- | --- | --- | --- | --- |
| gte-large.Q2_K.gguf | Q2_K | 2 | 144 MB | smallest, significant quality loss - not recommended for most purposes |
| gte-large.Q3_K_S.gguf | Q3_K_S | 3 | 160 MB | very small, high quality loss |
| gte-large.Q3_K_M.gguf | Q3_K_M | 3 | 181 MB | very small, high quality loss |
| gte-large.Q3_K_L.gguf | Q3_K_L | 3 | 198 MB | small, substantial quality loss |
| gte-large.Q4_0.gguf | Q4_0 | 4 | 200 MB | legacy; small, very high quality loss - prefer using Q3_K_M |
| gte-large.Q4_K_S.gguf | Q4_K_S | 4 | 203 MB | small, greater quality loss |
| gte-large.Q4_K_M.gguf | Q4_K_M | 4 | 216 MB | medium, balanced quality - recommended |
| gte-large.Q5_0.gguf | Q5_0 | 5 | 237 MB | legacy; medium, balanced quality - prefer using Q4_K_M |
| gte-large.Q5_K_S.gguf | Q5_K_S | 5 | 237 MB | large, low quality loss - recommended |
| gte-large.Q5_K_M.gguf | Q5_K_M | 5 | 246 MB | large, very low quality loss - recommended |
| gte-large.Q6_K.gguf | Q6_K | 6 | 278 MB | very large, extremely low quality loss |
| gte-large.Q8_0.gguf | Q8_0 | 8 | 358 MB | very large, extremely low quality loss - recommended |
| gte-large.Q8_0.gguf | FP16 | 16 | 670 MB | enormous, pretty much the original model - not recommended |
| gte-large.Q8_0.gguf | FP32 | 32 | 1.34 GB | enormous, pretty much the original model - not recommended |
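One way to act on the sizes listed above is to pick the largest quantization that fits a size budget. This is a minimal sketch: the function name `pick_quant` is hypothetical, the legacy Q4_0 and Q5_0 files and the FP16 and FP32 files are deliberately left out, and treating file size as a rough proxy for memory use is an assumption.

```shell
# pick_quant BUDGET_MB
# Prints the largest listed quant method whose file size fits the budget.
# Returns a non-zero status and prints nothing when no file fits.
pick_quant() {
  awk -v budget="$1" '
    $2 + 0 <= budget + 0 { best = $1 }   # rows are sorted by size, so the last match wins
    END { if (best) print best; else exit 1 }
  ' <<'EOF'
Q2_K 144
Q3_K_S 160
Q3_K_M 181
Q3_K_L 198
Q4_K_S 203
Q4_K_M 216
Q5_K_S 237
Q5_K_M 246
Q6_K 278
Q8_0 358
EOF
}
```

For example, `pick_quant 220` selects Q4_K_M, while a budget below 144 MB selects nothing.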
License
This project is licensed under the MIT license.
Acknowledgements
Thanks to the LM Studio team and everyone else working on open-source AI.
This README is inspired by that of nomic-ai-embed-text-v1.5-gguf, another excellent embedding model, and those of the legendary TheBloke.