# Snowflake's Arctic-embed-l-v2.0
Snowflake's Arctic-embed-l-v2.0 is a state-of-the-art multilingual embedding model. It offers high-quality retrieval across English and many other languages, with strong inference efficiency and embedding-compression capabilities, making it well suited to large-scale, enterprise-grade multilingual search and retrieval applications.
## Model Information
| Property | Details |
|---|---|
| Base Model | Snowflake/snowflake-arctic-embed-l-v2.0 |
| Pipeline Tag | sentence-similarity |
| Tags | xlm-roberta, mteb, arctic, snowflake-arctic-embed, text-embeddings-inference |
| Library Name | sentence-transformers |
| Supported Languages | af, ar, az, be, bg, bn, ca, ceb, cs, cy, da, de, el, en, es, et, eu, fa, fi, fr, gl, gu, he, hi, hr, ht, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ky, lo, lt, lv, mk, ml, mn, mr, ms, my, ne, nl, pa, pl, pt, qu, ro, ru, si, sk, sl, so, sq, sr, sv, sw, ta, te, th, tl, tr, uk, ur, vi, yo, zh |
GGUF quants of Snowflake/snowflake-arctic-embed-l-v2.0, created using llama.cpp.
Original model card:
# Snowflake's Arctic-embed-l-v2.0
News | Models | Usage | Evaluation | Contact | FAQ | License | Acknowledgement
## Quick Start
Snowflake arctic-embed-l-v2.0 is designed for multilingual retrieval tasks. You can get started quickly by following the usage examples below.
## Features
- Multilingual without compromise: Excels at English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
- Inference efficiency: With only 303M non-embedding parameters, inference is fast and efficient at any scale.
- Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training (see the compression sketch after the second benchmark table below).
- Drop-in replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), allowing direct drop-in replacement within any library, kernel, or inference engine that already supports that architecture.
- Long context support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which supports a context window of up to 8192 tokens via RoPE (see the sketch below).
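As a quick illustration of the long-context support, the sketch below (the repeated filler document is made up) encodes a text far beyond a typical 512-token window:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('Snowflake/snowflake-arctic-embed-l-v2.0')
# Cap the sequence length at the 8192-token window mentioned above;
# depending on the installed version this may already be the default.
model.max_seq_length = 8192

# A made-up long document, well beyond short-context limits.
long_document = " ".join(["Snowflake builds the Data Cloud."] * 1000)
embedding = model.encode(long_document)
print(embedding.shape)  # (1024,)
```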
## Quality Benchmarks
Unlike most other open-source models, Arctic-embed-l-v2.0 excels at both English retrieval (via MTEB Retrieval) and multilingual retrieval (via MIRACL and CLEF). All numbers below are the average NDCG@10 across the benchmark in question (a short sketch of the metric follows the table).
| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|---|---|---|---|---|---|---|
| snowflake-arctic-l-v2.0 | 568M | 303M | 1024 | 55.6 | 55.8 | 52.9 | 54.3 |
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | 56.8 | 40.8 | 41.3 |
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
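For readers unfamiliar with the metric, NDCG@10 scores a ranking by how high it places relevant documents within the top ten results, normalized by the best possible ordering. A minimal sketch with made-up relevance labels:

```python
import math

def dcg_at_10(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of the rank.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:10]))

def ndcg_at_10(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_10(sorted(relevances, reverse=True))
    return dcg_at_10(relevances) / ideal if ideal > 0 else 0.0

# Relevant documents at ranks 1 and 3 out of ten results.
print(round(ndcg_at_10([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]), 2))  # 0.92
```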
It also delivers embeddings that are easily compressible. Vector truncation via MRL cuts vector size by 4x with less than 3% degradation in quality, and combining MRL-truncated vectors with int4 vector compression powers retrieval at 128 bytes per document, as sketched in code after the table below.
| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
|---|---|---|---|---|---|---|---|---|---|
| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
| snowflake-arctic-l-v2.0 | 256 | 54.3 | -2.34% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
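A minimal sketch of that recipe, assuming a sentence-transformers version that supports `truncate_dim` (2.7+); the int4 packing here is a deliberately simple illustration, not the exact quantization scheme used in training:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Truncate the 1024-dim embedding to 256 dims via MRL.
model = SentenceTransformer('Snowflake/snowflake-arctic-embed-l-v2.0', truncate_dim=256)
emb = model.encode('what is snowflake?', prompt_name='query')
emb = emb / np.linalg.norm(emb)  # re-normalize after truncation

# Illustrative symmetric int4 quantization: 16 levels per component,
# two 4-bit codes packed per byte -> 256 dims * 0.5 bytes = 128 bytes.
scale = np.abs(emb).max() / 7.0
codes = (np.clip(np.round(emb / scale), -8, 7) + 8).astype(np.uint8)  # 0..15
packed = (codes[0::2] << 4) | codes[1::2]
print(packed.nbytes)  # 128
```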
## Usage Examples
### Basic Usage

#### Using Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
model = SentenceTransformer(model_name)

queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Queries use the model's stored "query" prompt; documents need no prefix.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Rank each query's documents by similarity.
scores = model.similarity(query_embeddings, document_embeddings)
for query, query_scores in zip(queries, scores):
    doc_score_pairs = sorted(zip(documents, query_scores), key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
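Here `prompt_name="query"` applies the query prompt stored in the model configuration, which corresponds to the `'query: '` prefix used explicitly in the Transformers example below; documents are encoded without any prefix.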
#### Using Huggingface Transformers
You can use the transformers package with Snowflake's arctic-embed model. For optimal retrieval quality, use the CLS token to embed each text portion and apply the query prefix below (on queries only).
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
model.eval()

# Apply the query prefix to queries only; documents are encoded as-is.
query_prefix = 'query: '
queries = ['what is snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = ["{}{}".format(query_prefix, q) for q in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)

documents = ['The Data Cloud!', 'Mexico City of Course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)

# Use the CLS token (position 0) as the embedding for each text.
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# L2-normalize so the dot products below are cosine similarities.
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
for query, query_scores in zip(queries, scores):
    doc_score_pairs = sorted(zip(documents, query_scores), key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
This should produce the following scores:

```
Query: what is snowflake?
tensor(0.2715) The Data Cloud!
tensor(0.0661) Mexico City of Course!
Query: Where can I get the best tacos?
tensor(0.2797) Mexico City of Course!
tensor(0.1250) The Data Cloud!
```
#### Using Huggingface Transformers.js
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

```bash
npm i @huggingface/transformers
```
You can then use the model for retrieval, as follows:
```js
import { pipeline, dot } from '@huggingface/transformers';

// Create the feature-extraction pipeline with 8-bit quantized weights.
const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-l-v2.0', {
  dtype: 'q8',
});

// The first sentence is the query (note the 'query: ' prefix); the rest are documents.
const sentences = [
  'query: what is snowflake?',
  'The Data Cloud!',
  'Mexico City of Course!',
];

// CLS pooling + normalization, so dot products are cosine similarities.
const output = await extractor(sentences, { normalize: true, pooling: 'cls' });
const [source_embeddings, ...document_embeddings] = output.tolist();
const similarities = document_embeddings.map(x => dot(source_embeddings, x));
console.log(similarities);
```
## License
Arctic is licensed under the Apache-2.0 license. The released models can be used for commercial purposes free of charge.
## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project. You can also email Daniel Campos (daniel.campos@snowflake.com).