Qwen3 Embedding 8B Auto

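Developed by michaelfeil

The Qwen3 Embedding model series is the latest proprietary model family from the Qwen team, designed specifically for text embedding and ranking tasks. It supports more than 100 languages and ranks first on the MTEB multilingual leaderboard.

Text embedding · Open-source license: Apache-2.0 #MultilingualTextEmbedding #LongContextUnderstanding #HighDimensionalVectorRetrieval

Downloads: 135

Released: 6/6/2025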

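Model Overview

Built on the dense foundation models of the Qwen3 series, this model provides text embedding and reranking for tasks such as text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Model Features

Exceptional versatility

Ranks first on the MTEB multilingual leaderboard (score 70.58) and performs strongly across text embedding and ranking tasks.

Comprehensive flexibility

Available in sizes from 0.6B to 8B, with support for user-defined instructions and user-defined embedding dimensions.

Multilingual capability

Supports more than 100 languages, including various programming languages, with strong multilingual, cross-lingual, and code retrieval capabilities.

Long-text understanding

Supports a 32k context length, making it suitable for long-text tasks.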

Model Capabilities

Text retrieval

Code retrieval

Text classification

Text clustering

Bitext mining

Multilingual text processing

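Use Cases

Information retrieval

Web search

Retrieve relevant passages for a query

Retrieval performance improves by 1% to 5% when custom instructions are used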

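Knowledge management

Document clustering

Automatically classify and cluster large document collections

Scores 57.65 on the MTEB clustering tasks (8B model)

Cross-lingual applications

Bitext mining

Cross-lingual text matching and retrieval

Scores 80.89 on the MTEB bitext mining tasks (8B model)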
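Bitext mining pairs sentences in two languages whose embeddings lie close together. One common heuristic keeps only mutual nearest neighbours: source sentence i and target sentence j are paired only when each is the other's highest-scoring match. This heuristic is used here purely for illustration and is not presented as the method behind the MTEB score above. The helper name and the toy score matrix are hypothetical:

```python
from typing import List, Tuple


def mutual_nearest_pairs(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (source, target) index pairs where each side is the other's best match.

    scores[i][j] is the similarity between source sentence i and target sentence j,
    e.g. the cosine similarity of their embeddings.
    """
    # Best target for every source row.
    best_target = [max(range(len(row)), key=row.__getitem__) for row in scores]
    # Best source for every target column.
    n_targets = len(scores[0])
    best_source = [
        max(range(len(scores)), key=lambda i: scores[i][j]) for j in range(n_targets)
    ]
    # Keep a pair only if the match is mutual.
    return [(i, j) for i, j in enumerate(best_target) if best_source[j] == i]


# Toy similarities between 3 source and 3 target sentences (made-up numbers).
scores = [
    [0.91, 0.10, 0.05],
    [0.12, 0.30, 0.88],
    [0.08, 0.25, 0.40],
]
print(mutual_nearest_pairs(scores))  # [(0, 0), (1, 2)]
```

Source sentence 2 is dropped: its best target (2) prefers source sentence 1. This is the kind of one-sided match the mutual check is meant to filter out.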

🚀 Qwen3-Embedding-8B

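The Qwen3 Embedding model series is the latest proprietary model family from the Qwen team, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides text embedding and reranking models in several sizes (0.6B, 4B, and 8B). The series inherits the strong multilingual, long-text understanding, and reasoning capabilities of its foundation models. It also achieves significant gains on a range of text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.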

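🚀 Quick Start

The Qwen3 Embedding series comes in several sizes. All of them inherit the multilingual, long-text understanding, and reasoning capabilities of the base models and perform strongly across text embedding and ranking tasks.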

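✨ Key Features

Exceptional Versatility

The embedding model achieves state-of-the-art performance across a wide range of downstream evaluations. The 8B embedding model ranks first on the MTEB multilingual leaderboard (score 70.58 as of June 5, 2025), and the reranking models excel in a variety of text retrieval scenarios.

Comprehensive Flexibility

The Qwen3 Embedding series offers a full range of sizes (from 0.6B to 8B) for both the embedding and reranking models. This covers use cases that weigh efficiency and effectiveness differently, and developers can combine the two modules seamlessly. In addition, the embedding model lets users choose the output vector dimension. Both the embedding and reranking models also support user-defined instructions that improve performance for specific tasks, languages, or scenarios.

Multilingual Capability

Thanks to the multilingual capabilities of the Qwen3 models, the Qwen3 Embedding series supports more than 100 languages, including various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.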

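Qwen3-Embedding-8B Features

Property	Details
Model type	Text embedding
Supported languages	100+ languages
Number of parameters	8B
Context length	32k
Embedding dimension	Up to 4096; supports user-defined output dimensions from 32 to 4096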
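The user-defined output dimension relies on MRL (Matryoshka Representation Learning) training, which packs the most important information into the leading components of the vector. A shorter embedding is therefore usually obtained by keeping the first `dim` components and L2-renormalizing them. The sketch below assumes that procedure. The helper name is hypothetical, and plain Python lists stand in for tensors. Recent versions of sentence-transformers expose a comparable option through the `truncate_dim` argument of `SentenceTransformer`.

```python
import math
from typing import List


def truncate_embedding(embedding: List[float], dim: int, min_dim: int = 32) -> List[float]:
    """Keep the first `dim` components of an MRL embedding and L2-renormalize them.

    `min_dim` defaults to 32, the smallest output dimension listed for this model.
    """
    if not min_dim <= dim <= len(embedding):
        raise ValueError(f"dim must be between {min_dim} and {len(embedding)}")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    # Renormalize so cosine similarity and dot product agree on the shorter vector.
    return [x / norm for x in head]


# Toy 4096-dimensional vector standing in for a real Qwen3-Embedding-8B output.
full = [float(i % 7) - 3.0 for i in range(4096)]
short = truncate_embedding(full, 256)
print(len(short))  # 256
```

Shorter vectors trade some retrieval quality for lower storage and faster search. The right dimension should be chosen by evaluating on your own task.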

For more details, including benchmark evaluations, hardware requirements, and inference performance, please refer to our blog and GitHub.

📦 Qwen3 Embedding Series Model List

Model Type	Model	Size	Layers	Sequence Length	Embedding Dimension	MRL Support	Instruction Aware
Text Embedding	Qwen3-Embedding-0.6B	0.6B	28	32K	1024	Yes	Yes
Text Embedding	Qwen3-Embedding-4B	4B	36	32K	2560	Yes	Yes
Text Embedding	Qwen3-Embedding-8B	8B	36	32K	4096	Yes	Yes
Text Reranking	Qwen3-Reranker-0.6B	0.6B	28	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-4B	4B	36	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-8B	8B	36	32K	-	-	Yes

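⚠️ Important Note

MRL Support indicates whether the embedding model supports custom dimensions for the final embedding (MRL: Matryoshka Representation Learning).

Instruction Aware indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.

Our evaluation shows that, for most downstream tasks, using an instruction (instruct) typically improves performance by 1% to 5% compared with not using one. We therefore recommend that developers write instructions tailored to their own tasks and scenarios. In multilingual settings, we also recommend writing instructions in English, because most instructions used during model training were originally written in English.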
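The sketch below shows one way to assemble model inputs under these recommendations. An English instruction is prepended to each query, even a non-English one, and documents are left unchanged. It reuses the `Instruct: {task}\nQuery:{query}` template from the Transformers example in this card. The task keys, the second task description, and the helper names are hypothetical illustrations, not part of any published API.

```python
from typing import Dict, List

# Hypothetical task descriptions; the first matches the Transformers example in this card.
TASK_INSTRUCTIONS: Dict[str, str] = {
    "web_search": "Given a web search query, retrieve relevant passages that answer the query",
    "code_search": "Given a natural language question, retrieve code snippets that answer it",
}


def build_query(task_key: str, query: str) -> str:
    """Prepend an English instruction using the 'Instruct: ...\\nQuery:...' template."""
    return f"Instruct: {TASK_INSTRUCTIONS[task_key]}\nQuery:{query}"


def build_inputs(task_key: str, queries: List[str], documents: List[str]) -> List[str]:
    """Instructions go on the queries only; documents are embedded unchanged."""
    return [build_query(task_key, q) for q in queries] + list(documents)


# A Chinese query ("What is the capital of China?") still gets an English instruction.
inputs = build_inputs("web_search", ["中国的首都是哪里？"], ["The capital of China is Beijing."])
print(inputs[0])
```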

💻 Usage Examples

Basic Usage

Sentence Transformers Usage

# Requires transformers>=4.51.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-8B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7493, 0.0751],
#         [0.0880, 0.6318]])
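`model.similarity` applies the model's configured similarity function, which the comment above identifies as cosine similarity. For reference, here is a plain-Python sketch of that computation. Each entry is the dot product of a query vector and a document vector divided by the product of their L2 norms. The sketch is followed by the ranking step a retrieval system would typically apply to the resulting query-by-document matrix. The helper names and toy vectors are hypothetical, and plain lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_matrix(queries: List[Vector], documents: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every query vector and every document vector."""
    def norm(v: Vector) -> float:
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(q * d for q, d in zip(qv, dv)) / (norm(qv) * norm(dv)) for dv in documents]
        for qv in queries
    ]


def top_k(scores: List[List[float]], k: int) -> List[List[Tuple[int, float]]]:
    """For each query row, return the k (document index, score) pairs with the highest scores."""
    return [sorted(enumerate(row), key=lambda p: p[1], reverse=True)[:k] for row in scores]


# Toy 2-D vectors standing in for real embeddings.
query_vecs = [[1.0, 0.0], [1.0, 2.0]]
doc_vecs = [[2.0, 0.0], [0.0, 3.0]]
scores = cosine_matrix(query_vecs, doc_vecs)
print(top_k(scores, k=1))
```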

Transformers Usage

# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


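def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    """Pool each sequence into the hidden state of its last non-padding token."""
    # If every row ends with a real token, the batch is left-padded
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # Right padding: the last real token sits at (number of real tokens - 1)
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]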


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-8B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
# Run inference without tracking gradients to save memory
with torch.no_grad():
    outputs = model(**batch_dict)
    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7493016123771667, 0.0750647559762001], [0.08795969933271408, 0.6318399906158447]]
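The `last_token_pool` function above uses the hidden state of each sequence's last real token as its embedding. This is why the tokenizer is loaded with `padding_side='left'`: with left padding, that token always sits in the final position. The sketch below mirrors only the index selection, on plain lists; the helper name is hypothetical. As in the original function, the right-padding branch assumes the real tokens form a contiguous prefix of each row.

```python
from typing import List


def last_token_indices(attention_mask: List[List[int]]) -> List[int]:
    """Index of the final real (non-padding) token in each row of an attention mask."""
    left_padded = all(row[-1] == 1 for row in attention_mask)
    if left_padded:
        # Left padding: every sequence ends at the last position.
        return [len(row) - 1 for row in attention_mask]
    # Right padding: the last real token is at (number of real tokens - 1).
    return [sum(row) - 1 for row in attention_mask]


left = [[0, 0, 1, 1], [1, 1, 1, 1]]
right = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(last_token_indices(left))   # [3, 3]
print(last_token_indices(right))  # [1, 3]
```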

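💡 Usage Tips

We recommend that developers customize the instruct for their specific scenarios, tasks, and languages. Our tests show that in most retrieval scenarios, omitting the instruct on the query side lowers retrieval performance by roughly 1% to 5%.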

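📚 Evaluation

MTEB (Multilingual)

Model	Size	Mean (Task)	Mean (Type)	Bitext Mining	Classification	Clustering	Instruction Retrieval	Multilabel Classification	Pair Classification	Reranking	Retrieval	STS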
NV-Embed-v2	7B	56.29	49.58	57.84	57.29	40.80	1.04	18.63	78.94	63.82	56.72	71.10
GritLM-7B	7B	60.92	53.74	70.53	61.83	49.75	3.45	22.77	79.94	63.78	58.31	73.33
BGE-M3	0.6B	59.56	52.18	79.11	60.35	40.88	-3.11	20.1	80.76	62.79	54.60	74.12
multilingual-e5-large-instruct	0.6B	63.22	55.08	80.13	64.94	50.75	-0.40	22.91	80.86	62.61	57.12	76.81
gte-Qwen2-1.5B-instruct	1.5B	59.45	52.69	62.51	58.32	52.05	0.74	24.02	81.58	62.58	60.78	71.61
gte-Qwen2-7b-Instruct	7B	62.51	55.93	73.92	61.55	52.77	4.94	25.48	85.13	65.55	60.08	73.98
text-embedding-3-large	-	58.93	51.41	62.17	60.27	46.89	-2.68	22.03	79.17	63.89	59.27	71.68
Cohere-embed-multilingual-v3.0	-	61.12	53.23	70.50	62.95	46.89	-1.89	22.74	79.88	64.07	59.16	74.80
gemini-embedding-exp-03-07	-	68.37	59.59	79.28	71.82	54.59	5.18	29.16	83.63	65.58	67.71	79.40
Qwen3-Embedding-0.6B	0.6B	64.33	56.00	72.22	66.83	52.33	5.09	24.59	80.83	61.41	64.64	76.17
Qwen3-Embedding-4B	4B	69.45	60.86	79.36	72.33	57.15	11.56	26.77	85.05	65.08	69.60	80.86
Qwen3-Embedding-8B	8B	70.58	61.69	80.89	74.00	57.65	10.06	28.66	86.40	65.63	70.88	81.08

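⚠️ Important Note

For the compared models, scores were taken from the MTEB online leaderboard as of May 24, 2025.

MTEB (English v2)

MTEB English / Model	Params	Mean (Task)	Mean (Type)	Classification	Clustering	Pair Classification	Reranking	Retrieval	STS	Summarization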
multilingual-e5-large-instruct	0.6B	65.53	61.21	75.54	49.89	86.24	48.74	53.47	84.72	29.89
NV-Embed-v2	7.8B	69.81	65.00	87.19	47.66	88.69	49.61	62.84	83.82	35.21
GritLM-7B	7.2B	67.07	63.22	81.25	50.82	87.29	49.59	54.95	83.03	35.65
gte-Qwen2-1.5B-instruct	1.5B	67.20	63.26	85.84	53.54	87.52	49.25	50.25	82.51	33.94
stella_en_1.5B_v5	1.5B	69.43	65.32	89.38	57.06	88.02	50.19	52.42	83.27	36.91
gte-Qwen2-7B-instruct	7.6B	70.72	65.77	88.52	58.97	85.9	50.47	58.09	82.69	35.74
gemini-embedding-exp-03-07	-	73.3	67.67	90.05	59.39	87.7	48.59	64.35	85.29	38.28
Qwen3-Embedding-0.6B	0.6B	70.70	64.88	85.76	54.05	84.37	48.18	61.83	86.57	33.43
Qwen3-Embedding-4B	4B	74.60	68.10	89.84	57.51	87.01	50.76	68.46	88.72	34.39
Qwen3-Embedding-8B	8B	75.22	68.71	90.43	58.57	87.52	51.56	69.44	88.58	34.83

C-MTEB (MTEB Chinese)

C-MTEB	Params	Mean (Task)	Mean (Type)	Classification	Clustering	Pair Classification	Reranking	Retrieval	STS
multilingual-e5-large-instruct	0.6B	58.08	58.24	69.80	48.23	64.52	57.45	63.65	45.81
bge-multilingual-gemma2	9B	67.64	68.52	75.31	59.30	86.67	68.28	73.73	55.19
gte-Qwen2-1.5B-instruct	1.5B	67.12	67.79	72.53	54.61	79.5	68.21	71.86	60.05
gte-Qwen2-7B-instruct	7.6B	71.62	72.19	75.77	66.06	81.16	69.24	75.70	65.20
ritrieve_zh_v1	0.3B	72.71	73.85	76.88	66.5	85.98	72.86	76.97	63.92
Qwen3-Embedding-0.6B	0.6B	66.33	67.45	71.40	68.74	76.42	62.58	71.03	54.52
Qwen3-Embedding-4B	4B	72.27	73.51	75.46	77.89	83.34	66.05	77.03	61.26
Qwen3-Embedding-8B	8B	73.84	75.00	76.97	80.08	84.23	66.99	78.21	63.53

📄 License

This project is licensed under the Apache-2.0 license.

📖 Citation

If you find our work helpful, please cite:

@misc{qwen3-embedding,
    title  = {Qwen3-Embedding},
    url    = {https://qwenlm.github.io/blog/qwen3/},
    author = {Qwen Team},
    month  = {May},
    year   = {2025}
}
