Qwen3-Embedding-4B-GGUF: an open-source text embedding model for multilingual processing and code retrieval

Qwen3 Embedding 4B GGUF

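Developed by Mungert

Qwen3-Embedding-4B is a text embedding model built on the Qwen3 series. It is designed specifically for text embedding and ranking tasks, and it performs well in areas such as multilingual text processing and code retrieval.

Text embedding | Open-source license: Apache-2.0 | #Multilingual text embedding #High-precision retrieval #Long-context support

Downloads: 723

Release date: 6/10/2025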

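Model overview

This is a 4B-parameter text embedding model. It supports more than 100 languages, including various programming languages, and provides strong multilingual, cross-lingual, and code retrieval capabilities.

Model features

Exceptional versatility

Achieves state-of-the-art performance across a wide range of downstream application evaluations and scores well on the MTEB multilingual leaderboard.

Comprehensive flexibility

Offers a full range of model sizes (0.6B to 8B) to meet different efficiency and effectiveness requirements, and supports user-defined instructions and vector dimensions.

Multilingual capability

Supports more than 100 languages, including various programming languages, and provides strong multilingual, cross-lingual, and code retrieval capabilities.

Model capabilities

Text embedding

Text ranking

Multilingual processing

Code retrieval

Use cases

Information retrieval

Search engine optimization

Used to improve relevance ranking in search engines and raise the precision of search results.

Performs well on retrieval tasks and matches search queries to documents effectively.

Multilingual applications

Cross-lingual retrieval

Supports matching queries and documents across languages, which suits internationalized content platforms.

Maintains high retrieval performance in multilingual environments.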

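🚀 Qwen3-Embedding-4B GGUF model

The Qwen3-Embedding-4B GGUF model is a recent model specialized for text embedding and ranking tasks. It offers multilingual support and strong inference performance, and it performs well across a variety of downstream tasks.

🚀 Quick start

Using this model requires transformers version 4.51.0 or later. The basic usage with sentence-transformers and with transformers is shown below.

💻 Usage examples

Basic usage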

# Requires transformers>=4.51.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-4B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-4B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7534, 0.1147],
#         [0.0320, 0.6258]])
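The similarity matrix printed above holds one cosine similarity per query/document pair. As a minimal sketch of that computation, here is a standard-library version; the toy vectors and the helper names `cosine_similarity` and `similarity_matrix` are made up for illustration and are not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(queries, documents):
    """Rows correspond to queries, columns to documents."""
    return [[cosine_similarity(q, d) for d in documents] for q in queries]


# Toy 3-dimensional vectors standing in for real embeddings (illustrative only).
toy_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
toy_documents = [[2.0, 0.0, 0.0], [0.0, 3.0, 3.0]]

matrix = similarity_matrix(toy_queries, toy_documents)
for row in matrix:
    print([round(value, 4) for value in row])
```

Each query here points in the same direction as its matching document, so the diagonal entries are close to 1 and the off-diagonal entries are 0, mirroring the pattern in the printed tensor.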

Advanced usage

# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                 attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-4B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-4B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict = batch_dict.to(model.device)
# Inference only: disable gradient tracking to save memory
with torch.no_grad():
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7534257769584656, 0.1146894246339798], [0.03198453038930893, 0.6258305311203003]]
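The `last_token_pool` function above picks the hidden state of the last real token in each sequence, whichever side the padding is on. The following is a simplified pure-Python rendering of that idea on nested lists, not the tensor implementation itself; `last_token_index` and `last_token_pool_lists` are hypothetical names, and the one-dimensional "hidden states" are toy values.

```python
def last_token_index(attention_mask_row):
    """Index of the last real (non-padding) token in one row of a 0/1 mask."""
    # Scan from the right for the final position marked 1.
    for i in range(len(attention_mask_row) - 1, -1, -1):
        if attention_mask_row[i] == 1:
            return i
    raise ValueError("row contains no real tokens")


def last_token_pool_lists(hidden_states, attention_mask):
    """Pick the hidden vector of the last real token in every sequence."""
    return [
        states[last_token_index(mask)]
        for states, mask in zip(hidden_states, attention_mask)
    ]


# The same two-token sequence, padded on the right and then on the left.
right_padded_states = [[[1.0], [2.0], [0.0]]]
right_padded_mask = [[1, 1, 0]]
left_padded_states = [[[0.0], [1.0], [2.0]]]
left_padded_mask = [[0, 1, 1]]

pooled_right = last_token_pool_lists(right_padded_states, right_padded_mask)
pooled_left = last_token_pool_lists(left_padded_states, left_padded_mask)
print(pooled_right, pooled_left)
```

Both calls select the vector of the second real token. With left padding that token always sits in the final position, which is why the tensor version can take the fast path `last_hidden_states[:, -1]`.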

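✨ Key features

Exceptional versatility: The embedding models achieve state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks No. 1 on the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58).
Comprehensive flexibility: The Qwen3 Embedding series offers a full spectrum of sizes, from 0.6B to 8B, for both embedding and reranking models, covering use cases that prioritize either efficiency or effectiveness.
Multilingual capability: Thanks to the multilingual capabilities of the Qwen3 models, the Qwen3 Embedding series supports more than 100 languages. This includes various programming languages and provides strong multilingual, cross-lingual, and code retrieval capabilities.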

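📦 Installation

Using this model requires transformers version 4.51.0 or later. It can be installed with the command below; the quotes keep the shell from treating `>=` as an output redirection.

pip install "transformers>=4.51.0"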

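📚 Documentation

Model generation details

This model was generated using llama.cpp at commit 1f63e75f.

Quantization beyond the IMatrix

We are testing a new quantization method that uses rules to bump important layers above what the standard IMatrix would use. We found that the standard IMatrix does not perform very well at low-bit quantization or with MoE models, so we use llama.cpp's --tensor-type option to bump selected layers. See "Layer bumping with llama.cpp" for details.

Choosing the right model format

The right model format depends on your hardware capabilities and memory constraints. The characteristics and recommended use cases of each format are listed below.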

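Model format	Precision	Memory usage	Device requirements	Best use case
BF16	Very high	High	BF16-capable GPU/CPU	Fast inference with reduced memory
F16	High	High	FP16-capable GPU/CPU	Inference when BF16 is unavailable
Q4_K	Medium to low	Low	CPU or low-VRAM devices	Memory-constrained inference
Q6_K	Medium	Moderate	CPU with more memory	Better accuracy with quantization
Q8_0	High	Moderate	GPU/CPU with moderate VRAM	Highest accuracy among the quantized models
IQ3_XS	Low	Very low	Ultra-low-memory devices	Maximum memory efficiency, low accuracy
IQ3_S	Low	Very low	Low-memory devices	Slightly more usable than IQ3_XS
IQ3_M	Low to medium	Low	Low-memory devices	Better accuracy than IQ3_S
Q4_0	Low	Low	ARM-based/embedded devices	llama.cpp automatically optimizes for ARM inference
Ultra Low-Bit (IQ1/2_*)	Very low	Very low	Small edge/embedded devices	Fitting a model into very limited memory
Hybrid (e.g., `bf16_q8_0`)	Medium to high	Moderate	Hardware capable of mixed-precision inference	Balance of memory efficiency and accuracy, with near-FP precision in critical layers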
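To make the memory column more concrete, the sketch below gives a back-of-the-envelope estimate of weight size for a few of the formats. It rests on loud assumptions: "4B" is taken as exactly 4e9 parameters, every tensor is assumed to use a single type (real GGUF files, and the layer-bumped files described above in particular, mix types), and the bits-per-weight figures are derived from the ggml block layouts rather than measured from these files. The names `BITS_PER_WEIGHT` and `estimate_weight_gib` are hypothetical.

```python
# Bits per weight implied by each format's block layout.
# Assumption: every tensor uses the same type, which real GGUF files rarely do.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q4_K": 4.5,     # 144 bytes per 256-weight super-block
    "Q4_0": 4.5,     # 16 bytes of nibbles plus one fp16 scale per 32 weights
}


def estimate_weight_gib(num_params: float, fmt: str) -> float:
    """Rough size of the weights alone, in GiB (no metadata, no KV cache)."""
    total_bits = num_params * BITS_PER_WEIGHT[fmt]
    return total_bits / 8 / 1024**3


NOMINAL_PARAMS = 4_000_000_000  # assumption: "4B" read as exactly 4e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>5}: ~{estimate_weight_gib(NOMINAL_PARAMS, fmt):.2f} GiB")
```

Treat the printed numbers as a lower bound for planning only. Actual file sizes and runtime memory also depend on the tensor-type mix, the context length, and the backend.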

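Model overview

Qwen3-Embedding-4B has the following characteristics.

Attribute	Details
Model type	Text embedding
Supported languages	100+ languages
Parameters	4B
Context length	32k
Embedding dimension	Up to 2560; supports user-defined output dimensions from 32 to 2560

For detailed benchmark evaluations, hardware requirements, and inference performance, see the blog and GitHub.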
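A user-defined output dimension is commonly obtained, for models trained with Matryoshka representation learning, by keeping the first `dim` components of the full embedding and re-normalizing. The sketch below illustrates that general practice on a toy vector. It assumes this convention applies here, which the page does not spell out, and `truncate_and_normalize` is a hypothetical helper name.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components, then rescale to unit L2 norm."""
    if not 1 <= dim <= len(embedding):
        raise ValueError("dim must be between 1 and the embedding length")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a 2560-dimensional embedding.
full_embedding = [3.0, 4.0, 12.0, 0.0]
short_embedding = truncate_and_normalize(full_embedding, 2)
print(short_embedding)
```

Re-normalizing after truncation keeps dot products equal to cosine similarities. sentence-transformers also offers a `truncate_dim` option on `SentenceTransformer`; check its documentation for the exact behavior before relying on it.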

Qwen3 Embedding series model list

Model type	Model	Size	Layers	Sequence length	Embedding dimension	MRL support	Instruction aware
Text embedding	Qwen3-Embedding-0.6B	0.6B	28	32K	1024	Yes	Yes
Text embedding	Qwen3-Embedding-4B	4B	36	32K	2560	Yes	Yes
Text embedding	Qwen3-Embedding-8B	8B	36	32K	4096	Yes	Yes
Text reranking	Qwen3-Reranker-0.6B	0.6B	28	32K	-	-	Yes
Text reranking	Qwen3-Reranker-4B	4B	36	32K	-	-	Yes
Text reranking	Qwen3-Reranker-8B	8B	36	32K	-	-	Yes

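⚠️ Important note

MRL support indicates whether the embedding model supports custom dimensions for the final embedding.

Instruction aware indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.

On most downstream tasks, using an instruction ("instruct") typically yields a 1% to 5% improvement over not using one. Developers are therefore encouraged to write custom instructions tailored to their own tasks and scenarios. In multilingual settings, writing the instruction in English is also recommended, because most of the instructions used during model training were written in English.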
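As a small illustration of a task-specific instruction, the sketch below reuses the `Instruct: ...\nQuery:...` layout from the advanced usage example and pairs an English instruction with a non-English query. The task wording, the sample query, and the helper name `build_instructed_query` are hypothetical and chosen only for illustration.

```python
def build_instructed_query(task_description: str, query: str) -> str:
    """Prefix a query with a one-sentence task instruction."""
    # Same layout as get_detailed_instruct in the advanced usage example.
    return f"Instruct: {task_description}\nQuery:{query}"


# Hypothetical custom task: the instruction stays in English even though
# the query itself is written in Spanish.
CODE_SEARCH_TASK = (
    "Given a programming question, retrieve code snippets that answer it"
)

instructed = build_instructed_query(
    CODE_SEARCH_TASK, "¿Cómo invierto una lista en Python?"
)
print(instructed)

# Documents are embedded as-is, without any instruction prefix.
documents = ["my_list[::-1] returns a reversed copy of the list."]
```

Only the query side carries the instruction, so the same document embeddings can be reused across tasks with different instructions.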

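Evaluation

MTEB (Multilingual)

Model	Size	Mean (Task)	Mean (Type)	Bitext Mining	Classification	Clustering	Instruction Retrieval	Multilabel Classification	Pair Classification	Reranking	Retrieval	STS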
NV-Embed-v2	7B	56.29	49.58	57.84	57.29	40.80	1.04	18.63	78.94	63.82	56.72	71.10
GritLM-7B	7B	60.92	53.74	70.53	61.83	49.75	3.45	22.77	79.94	63.78	58.31	73.33
BGE-M3	0.6B	59.56	52.18	79.11	60.35	40.88	-3.11	20.1	80.76	62.79	54.60	74.12
multilingual-e5-large-instruct	0.6B	63.22	55.08	80.13	64.94	50.75	-0.40	22.91	80.86	62.61	57.12	76.81
gte-Qwen2-1.5B-instruct	1.5B	59.45	52.69	62.51	58.32	52.05	0.74	24.02	81.58	62.58	60.78	71.61
gte-Qwen2-7b-Instruct	7B	62.51	55.93	73.92	61.55	52.77	4.94	25.48	85.13	65.55	60.08	73.98
text-embedding-3-large	-	58.93	51.41	62.17	60.27	46.89	-2.68	22.03	79.17	63.89	59.27	71.68
Cohere-embed-multilingual-v3.0	-	61.12	53.23	70.50	62.95	46.89	-1.89	22.74	79.88	64.07	59.16	74.80
gemini-embedding-exp-03-07	-	68.37	59.59	79.28	71.82	54.59	5.18	29.16	83.63	65.58	67.71	79.40
Qwen3-Embedding-0.6B	0.6B	64.33	56.00	72.22	66.83	52.33	5.09	24.59	80.83	61.41	64.64	76.17
Qwen3-Embedding-4B	4B	69.45	60.86	79.36	72.33	57.15	11.56	26.77	85.05	65.08	69.60	80.86
Qwen3-Embedding-8B	8B	70.58	61.69	80.89	74.00	57.65	10.06	28.66	86.40	65.63	70.88	81.08
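The per-type average (the fourth column) appears to be the unweighted mean of the nine task-type scores to its right. The sketch below checks that reading against two rows copied from the table; it is an observation about these numbers, not a statement of how the leaderboard officially computes the column. `ROWS` and `mean_of_types` are hypothetical names.

```python
# Scores copied from the table above: the nine task-type columns in order,
# plus the reported per-type average for comparison.
ROWS = {
    "Qwen3-Embedding-4B": {
        "reported": 60.86,
        "types": [79.36, 72.33, 57.15, 11.56, 26.77, 85.05, 65.08, 69.60, 80.86],
    },
    "Qwen3-Embedding-8B": {
        "reported": 61.69,
        "types": [80.89, 74.00, 57.65, 10.06, 28.66, 86.40, 65.63, 70.88, 81.08],
    },
}


def mean_of_types(scores):
    """Unweighted mean over the task-type scores."""
    return sum(scores) / len(scores)


for name, row in ROWS.items():
    computed = mean_of_types(row["types"])
    print(f"{name}: computed {computed:.2f}, reported {row['reported']:.2f}")
```

Because the table's inputs are already rounded to two decimals, a recomputed mean can differ from the reported value in the last digit for some rows.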