Qwen2-7B-Instruct-embed-base open-source model: free to deploy for generating high-quality text embeddings

Qwen2 7B Instruct Embed Base

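Developed by ssmits

A 7B-parameter embedding model from the Qwen2 series, designed for generating high-quality text embeddings

Text embedding

Safetensors

Language: English | License: Apache-2.0 | Tags: #LLM embeddings #high-dimensional semantic encoding #multilingual support

Downloads: 2,895

Released: 6/7/2024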

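Model overview

A pretrained Transformer-based language model whose lm_head layer has been removed so that it outputs text embedding vectors; suited to downstream tasks that need semantic representations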

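Model features

Improved tokenizer

Adapts to a variety of natural languages and to code

Advanced attention mechanism

Uses grouped query attention for greater efficiency

Dedicated embedding model

lm_head layer removed so that the model outputs embedding vectors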

Model capabilities

Text embedding generation

Semantic similarity computation

Contextual understanding

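Use cases

Semantic search

Document retrieval

Matches documents precisely via embedding similarity

Text classification

Sentiment analysis

Uses embedding vectors as input features for a classifier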
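
The document-retrieval use case listed above comes down to ranking documents by how similar their embeddings are to a query embedding. The following is a minimal, dependency-free sketch of that idea. The helper names `cosine_similarity` and `rank_documents` and the three-dimensional vectors are hypothetical and chosen only for illustration; embeddings produced by this model have 3584 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy embeddings, not output from the model.
doc_vecs = [
    [0.9, 0.1, 0.0],  # document 0
    [0.0, 1.0, 0.2],  # document 1
    [0.7, 0.3, 0.1],  # document 2
]
query_vec = [1.0, 0.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)  # [0, 2, 1]
```

In practice the toy vectors would be replaced by embeddings of the query and the documents, and a vector index would likely take the place of the linear scan once the collection grows.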

🚀 Qwen2-7B-Instruct-embed-base

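Qwen2-7B-Instruct-embed-base is a pretrained model for generating text embeddings. It is based on the Transformer architecture, and its embeddings can feed downstream natural language processing tasks such as semantic search and text classification.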

🚀 Quick start

This section shows how to run inference with the Qwen2-7B-Instruct-embed-base model.

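✨ Key features

Qwen2 is a series of decoder language models in several sizes. For each size, a base language model and an aligned chat model are released.
Based on the Transformer architecture, with SwiGLU activation, attention QKV bias, and grouped query attention.
An improved tokenizer that adapts to multiple natural languages and to code.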
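
Grouped query attention, mentioned above, lets several query heads share one key/value head, which shrinks the key/value cache. The sketch below illustrates only that head-sharing rule, not attention itself. The function name `kv_head_for_query_head` is hypothetical, and the head counts (28 query heads, 4 key/value heads) are an assumption that should be checked against the model's config.json.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Index of the key/value head shared by the given query head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads  # query heads per key/value head
    return query_head // group_size


# Assumed head counts; verify them in the model's config.json.
NUM_QUERY_HEADS = 28
NUM_KV_HEADS = 4

# Consecutive query heads form a group that reads the same key/value head.
mapping = [
    kv_head_for_query_head(h, NUM_QUERY_HEADS, NUM_KV_HEADS)
    for h in range(NUM_QUERY_HEADS)
]
print(mapping)

# With one key/value head per group, the key/value cache is this many times
# smaller than it would be with one key/value head per query head.
kv_cache_reduction = NUM_QUERY_HEADS / NUM_KV_HEADS
print(kv_cache_reduction)
```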

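📦 Installation

The code for Qwen2 is included in recent versions of the Hugging Face transformers library. Install transformers>=4.37.0; with older versions you may encounter the following error: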

KeyError: 'qwen2'
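
To catch the version problem before loading the model, the installed transformers version can be compared with the 4.37.0 minimum stated above. This is a small standard-library sketch; the helpers `version_tuple` and `meets_minimum` are hypothetical names, and the comparison deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = "4.37.0"  # minimum transformers version stated in the installation notes


def version_tuple(text: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a comparable tuple such as (4, 37, 0)."""
    numbers = (re.findall(r"\d+", text) + ["0", "0", "0"])[:3]
    return tuple(int(n) for n in numbers)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("transformers")
    status = "ok" if meets_minimum(installed) else "too old, upgrade transformers"
    print(f"transformers {installed}: {status}")
except PackageNotFoundError:
    print("transformers is not installed")
```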

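💻 Usage examples

Basic usage

The lm_head layer has been removed from this model, so it can be used to generate embeddings. It has not been fine-tuned for embedding tasks, however, so its quality may be suboptimal; see intfloat/e5-mistral-7b-instruct for an example of such fine-tuning.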

from sentence_transformers import SentenceTransformer
import torch

# 1. Load a pretrained Sentence Transformer model
model = SentenceTransformer("ssmits/Qwen2-7B-Instruct-embed-base")  # pass device="cpu" if the GPU has 24 GB of VRAM or less

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 3584)

# 3. Calculate the embedding similarities
# Assuming embeddings is a numpy array, convert it to a torch tensor
embeddings_tensor = torch.tensor(embeddings)

# Using torch to compute cosine similarity matrix
similarities = torch.nn.functional.cosine_similarity(embeddings_tensor.unsqueeze(0), embeddings_tensor.unsqueeze(1), dim=2)

print(similarities)
# tensor([[1.0000, 0.8608, 0.6609],
#         [0.8608, 1.0000, 0.7046],
#         [0.6609, 0.7046, 1.0000]])

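⚠️ Important note

In testing, inference with this model used more than 24 GB of VRAM (the capacity of an RTX 4090), so an A100 or A6000 is recommended for inference.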
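
One plausible explanation for the figure above is the size of the weights alone. The back-of-the-envelope sketch below assumes roughly 7 billion parameters (taken from the "7B" in the model name) and 32-bit weights. The source does not state which precision was used in the test, so both numbers are assumptions, and activations are not counted.

```python
def weight_memory_gib(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed for the weights alone, in GiB (activations excluded)."""
    return num_parameters * bytes_per_parameter / 2**30


# Assumption: about 7e9 parameters, read off the "7B" in the model name.
NUM_PARAMETERS = 7e9

fp32_gib = weight_memory_gib(NUM_PARAMETERS, 4)  # 32-bit floats
fp16_gib = weight_memory_gib(NUM_PARAMETERS, 2)  # 16-bit floats (float16 or bfloat16)

print(f"float32 weights: {fp32_gib:.1f} GiB")
print(f"16-bit weights:  {fp16_gib:.1f} GiB")
```

If the 32-bit assumption holds, loading the weights in 16-bit precision (for example through the `torch_dtype` argument of `from_pretrained`) might fit on a 24 GB card, although that has not been verified here.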

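Advanced usage

Without the sentence-transformers library, you can use the model as follows: first pass the input through the Transformer model, then apply a suitable pooling operation to the contextualized token embeddings.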

from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, using the attention mask so that padding tokens are ignored
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')
model = AutoModel.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')  # stays on the CPU by default; move it to a GPU only if that GPU has more than 24 GB of VRAM

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
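
To make the role of the attention mask in the mean pooling step concrete, here is a dependency-free sketch of the same computation for a single sentence. The function `masked_mean` and the toy vectors are hypothetical; the real code above does this for a whole batch with torch tensors.

```python
def masked_mean(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1; ignore padding (mask 0)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
    count = max(sum(attention_mask), 1e-9)  # guard against division by zero
    return [t / count for t in total]


# Hypothetical sentence: two real tokens followed by one padding token.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

pooled = masked_mean(token_vectors, attention_mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is deliberately large to show that it has no effect on the result: only the two masked-in tokens are averaged.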

Enabling multi-GPU

from transformers import AutoModel
from torch.nn import DataParallel

model = AutoModel.from_pretrained("ssmits/Qwen2-7B-Instruct-embed-base")
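# Wrap each top-level submodule in DataParallel, which replicates it on the
# available GPUs and splits each input batch across them
for module_key, module in model._modules.items():
    model._modules[module_key] = DataParallel(module)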