🚀 Qwen2-7B-Instruct-embed-base
Qwen2-7B-Instruct-embed-base is a pre-trained model for text classification. It is part of the Qwen2 language model series and offers embedding capabilities.
✨ Features
- Model Architecture: Based on the Transformer architecture with SwiGLU activation, attention QKV bias, and group query attention.
- Tokenizer: An improved tokenizer that adapts to multiple natural languages and to code.
- Embedding-ready: The 'lm_head' layer has been removed, making it suitable for embedding tasks.
📦 Installation
The Qwen2 code has been integrated into recent releases of Hugging Face transformers. We recommend installing transformers>=4.37.0 to avoid the following error:
KeyError: 'qwen2'
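A quick way to confirm the requirement above before loading the model is to compare the installed version against 4.37.0. This is a minimal sketch using only the standard library; the helper names (parse_release, meets_minimum, check_transformers) are illustrative and not part of transformers.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.37.0"  # minimum version recommended in this README


def parse_release(version_string):
    """Return the leading numeric components, e.g. '4.37.0.dev0' -> (4, 37, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '4.37' so tuples compare consistently
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, required=MIN_TRANSFORMERS):
    """Compare numerically, so '4.9.0' is correctly treated as older than '4.37.0'."""
    return parse_release(installed) >= parse_release(required)


def check_transformers():
    """Report whether the installed transformers package is new enough."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade it"


print(check_transformers())
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank "4.9.0" above "4.37.0".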
💻 Usage Examples
Basic Usage
The 'lm_head' layer of this model has been removed, which means it can be used for embeddings. However, it needs further fine-tuning for optimal performance, as demonstrated by intfloat/e5-mistral-7b-instruct.
Advanced Usage
Inference with Sentence-Transformers
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer("ssmits/Qwen2-7B-Instruct-embed-base")
sentences = [
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
embeddings_tensor = torch.tensor(embeddings)
similarities = torch.nn.functional.cosine_similarity(embeddings_tensor.unsqueeze(0), embeddings_tensor.unsqueeze(1), dim=2)
print(similarities)
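The broadcast cosine_similarity call above compares every embedding with every other one and yields an N x N matrix. The following dependency-free sketch performs the same pairwise computation on toy 2-D vectors; the vectors and helper names are illustrative assumptions, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_cosine(vectors):
    """Return the full similarity matrix: entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for sentence embeddings (hypothetical values)
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

for row in pairwise_cosine(toy_embeddings):
    print([round(value, 3) for value in row])
```

The diagonal is 1 because each vector is compared with itself, and the matrix is symmetric, so only the upper triangle carries new information.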
⚠️ Important Note
In tests, this model used more than 24 GB of GPU memory, which exceeds the capacity of an RTX 4090, so a GPU with more memory such as an A100 or A6000 is required for inference.
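A back-of-the-envelope estimate is consistent with that observation. The sketch below assumes roughly 7 billion parameters (approximated from the model name, not an exact count) and counts weights only, ignoring activations and framework overhead; the function name is illustrative.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: about 7 billion parameters, inferred from the "7B" in the model name
ASSUMED_PARAMETERS = 7_000_000_000

for dtype_name, num_bytes in [("float32", 4), ("float16 / bfloat16", 2)]:
    estimate = weight_memory_gb(ASSUMED_PARAMETERS, num_bytes)
    print(f"{dtype_name}: about {estimate:.0f} GB for weights")
```

Under these assumptions, 32-bit weights alone come to about 28 GB, which is already more than the 24 GB available on an RTX 4090, while 16-bit weights would need about 14 GB.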
Inference with HuggingFace Transformers
from transformers import AutoTokenizer, AutoModel
import torch
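def mean_pooling(model_output, attention_mask):
    # The first element of model_output holds the per-token embeddings
    token_embeddings = model_output[0]
    # Broadcast the mask over the hidden dimension so padding tokens contribute zero
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum the unmasked token embeddings and divide by the number of real tokens
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)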
sentences = ['This is an example sentence', 'Each sentence is converted']
tokenizer = AutoTokenizer.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')
model = AutoModel.from_pretrained('ssmits/Qwen2-7B-Instruct-embed-base')
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
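To make the effect of the attention mask in mean_pooling concrete, here is a dependency-free sketch of the same masked average for a single toy sequence. The function name masked_mean and the token values are illustrative assumptions, not model output.

```python
def masked_mean(token_vectors, attention_mask):
    """Average only the token vectors whose mask entry is 1."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    # Mirror the clamp in mean_pooling to avoid dividing by zero
    denominator = max(count, 1e-9)
    return [total / denominator for total in totals]


# Two real tokens followed by one padding token with arbitrary values
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

Because the third position is masked out, its large values do not leak into the sentence embedding; without the mask the average would be dominated by padding.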
How to enable Multi-GPU
from transformers import AutoModel
from torch.nn import DataParallel
model = AutoModel.from_pretrained("ssmits/Qwen2-7B-Instruct-embed-base")
# Wrap each top-level submodule in DataParallel so its inputs are split across the available GPUs
for module_key, module in model._modules.items():
    model._modules[module_key] = DataParallel(module)
📚 Documentation
Model Details
Qwen2 is a language model series that includes decoder language models of different sizes. For each size, a base language model and an aligned chat model are released. The models are based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, etc. The series also has an improved tokenizer that adapts to multiple natural languages and to code.
📄 License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|---|---|
| Pipeline Tag | text-classification |
| Tags | pretrained |
| Library Name | sentence-transformers |
| License | apache-2.0 |