🚀 Qwen2.5-7B-embed-base
Qwen2.5-7B-embed-base is a model for text classification. It is based on the Qwen2.5-7B base model, with the 'lm_head' layer removed, making it suitable for embedding tasks.
✨ Features
- Model Architecture: Qwen2.5 is a language model series based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, etc.
- Tokenizer: It has an improved tokenizer that adapts to multiple natural languages and to code.
📦 Installation
Qwen2.5 support is included in recent releases of Hugging Face transformers. We advise you to install transformers>=4.37.0; with an older version you might encounter the following error:
KeyError: 'Qwen2.5'
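To catch this before loading the model, the installed version can be compared against the advised minimum. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `version_at_least`, `check_transformers`) are hypothetical, and the comparison looks only at the numeric release segments, so pre-release tags are ignored.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.37.0"  # minimum version advised in this README


def parse_release(ver: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of at least three integers."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.37' compares equal to '4.37.0'
    return tuple(parts)


def version_at_least(installed: str, required: str) -> bool:
    """Compare numeric release segments only (hypothetical helper)."""
    return parse_release(installed) >= parse_release(required)


def check_transformers() -> str:
    """Report whether the installed transformers meets the advised minimum."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if version_at_least(installed, MIN_TRANSFORMERS):
        return f"transformers {installed} is recent enough"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade it"


print(check_transformers())
```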
💻 Usage Examples
Basic Usage
Because the 'lm_head' layer has been removed, the model returns hidden states that can be used as embeddings. Out of the box it will not perform well: it needs further fine-tuning for embedding tasks, as intfloat/e5-mistral-7b-instruct demonstrates.
Advanced Usage
Inference with Sentence Transformers
from sentence_transformers import SentenceTransformer
import torch

# Load the embedding model from the Hugging Face Hub
model = SentenceTransformer("ssmits/Qwen2.5-7B-embed-base")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# Encode the sentences; the result has one row per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)

# Pairwise cosine similarity between all sentence embeddings (3 x 3 matrix)
embeddings_tensor = torch.tensor(embeddings)
similarities = torch.nn.functional.cosine_similarity(
    embeddings_tensor.unsqueeze(0), embeddings_tensor.unsqueeze(1), dim=2
)
print(similarities)
Note: In my tests, inference used more than 24 GB of GPU memory, which exceeds what an RTX 4090 provides, so a GPU with more memory, such as an A100 or A6000, is required.
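A rough back-of-the-envelope estimate suggests why 24 GB may not be enough. The sketch below computes the memory needed for the weights alone; the parameter count, vocabulary size, and hidden size are assumptions taken from the published Qwen2.5-7B figures rather than measured on this checkpoint, and activations and framework overhead are not included.

```python
# Approximate figures: assumptions based on the published Qwen2.5-7B
# configuration, not measured on this checkpoint.
TOTAL_PARAMS = 7.61e9      # full Qwen2.5-7B, including lm_head (assumed)
VOCAB_SIZE = 152_064       # assumed
HIDDEN_SIZE = 3_584        # assumed
LM_HEAD_PARAMS = VOCAB_SIZE * HIDDEN_SIZE  # the layer removed in this model

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


embed_params = TOTAL_PARAMS - LM_HEAD_PARAMS
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(embed_params, dtype):.1f} GB")
```

Under these assumptions, 32-bit weights alone come to roughly 28 GB, which is consistent with the observation above. Half-precision weights would take about half of that, though whether embedding quality holds up at lower precision is not something this README has tested.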
Inference with HuggingFace Transformers
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # The first element of model_output holds the per-token embeddings
    token_embeddings = model_output[0]
    # Expand the mask to the embedding shape so padding tokens contribute zero
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum over tokens, then divide by the number of real tokens (clamped to avoid division by zero)
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['This is an example sentence', 'Each sentence is converted']

tokenizer = AutoTokenizer.from_pretrained('ssmits/Qwen2.5-7B-embed-base')
model = AutoModel.from_pretrained('ssmits/Qwen2.5-7B-embed-base')

# Tokenize the batch, padding to the longest sentence
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Run the model without tracking gradients
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
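To make the role of the attention mask in mean_pooling concrete, here is a small worked example in plain Python, without torch. It is only an illustration of the same idea on a single toy sentence; the function name `masked_mean` and the toy values are hypothetical.

```python
def masked_mean(token_embeddings, attention_mask, eps=1e-9):
    """Average token vectors where mask == 1; padding (mask == 0) is ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0.0
    for vector, keep in zip(token_embeddings, attention_mask):
        for i in range(dim):
            summed[i] += vector[i] * keep  # padding rows are multiplied by 0
        count += keep
    # Same role as torch.clamp(..., min=1e-9): avoid dividing by zero
    count = max(count, eps)
    return [value / count for value in summed]


# One toy sentence: two real tokens followed by one padding token
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

The padding vector [100.0, 100.0] does not affect the result, because its mask value is 0 and only the two real tokens are counted in the denominator.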
How to enable Multi-GPU
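from transformers import AutoModel
from torch.nn import DataParallel

model = AutoModel.from_pretrained("ssmits/Qwen2.5-7B-embed-base")

# Wrap each top-level submodule in DataParallel, which replicates the module
# and splits each input batch across the available GPUs
for module_key, module in model._modules.items():
    model._modules[module_key] = DataParallel(module)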
🔧 Technical Details
Qwen2.5 is a language model series that includes decoder language models of different sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, grouped-query attention, etc. Additionally, we have an improved tokenizer that adapts to multiple natural languages and to code.
📄 License
This model is licensed under the apache-2.0 license.
| Property | Details |
|---|---|
| Model Type | Text Classification |
| Base Model | Qwen/Qwen2.5-7B |
| Library Name | sentence-transformers |
| License | apache-2.0 |
| Pipeline Tag | text-classification |
| Tags | pretrained |