SGPT-Bloom-7b1-msmarco

Developed by bigscience
SGPT-Bloom-7b1-msmarco is a sentence transformer model based on the BLOOM architecture, primarily used for sentence similarity calculation and feature extraction tasks.
Downloads 31
Release Time: 8/26/2022

Model Overview

This model is based on the BLOOM-7b1 architecture and is optimized for sentence similarity and feature extraction. It has been evaluated on a range of MTEB (Massive Text Embedding Benchmark) tasks, including classification, clustering, retrieval, and bilingual text mining.

Model Features

Multilingual Support
Supports processing in multiple languages, including English, German, Spanish, French, Japanese, and Chinese.
Multi-task Processing
Capable of handling various natural language processing tasks, including sentence similarity calculation, feature extraction, classification, clustering, and retrieval.
Large-scale Benchmark Testing
Evaluated across multiple task types in MTEB (Massive Text Embedding Benchmark); per-task scores are listed under Use Cases.

Model Capabilities

Sentence similarity calculation
Feature extraction
Text classification
Text clustering
Information retrieval
Bilingual text mining
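
Sentence similarity with an embedding model usually means comparing the vectors produced for two sentences. The sketch below assumes cosine similarity as the comparison, which is a common choice but is not stated on this page. The three-dimensional vectors are hypothetical stand-ins for real embeddings, and no model is loaded.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (hypothetical values).
query = [1.0, 2.0, 3.0]
paraphrase = [2.0, 4.0, 6.0]   # same direction as query, so similarity is near 1
unrelated = [3.0, 0.0, -1.0]   # orthogonal to query, so similarity is near 0

print(cosine_similarity(query, paraphrase))
print(cosine_similarity(query, unrelated))
```

In practice the vectors would come from the model's feature extraction output, and the same function would be applied to each sentence pair.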

Use Cases

E-commerce
Product Review Classification
Classify and analyze product reviews on e-commerce platforms like Amazon.
In the MTEB Amazon review classification task, accuracy rates are 33.86% for English, 29.70% for German, 35.97% for Spanish, 35.92% for French, 27.64% for Japanese, and 32.63% for Chinese.
Counterfactual Classification
Identify and analyze counterfactual reviews on e-commerce platforms.
In the MTEB Amazon counterfactual classification task, accuracy rates are 68.06% for English, 61.35% for German, and 58.23% for Japanese.
Academic Research
Academic Paper Clustering
Cluster and analyze academic papers from arXiv and bioRxiv.
On the arXiv clustering tasks, V-measure is 44.59 (P2P) and 38.03 (S2S); on the bioRxiv clustering tasks, it is 36.03 (P2P) and 32.48 (S2S).
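
V-measure, the clustering metric reported above, is the harmonic mean of homogeneity (each cluster contains a single class) and completeness (each class falls in a single cluster). The sketch below is a minimal pure-Python version of the standard definition with equal weighting, not the benchmark's own code. It returns a value between 0 and 1, while the scores above appear to be on a 0 to 100 scale. The function and variable names are illustrative.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(targets, given):
    """Conditional entropy H(targets | given) in nats."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, cluster_ids):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")
    h_true = _entropy(true_labels)
    h_clusters = _entropy(cluster_ids)
    # By convention each score is 1 when its normalizing entropy is zero.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - _conditional_entropy(true_labels, cluster_ids) / h_true
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(cluster_ids, true_labels) / h_clusters
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical paper categories and two candidate clusterings.
categories = ["cs", "cs", "bio", "bio"]
print(v_measure(categories, [1, 1, 0, 0]))  # clusters match categories up to renaming
print(v_measure(categories, [0, 1, 0, 1]))  # clusters carry no category information
```

Cluster IDs are arbitrary labels, so a clustering that matches the categories under a different numbering still scores as a perfect match.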
Q&A Systems
Duplicate Question Identification
Identify duplicate questions on Q&A platforms.
In the AskUbuntu duplicate question reranking task, mean average precision (MAP) is 59.97%, and mean reciprocal rank (MRR) is 73.18%.
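
The average precision and mean reciprocal rank figures above are ranking metrics averaged over queries. The sketch below shows how they can be computed once candidates have been sorted by similarity to the query. It assumes binary relevance flags and that every relevant candidate appears in the ranked list, as is typical for reranking. The sample rankings are hypothetical.

```python
def average_precision(relevance):
    """Average precision for one query; relevance holds 0/1 flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(relevance):
    """1 / rank of the first relevant candidate, or 0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Two hypothetical queries, each with four candidates already ranked by the model.
rankings = [
    [1, 0, 1, 0],  # AP = (1/1 + 2/3) / 2, RR = 1
    [0, 1, 0, 0],  # AP = 1/2, RR = 1/2
]

map_score = mean(average_precision(r) for r in rankings)
mrr_score = mean(reciprocal_rank(r) for r in rankings)
print(map_score, mrr_score)
```

MRR only rewards the position of the first duplicate, while average precision accounts for every duplicate in the list, which is why the two numbers can differ noticeably for the same ranking.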
Cross-language Information Retrieval
Bilingual Text Alignment
Identify parallel texts across different languages.
In the BUCC bilingual text mining task, accuracy rates are 54.28% for German-English, 97.34% for French-English, 46.05% for Russian-English, and 98.10% for Chinese-English.