Bert Base 1024 Biencoder 6M Pairs

Developed by shreyansh26
A long-context bi-encoder based on MosaicML's pre-trained BERT with a 1024-token sequence length, designed to generate 768-dimensional dense vector representations of sentences and paragraphs.
Release Time: 8/17/2023

Model Overview

This model maps sentences and paragraphs to a 768-dimensional dense vector space, making it suitable for tasks such as clustering and semantic search. It supports a sequence length of 1024 tokens and was trained on 6.4M randomly sampled sentence/paragraph pairs.
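
As a quick illustration, the sketch below shows how such a model might be loaded and used to embed text with Hugging Face transformers. The model id, the trust_remote_code flag, and mean pooling are assumptions based on typical MosaicBERT-derived checkpoints, not confirmed details; consult the model's Hub page for the exact loading and pooling code.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed Hub id; check the actual model page for the exact name.
MODEL_ID = "shreyansh26/bert-base-1024-biencoder-6M-pairs"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# MosaicBERT-based checkpoints generally require trust_remote_code=True.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

def encode(texts):
    # Tokenize up to the model's 1024-token context window.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=1024, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # Mean-pool token embeddings, ignoring padding (a common pooling
    # choice; the model card may specify a different strategy).
    mask = batch["attention_mask"].unsqueeze(-1).float()
    summed = (out.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)  # shape: (batch, 768)

embeddings = encode(["A long paragraph ...", "Another passage ..."])
print(embeddings.shape)  # torch.Size([2, 768])
```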

Model Features

Long context support
Supports a sequence length of 1024 tokens, making it well suited to long texts
Efficient bi-encoder
Utilizes a bi-encoder architecture for efficiently generating vector representations of sentences and paragraphs
Large-scale training data
Trained on 6.4M randomly sampled sentence/paragraph pairs

Model Capabilities

Sentence vectorization
Paragraph vectorization
Semantic similarity calculation
Text clustering
Semantic search
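
The similarity and search capabilities above reduce to comparing embedding vectors, typically by cosine similarity. The following minimal sketch assumes the hypothetical encode helper from the snippet above:

```python
import torch.nn.functional as F

# Embed the query and candidate documents independently
# (the defining property of a bi-encoder).
query = encode(["How do bi-encoders work?"])
docs = encode([
    "A bi-encoder embeds the query and each document separately.",
    "The weather in Paris is mild in spring.",
])

# Cosine similarity = dot product of L2-normalized vectors.
q = F.normalize(query, dim=-1)
d = F.normalize(docs, dim=-1)
scores = q @ d.T                      # shape: (1, num_docs)
best = scores.argmax(dim=-1).item()   # index of the most similar document
print(scores, best)
```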

Use Cases

Information retrieval
Document retrieval
Uses vector similarity to retrieve relevant documents; performs well on multiple retrieval benchmarks
Question answering systems
Paragraph retrieval
Retrieves candidate paragraphs for downstream question answering
Text analysis
Text clustering
Clusters texts by semantic similarity (see the sketch after this list)
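
For the clustering use case, one possible sketch pairs the embeddings with scikit-learn's KMeans. KMeans is one reasonable algorithm choice, not one prescribed by the model; the snippet again assumes the encode helper from the first sketch.

```python
from sklearn.cluster import KMeans

texts = [
    "How to train a neural network",
    "Best optimizer settings for deep learning",
    "Top pasta recipes for dinner",
    "Quick weeknight cooking ideas",
]
# Embed the texts and convert to NumPy for scikit-learn.
emb = encode(texts).numpy()

# Group semantically similar texts into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)
for text, label in zip(texts, kmeans.labels_):
    print(label, text)
```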