
M2-BERT 80M 32k Retrieval

Developed by togethercomputer
An 80M-parameter M2-BERT model that supports sequences up to 32,768 tokens and is fine-tuned specifically for long-context retrieval tasks.
Downloads: 1,274
Release date: 11/4/2023

Model Overview

A BERT variant built on the Monarch Mixer architecture, fine-tuned for long-text retrieval, that produces high-quality text embeddings.

Model Features

Ultra-Long Context Processing
Supports sequences up to 32,768 tokens, making it suitable for long-document retrieval tasks.
Efficient Architecture
Uses the Monarch Mixer architecture, which is sub-quadratic in sequence length, reducing compute relative to standard attention while maintaining quality.
Retrieval Optimization
Fine-tuned specifically for retrieval tasks, generating high-quality 768-dimensional text embeddings.
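The features above translate into a simple embedding API. A minimal sketch of generating those 768-dimensional embeddings with Hugging Face `transformers`, following the model's published usage pattern (the `bert-base-uncased` tokenizer pairing and the `sentence_embedding` output key are assumptions taken from that pattern; check the model repository for the current API):

```python
import torch
import torch.nn.functional as F

MODEL_NAME = "togethercomputer/m2-bert-80M-32k-retrieval"
MAX_LEN = 32768  # maximum sequence length the model supports

def load():
    """Load the checkpoint and tokenizer (downloads weights on first use)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    # trust_remote_code is needed because the Monarch Mixer layers live in
    # the model repository, not in the transformers library itself
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, trust_remote_code=True
    )
    # the published example pairs the checkpoint with the bert-base-uncased
    # tokenizer, extended to the 32k context window
    tokenizer = AutoTokenizer.from_pretrained(
        "bert-base-uncased", model_max_length=MAX_LEN
    )
    return model, tokenizer

def embed(texts, model, tokenizer, max_length=MAX_LEN):
    """Return one L2-normalized 768-dim embedding per input text."""
    batch = tokenizer(
        texts, return_tensors="pt", padding=True,
        truncation=True, max_length=max_length,
    )
    with torch.no_grad():
        out = model(**batch)
    # the checkpoint exposes its pooled output under "sentence_embedding"
    return F.normalize(out["sentence_embedding"], p=2, dim=1)
```

In a real pipeline you would call `model, tokenizer = load()` and then `embed([...], model, tokenizer)`; because the returned vectors are unit-length, they can be compared with a plain dot product.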

Model Capabilities

Long Text Similarity Calculation
Semantic Retrieval
Text Embedding Generation
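These capabilities combine into a standard retrieval workflow: embed the corpus once, embed each query, and rank documents by cosine similarity. A self-contained sketch using NumPy, with random vectors standing in for real M2-BERT embeddings (only the 768 dimension comes from the model; the rest is illustrative):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=3):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                       # cosine similarity per document
    idx = np.argsort(scores)[::-1][:k]   # indices of the k best matches
    return idx, scores[idx]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 768))             # stand-in document embeddings
query = docs[42] + 0.1 * rng.normal(size=768)  # a query close to document 42
idx, scores = top_k(query, docs, k=3)
print(idx[0])  # 42 — the document the query was derived from
```

With real embeddings the same function serves as the scoring core of a semantic search index; for large corpora you would swap the brute-force matrix product for an approximate-nearest-neighbor index.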

Use Cases

Information Retrieval
Long Document Retrieval
Quickly find relevant content from a large collection of long documents.
Capable of effectively processing documents up to 32k tokens in length.
Semantic Search
Document search based on semantics rather than keywords.
Generates high-quality semantic embedding vectors.