
M2-BERT 80M 2k Retrieval

Developed by togethercomputer
This is an 80M-parameter M2-BERT checkpoint pre-trained with a sequence length of 2048 and fine-tuned for long-context retrieval tasks.
Downloads: 538
Release date: 11/13/2023

Model Overview

Monarch Mixer-BERT (M2-BERT) is a GEMM-based, sub-quadratic architecture. This checkpoint is optimized for long-context retrieval and generates high-quality embedding vectors for information retrieval.

Model Features

Long-sequence processing: supports input sequences of up to 2048 tokens, suitable for long text content.
Efficient retrieval: fine-tuned for retrieval tasks, producing high-quality 768-dimensional embedding vectors.
Sub-quadratic architecture: uses the Monarch Mixer architecture, achieving efficient GEMM-based computation.
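The checkpoint can be loaded through Hugging Face `transformers`. The sketch below follows the usage pattern published for M2-BERT retrieval checkpoints; the `trust_remote_code=True` flag (the Monarch Mixer code ships with the checkpoint), the `bert-base-uncased` tokenizer, and the `sentence_embedding` output key are assumptions taken from that pattern and should be verified against the current model card.

```python
def embed(text: str, max_seq_length: int = 2048):
    """Return a sentence embedding for `text` (sketch; 768-dim for this model)."""
    # transformers is imported lazily so the sketch can be read and reused
    # without the heavyweight dependencies installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(
        "togethercomputer/m2-bert-80M-2k-retrieval",
        trust_remote_code=True,  # assumed: custom Monarch Mixer code ships with the checkpoint
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "bert-base-uncased", model_max_length=max_seq_length
    )
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=max_seq_length,
    )
    outputs = model(**inputs)
    return outputs["sentence_embedding"]  # assumed output key; shape (1, 768)


if __name__ == "__main__":
    vec = embed("Monarch Mixer scales sub-quadratically in sequence length.")
    print(vec.shape)
```

The first call downloads the checkpoint, so it is kept behind the `__main__` guard.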

Model Capabilities

Long-text embedding generation
Sentence similarity calculation
Information retrieval

Use Cases

Information retrieval
Document retrieval: build document retrieval systems that find relevant documents for a given query.
Semantic search: supports search based on meaning rather than keyword matching.