Monarch Mixer-BERT
An 80M checkpoint of M2-BERT, pretrained with sequence length 8192, and fine-tuned for long-context retrieval.
This model offers a powerful solution for long-context retrieval tasks, leveraging the pretrained capabilities of M2-BERT.
🚀 Quick Start
This Monarch Mixer-BERT is an 80M checkpoint of M2-BERT. It was pretrained with a sequence length of 8192 and fine-tuned for long-context retrieval. You can find more details about the training process in the paper and our blog post.
✨ Features
- Pretrained with a sequence length of 8192.
- Fine-tuned for long-context retrieval.
- Generates 768-dimensional embeddings for retrieval.
📦 Installation
Check out our GitHub for instructions on how to download and fine - tune this model.
💻 Usage Examples
Basic Usage
You can load this model using Hugging Face `AutoModel`:
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-8k-retrieval",
    trust_remote_code=True
)
```
You should expect to see a large error message about unused parameters for FlashFFTConv. If you'd like to load the model with FlashFFTConv, you can check out our GitHub.
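If you would rather not see that message, one option is to lower the transformers log verbosity before loading the model. This is a small optional sketch using the standard `transformers` logging utilities; it only hides warning-level messages and does not enable FlashFFTConv:

```python
from transformers import logging

# Only show errors; this hides warning-level messages such as the note
# about unused FlashFFTConv parameters when loading the checkpoint.
logging.set_verbosity_error()
```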
Advanced Usage
This model generates embeddings for retrieval. The embeddings have a dimensionality of 768:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

max_seq_length = 8192
testing_string = "Every morning, I make a cup of coffee to start my day."

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-8k-retrieval",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    model_max_length=max_seq_length
)

input_ids = tokenizer(
    [testing_string],
    return_tensors="pt",
    padding="max_length",
    return_token_type_ids=False,
    truncation=True,
    max_length=max_seq_length
)

outputs = model(**input_ids)
embeddings = outputs['sentence_embedding']
```
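For retrieval, you typically compare these embeddings with cosine similarity. The snippet below is a minimal sketch that reuses the `model`, `tokenizer`, and `max_seq_length` objects from the example above; the `embed` helper and the example texts are illustrative only and assume PyTorch is installed:

```python
import torch
import torch.nn.functional as F

def embed(texts):
    # Tokenize a list of strings and return one 768-dimensional embedding per string.
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        padding="max_length",
        return_token_type_ids=False,
        truncation=True,
        max_length=max_seq_length
    )
    with torch.no_grad():
        return model(**inputs)['sentence_embedding']

query_embedding = embed(["How do I start my morning?"])
doc_embeddings = embed([
    "Every morning, I make a cup of coffee to start my day.",
    "The capital of France is Paris.",
])

# Cosine similarity between the query and each document; higher scores indicate closer matches.
scores = F.cosine_similarity(query_embedding, doc_embeddings)
print(scores)
```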
You can also get embeddings from this model using the Together API as follows (you can find your API key here):
```python
import os
import requests

def generate_together_embeddings(text: str, model_api_string: str, api_key: str):
    url = "https://api.together.xyz/api/v1/embeddings"
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    session = requests.Session()
    response = session.post(
        url,
        headers=headers,
        json={
            "input": text,
            "model": model_api_string
        }
    )
    if response.status_code != 200:
        raise ValueError(f"Request failed with status code {response.status_code}: {response.text}")
    return response.json()['data'][0]['embedding']

print(generate_together_embeddings(
    'Hello world',
    'togethercomputer/m2-bert-80M-8k-retrieval',
    os.environ['TOGETHER_API_KEY'])[:10]
)
```
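The same cosine-similarity ranking applies when you retrieve with API embeddings. Here is a short, dependency-free sketch that reuses `generate_together_embeddings` from above; the `cosine` helper and example texts are illustrative and not part of the Together API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding lists returned by the API.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

model_name = 'togethercomputer/m2-bert-80M-8k-retrieval'
api_key = os.environ['TOGETHER_API_KEY']

query = generate_together_embeddings("How do I start my morning?", model_name, api_key)
doc = generate_together_embeddings("Every morning, I make a cup of coffee to start my day.", model_name, api_key)

# Higher scores indicate more relevant documents.
print(cosine(query, doc))
```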
📄 License
This model is licensed under the Apache-2.0 license.
📚 Documentation
Acknowledgments
Alycia Lee helped with AutoModel support.
Citation
If you use this model, or otherwise find our work valuable, you can cite us as follows:
```bibtex
@inproceedings{fu2023monarch,
  title={Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture},
  author={Fu, Daniel Y and Arora, Simran and Grogan, Jessica and Johnson, Isys and Eyuboglu, Sabri and Thomas, Armin W and Spector, Benjamin and Poli, Michael and Rudra, Atri and R{\'e}, Christopher},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}
```