
Dewey En Beta

Developed by infgrad
Dewey is a novel long-context embedding model based on the ModernBERT architecture, supporting a 128k context window and excelling in long-document retrieval tasks.
Downloads 447
Release Time: 3/23/2025

Model Overview

The Dewey model focuses on improving retrieval performance in long-document scenarios. It employs instruction-based training to align embeddings with tasks, supports both single-vector and multi-vector representations, and features a flexible text chunking mechanism.
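To make the difference between the two representation modes concrete, here is a minimal sketch of how scoring typically differs. It assumes cosine similarity for the single-vector case and ColBERT-style late interaction (MaxSim) for the multi-vector case. Neither is confirmed here as Dewey's own scoring rule, and the function names `cosine`, `single_vector_score`, and `maxsim_score` are hypothetical, not part of any Dewey API.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_vector_score(query: Vector, doc: Vector) -> float:
    """Single-vector mode: one embedding per text, compared directly."""
    return cosine(query, doc)


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Multi-vector mode (assumed ColBERT-style MaxSim).

    For each query vector, take its best match among the document vectors,
    then sum those best matches.
    """
    if not query_vecs or not doc_vecs:
        return 0.0
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)
```

With a single vector, one embedding has to summarize the whole document. With several vectors, a query can match whichever part of a long document is most relevant, which is the usual motivation for multi-vector retrieval.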

Model Features

Ultra-long context support
Supports processing of ultra-long contexts up to 128k tokens.
Multi-vector representation
Supports ColBERT-like multi-vector representation, but with far fewer vectors: about 0.5% of the token count, rather than one vector per token.
Efficient encoding
Inherits the efficiency of the ModernBERT architecture, so encoding remains efficient even for long texts.
Flexible chunking
Supports fully customizable text chunking strategies to adapt to different application scenarios.
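
As an illustration of what a custom chunking strategy could look like, the sketch below splits a token sequence into fixed-size chunks with optional overlap. It is an assumption-laden example, not Dewey's actual chunking interface: `chunk_tokens`, `chunk_size`, and `overlap` are hypothetical names. It also assumes one vector per chunk, which this page does not state; under that assumption, 200-token chunks without overlap give one vector per 200 tokens, matching the 0.5% figure above.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 200, overlap: int = 0) -> List[List[str]]:
    """Split tokens into chunks of at most chunk_size, sharing `overlap` tokens
    between neighbouring chunks. Hypothetical helper, for illustration only."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the sequence,
        # so no trailing chunk consists only of overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumed mapping: one vector per chunk.
tokens = [str(i) for i in range(1000)]
chunks = chunk_tokens(tokens, chunk_size=200)
vector_ratio = len(chunks) / len(tokens)  # 5 chunks for 1000 tokens
```

Smaller chunks or larger overlaps produce more vectors per document, trading index size for finer-grained matching.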

Model Capabilities

Long-document retrieval
Semantic similarity calculation
Text classification
Text clustering

Use Cases

Information retrieval
Long-document retrieval
Efficient retrieval in databases containing ultra-long documents.
Achieved a score of 0.86 on the LongEmbed benchmark, surpassing multiple commercial models.
Semantic analysis
Semantic similarity calculation
Calculates semantic similarity between texts.
Performed strongly on the short-text evaluation (MTEB-eng-v2), surpassing multiple 7B-scale models.