Nomic Embed Multimodal 3B

Developed by nomic-ai
Nomic Embed Multimodal 3B is a multimodal embedding model built for visual document retrieval. It encodes text and images with a single unified encoder and scores 58.8 NDCG@5 on the Vidore-v2 benchmark.
Downloads 3,431
Release date: 3/27/2025

Model Overview

This is a 3-billion-parameter multimodal embedding model designed for visual document retrieval. It encodes interleaved text and images directly, without complex preprocessing.
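
Retrieval quality for a model like this is usually reported as NDCG@k, the metric behind the 58.8 NDCG@5 figure quoted above. The following is a minimal sketch of how NDCG@5 can be computed for a single query, assuming binary relevance labels and the standard log2 rank discount. The relevance list is made up for illustration, and the benchmark figure is presumably an average over many queries expressed on a 0-100 scale.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked results."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """NDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(relevances, k) / ideal_dcg

# Hypothetical labels: 1 = relevant page, 0 = irrelevant, in ranked order
ranked_relevance = [1, 0, 1, 0, 0, 1]
score = ndcg_at_k(ranked_relevance, k=5)
print(round(score, 3))
```

A ranking that places every relevant page first scores 1.0; relevant pages pushed down the list, or past position 5, lower the score.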

Model Features

Outstanding Performance
Scores 58.8 NDCG@5 on the Vidore-v2 benchmark, surpassing all dense multimodal embedding models of similar scale.
Unified Text-Image Encoding
Directly encodes interleaved text and images without complex preprocessing.
Advanced Training Methods
Trained with same-source sampling (training examples drawn from the same data source) and positive-aware hard negative mining.
Multilingual Support
Supports English, Italian, French, German, and Spanish.
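
This page does not describe how the hard negatives were mined. As an illustration of the general idea only, one common formulation of positive-aware mining uses the positive document's similarity score as an anchor: candidates that score almost as high as the positive are treated as likely false negatives and skipped, and the hardest remaining candidates are kept. The function name, the 0.95 ratio, and the scores below are all hypothetical, not Nomic's documented recipe.

```python
def mine_hard_negatives(positive_score, candidate_scores, max_ratio=0.95, top_k=2):
    """Select hard negatives for one query.

    Candidates scoring at or above max_ratio * positive_score are dropped as
    probable false negatives; the top_k highest-scoring survivors are returned.
    """
    threshold = max_ratio * positive_score
    kept = [(doc_id, s) for doc_id, s in candidate_scores.items() if s < threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # hardest first
    return [doc_id for doc_id, _ in kept[:top_k]]

# Hypothetical query-to-document similarity scores from an existing retriever
candidates = {"doc_a": 0.79, "doc_b": 0.70, "doc_c": 0.55, "doc_d": 0.20}
negatives = mine_hard_negatives(positive_score=0.80, candidate_scores=candidates)
print(negatives)
```

Here doc_a is excluded because its score is too close to the positive's, which suggests it may actually be relevant to the query.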

Model Capabilities

Visual Document Retrieval
Multimodal Embedding
Text-Image Joint Encoding
Multilingual Document Processing
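
The capabilities above come down to comparing embedding vectors. The sketch below ranks documents against a query by cosine similarity, assuming the model produces one vector per input and that cosine similarity is the scoring function; neither detail is stated on this page. The three-dimensional vectors and document names are toy placeholders, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Placeholder vectors standing in for embeddings of a text query,
# two page images, and one text-only document
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "invoice_page.png": [0.8, 0.2, 0.1],
    "org_chart.png": [0.1, 0.9, 0.2],
    "text_only_memo": [0.0, 0.2, 0.9],
}
ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because text and images share one embedding space, the same ranking function applies whether a document is a page image or plain text.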

Use Cases

Research Field
Academic Paper Retrieval
Captures formulas, charts, and data tables in academic papers.
Improves retrieval accuracy for academic content.
Enterprise Applications
Technical Document Management
Encodes code blocks, flowcharts, and screenshots in technical documents.
Enhances retrieval efficiency for technical documents.
Financial Report Analysis
Embeds trend charts, statistical graphs, and numerical data in financial reports.
Improves retrieval effectiveness for financial data.
E-Commerce
Product Catalog Retrieval
Processes product images, specifications, and price lists.
Optimizes product search experience.