Gte Multilingual Mlm Base

Developed by Alibaba-NLP
A multilingual text encoder from the mGTE series. It supports 75 languages and a maximum context length of 8192 tokens, is built on a BERT+RoPE+GLU architecture, and reports strong results on the GLUE and XTREME-R benchmarks.
Downloads 342
Release Time: 8/6/2024

Model Overview

A general-purpose multilingual text encoder focused on long-context text representation and re-ranking, and suited to multilingual retrieval tasks.

Model Features

Ultra-long context support
Supports sequences of up to 8192 tokens, making it suitable for processing long documents
Multilingual capability
Supports 75 languages, with excellent performance on the multilingual benchmark XTREME-R
Improved architecture design
Adopts a Transformer++ encoder architecture (BERT+RoPE+GLU), which adds Rotary Position Embedding (RoPE) and Gated Linear Units (GLU) to a BERT-style encoder
Multi-stage training strategy
Trains in stages, moving from short to long sequences, to support long-context modeling
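
The two architecture components named above, RoPE and GLU, can be illustrated with a minimal pure-Python sketch. This is not the model's actual implementation: the interleaved pair layout, the base of 10000, the sigmoid gate, and the helper names (rope, glu_ffn, matvec, dot) are all assumptions chosen for readability. The sketch only shows the general idea: RoPE rotates pairs of vector components by a position-dependent angle, so that the dot product of two rotated vectors depends on their relative offset, and a GLU feed-forward layer multiplies one linear projection by a gated second projection.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d).

    Assumption: interleaved pair layout and base 10000; real
    implementations may lay out the pairs differently.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def glu_ffn(x, w_gate, w_up, w_down):
    """Gated feed-forward layer: w_down @ (sigmoid(w_gate @ x) * (w_up @ x)).

    Assumption: a sigmoid gate as in the original GLU formulation;
    the model may use a different activation.
    """
    gate = [sigmoid(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Same relative offset (2) at different absolute positions.
    near = dot(rope(q, 3), rope(k, 1))
    far = dot(rope(q, 10), rope(k, 8))
    print(abs(near - far) < 1e-9)
```

Because the attention score depends only on the offset between positions, a rotary scheme of this kind is a common choice for encoders that must handle long sequences.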

Model Capabilities

Multilingual text encoding
Long-text representation
Text re-ranking
Cross-language retrieval
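
As an illustration of the retrieval and re-ranking capabilities listed above, the following sketch ranks documents by cosine similarity to a query vector. It is a sketch under stated assumptions, not the model's documented usage: the page does not say how embeddings are extracted or which similarity measure is used, so the vectors here are hand-made placeholders rather than model output, and the names cosine, rank_documents, and the document IDs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return num / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder 2-dimensional "embeddings"; a real encoder would
    # produce much higher-dimensional vectors.
    query = [1.0, 0.0]
    docs = {
        "doc_en": [0.9, 0.1],
        "doc_de": [0.0, 1.0],
        "doc_zh": [0.7, 0.7],
    }
    for doc_id, score in rank_documents(query, docs):
        print(doc_id, round(score, 3))
```

In a cross-language setting the query and the documents may be written in different languages; ranking works the same way as long as all of them are embedded into a shared vector space.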

Use Cases

Information retrieval
Cross-language document retrieval
Retrieving relevant documents in a multilingual environment
Scored 64.44 on the XTREME-R benchmark, outperforming XLM-R-base
Natural language understanding
Multilingual text classification
Classifying text across multiple languages
Scored 83.47 on the GLUE benchmark