NaSE

Developed by aiana94
NaSE is a multilingual sentence encoder specialized for the news domain. It is based on LaBSE, adapted with domain-specific training, and supports sentence embedding and similarity calculation for 100+ languages.
Downloads 14
Release Time: 6/17/2024

Model Overview

This model is a domain-adapted multilingual sentence encoder. It is optimized for news text through denoising autoencoding and machine translation objectives, and is suited to tasks such as sentence similarity and information retrieval.

Model Features

News Domain Adaptation
Domain-specialized training on the PolyNews and PolyNewsParallel datasets optimizes semantic representations for news text.
Multilingual Support
Supports sentence embeddings for 100+ languages, including low-resource languages. Training uses a sampling strategy that smooths the language distribution.
Dual Training Objectives
Combines denoising autoencoding (DAE) and machine translation (MT) objectives to better capture cross-lingual semantics.
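
The listing does not say how the language distribution is smoothed. A common approach in multilingual training is exponent (temperature) smoothing, where a language with n_i sentences is sampled with probability proportional to n_i ** alpha for some alpha below 1. The sketch below illustrates that general idea only; the function name, the alpha value, and the corpus sizes are all assumptions made for illustration and are not taken from the NaSE training setup.

```python
def smoothed_sampling_probs(sentence_counts, alpha=0.3):
    """Return per-language sampling probabilities proportional to count ** alpha.

    alpha = 1.0 reproduces the raw corpus proportions; smaller values
    shift probability mass toward low-resource languages.
    The default alpha is an illustrative assumption, not a NaSE setting.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    weights = {lang: count ** alpha for lang, count in sentence_counts.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical corpus sizes (sentences per language), for illustration only.
counts = {"en": 1_000_000, "de": 200_000, "sw": 5_000}
probs = smoothed_sampling_probs(counts, alpha=0.3)

# Raw share of the low-resource language, for comparison with probs["sw"].
raw_share_sw = counts["sw"] / sum(counts.values())
```

With smoothing, the low-resource language in this toy corpus is sampled far more often than its raw share would allow, while the dominant language is sampled less often.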

Model Capabilities

Multilingual sentence embedding
Cross-lingual sentence similarity calculation
News text semantic retrieval
Multilingual text clustering
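
Sentence similarity between embeddings is typically measured with cosine similarity. The following is a minimal sketch of that calculation on plain Python lists. The three-dimensional vectors are toy stand-ins, not real model outputs; actual embeddings would come from running the encoder, which is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors (assumed values): two sentences with the same meaning in
# different languages should land close together, an unrelated one far away.
emb_en = [0.9, 0.1, 0.2]
emb_de = [0.8, 0.2, 0.1]
emb_unrelated = [0.0, 1.0, 0.0]

sim_translation = cosine_similarity(emb_en, emb_de)
sim_unrelated = cosine_similarity(emb_en, emb_unrelated)
```

Scores close to 1 indicate near-identical direction in embedding space, which is the property the cross-lingual use cases rely on.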

Use Cases

Information Retrieval
Cross-Lingual News Recommendation
Uses sentence embeddings to calculate semantic similarity between news articles in different languages for cross-lingual content recommendation.
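
A recommendation step of this kind can be sketched as ranking candidate articles by the similarity of their embeddings to the embedding of the article a reader is viewing. The function names, article identifiers, and vectors below are hypothetical and chosen for illustration; in practice the vectors would be produced by the encoder.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def recommend(query_vec, candidates, top_k=2):
    """Return the top_k (article_id, score) pairs, most similar first.

    candidates is a list of (article_id, embedding) pairs.
    """
    query = normalize(query_vec)
    scored = []
    for article_id, vec in candidates:
        score = sum(a * b for a, b in zip(query, normalize(vec)))
        scored.append((article_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values) for an English query article and
# candidate articles in other languages.
query_embedding = [1.0, 0.0, 0.2]
candidate_articles = [
    ("de-election", [0.9, 0.1, 0.3]),
    ("fr-football", [0.0, 1.0, 0.1]),
    ("es-election", [0.8, 0.0, 0.1]),
]

recommendations = recommend(query_embedding, candidate_articles, top_k=2)
```

For a large article pool, the exhaustive loop would usually be replaced by an approximate nearest-neighbor index, but the ranking principle stays the same.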
Text Analysis
Multilingual News Clustering
Performs semantic clustering of global news to identify reports of similar events across languages.
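
The listing does not name a clustering algorithm. As a rough illustration, the sketch below uses a simple greedy, threshold-based grouping: each article joins the first cluster whose seed article is similar enough, and otherwise starts a new cluster. This is a stand-in technique chosen for brevity, not the method used by the model's authors; the threshold and the toy vectors are assumptions.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_by_threshold(embeddings, threshold=0.8):
    """Greedy single-pass clustering over a list of embedding vectors.

    Returns a list of clusters, each a list of indices into embeddings.
    The first index in each cluster is its seed.
    """
    clusters = []
    for index, vec in enumerate(embeddings):
        for cluster in clusters:
            seed_vec = embeddings[cluster[0]]
            if _cosine(vec, seed_vec) >= threshold:
                cluster.append(index)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([index])
    return clusters


# Toy embeddings (assumed values): items 0 and 1 report one event,
# items 2 and 3 report another.
article_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.95],
]

clusters = cluster_by_threshold(article_embeddings, threshold=0.8)
```

The result of a greedy pass depends on input order; order-independent methods such as k-means or agglomerative clustering are common alternatives when that matters.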