# hotchpotch/japanese-reranker-cross-encoder-small-v1
This is one of a series of reranker (CrossEncoder) models trained for Japanese, designed to improve the performance of text-ranking tasks.
| Property | Details |
| --- | --- |
| Model Type | Reranker (CrossEncoder) |
| Training Data | hotchpotch/JQaRA, shunk031/JGLUE, miracl/miracl, castorini/mr-tydi, unicamp-dl/mmarco |
For more information about Rerankers, technical reports, and evaluations, please refer to the following links:
## Quick Start
Here are examples of how to use the model with different libraries.
## Usage Examples
### Basic Usage with SentenceTransformers
```python
from sentence_transformers import CrossEncoder
import torch

MODEL_NAME = "hotchpotch/japanese-reranker-cross-encoder-small-v1"

# Use the GPU with fp16 weights when available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = CrossEncoder(MODEL_NAME, max_length=512, device=device)
if device == "cuda":
    model.model.half()

query = "æćçăȘæ ç»ă«ă€ăăŠ"
passages = [
    "æ·±ăăăŒăăæăĄăȘăăăăèŠłăäșșăźćżăæșăă¶ăćäœăç»ć Žäșșç©ăźćżææćăç§éžă§ăă©ăčăăŻæ¶ăȘăă§ăŻèŠăăăȘăă",
    "éèŠăȘăĄăă»ăŒăžæ§ăŻè©äŸĄă§ăăăăæă話ăç¶ăăźă§æ°ćăèœăĄèŸŒăă§ăăŸăŁăăăăć°ăæăăèŠçŽ ăăăă°ăăăŁăă",
    "ă©ăă«ăăȘăąăȘăăŁă«æŹ ăăć±éăæ°ă«ăȘăŁăăăăŁăšæ·±ăżăźăăäșșéăă©ăăèŠăăăŁăă",
    "ăąăŻă·ă§ăłă·ăŒăłă愜ăăăăăèŠăŠăăŠéŁœăăȘăăăčăăŒăȘăŒăŻă·ăłăă«ă ăăăăăéă«èŻăă",
]

# Score each (query, passage) pair; higher scores mean higher relevance
scores = model.predict([(query, passage) for passage in passages])
```
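`predict` returns one relevance score per pair, in the same order as the input passages. To actually rerank, pair each score with its passage and sort in descending order. A minimal sketch, with illustrative placeholder passages and scores standing in for the model's real output:

```python
# Illustrative stand-ins for the passages and the scores from model.predict
passages = ["passage A", "passage B", "passage C"]
scores = [0.12, 0.87, 0.45]

# Sort passages by score, most relevant first
ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
for score, passage in ranked:
    print(f"{score:.2f}\t{passage}")
```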
### Basic Usage with HuggingFace transformers
```python
import torch
from torch.nn import Sigmoid
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "hotchpotch/japanese-reranker-cross-encoder-small-v1"

# Use the GPU with fp16 weights when available
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.to(device)
model.eval()
if device == "cuda":
    model.half()

query = "æćçăȘæ ç»ă«ă€ăăŠ"
passages = [
    "æ·±ăăăŒăăæăĄăȘăăăăèŠłăäșșăźćżăæșăă¶ăćäœăç»ć Žäșșç©ăźćżææćăç§éžă§ăă©ăčăăŻæ¶ăȘăă§ăŻèŠăăăȘăă",
    "éèŠăȘăĄăă»ăŒăžæ§ăŻè©äŸĄă§ăăăăæă話ăç¶ăăźă§æ°ćăèœăĄèŸŒăă§ăăŸăŁăăăăć°ăæăăèŠçŽ ăăăă°ăăăŁăă",
    "ă©ăă«ăăȘăąăȘăăŁă«æŹ ăăć±éăæ°ă«ăȘăŁăăăăŁăšæ·±ăżăźăăäșșéăă©ăăèŠăăăŁăă",
    "ăąăŻă·ă§ăłă·ăŒăłă愜ăăăăăèŠăŠăăŠéŁœăăȘăăăčăăŒăȘăŒăŻă·ăłăă«ă ăăăăăéă«èŻăă",
]

# Tokenize each (query, passage) pair as a single cross-encoded input
inputs = tokenizer(
    [(query, passage) for passage in passages],
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run inference and map raw logits to relevance scores in (0, 1)
with torch.no_grad():
    logits = model(**inputs).logits
activation = Sigmoid()
scores = activation(logits).squeeze().tolist()
```
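Because the model has a single-label classification head, `logits` has shape `(num_passages, 1)`, and the sigmoid maps each raw logit to a score in (0, 1) so scores are comparable across pairs. A self-contained check of that mapping, using illustrative logit values:

```python
import torch

# Illustrative logits as the classification head might emit them
logits = torch.tensor([[2.0], [0.0], [-1.0]])

# Sigmoid squashes each logit into (0, 1); a logit of 0.0 maps to exactly 0.5,
# positive logits map above 0.5, negative logits below
scores = torch.sigmoid(logits).squeeze().tolist()
```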
## Evaluation Results
## License
This project is licensed under the MIT License.