# Multi-Dataset Training

TIPO is a text-to-image prompt optimization system based on text pre-sampling, which enhances the quality and usability of generative models by optimizing user input prompts through large language models.

Text-to-Image English

Vitpose Base Coco Aic Mpii

ViTPose is a human pose estimation model based on Vision Transformer, achieving outstanding performance on benchmarks like MS COCO through simple architectural design.

Pose Estimation

Transformers English

A Hindi text-to-speech model trained from scratch based on the F5 architecture, developed by the SPRING Lab at the Indian Institute of Technology Madras.

Speech Synthesis Other

TIPO is a 500-million-parameter model based on the LLaMA architecture, specifically designed for prompt optimization in text-to-image generation.

Text-to-Image English

distilvit is an image-to-text model that pairs a ViT image encoder with a distilled GPT-2 text decoder to generate textual descriptions of images.

Pix2text Table Rec

A table structure recognition model based on Microsoft's Table Transformer, used for table detection and recognition tasks in documents.

Text Recognition

Japanese Reranker Cross Encoder Large V1

A high-performance cross-encoder model optimized for Japanese text reranking tasks, featuring a 24-layer architecture with 1024 hidden units.

Text Embedding Japanese

Japanese Bge Reranker V2 M3 V1

This is a Japanese Reranker (Cross-Encoder) model for text ranking tasks, featuring 24 layers and a hidden layer size of 1024.

Text Embedding Japanese

Japanese Reranker Cross Encoder Small V1

This is a Reranker (Cross-Encoder) model trained on Japanese text for text ranking tasks.

Text Embedding Japanese

Japanese Reranker Cross Encoder Xsmall V1

This is a Reranker (Cross-Encoder) model trained on Japanese text for text ranking tasks.

Text Embedding Japanese

PairRM is an efficient pairwise reward model for comparing and ranking output candidates from large language models, supporting various applications such as RLHF and Best-N sampling.

Large Language Model

Transformers English
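As a rough illustration of how a pairwise reward model can drive Best-N sampling, the sketch below ranks candidates by summing pairwise preference scores. It does not use PairRM's actual API: `prefer` and `toy_prefer` are hypothetical stand-ins for a model that scores one candidate against another.

```python
from typing import Callable, List


def rank_by_pairwise_wins(
    candidates: List[str],
    prefer: Callable[[str, str], float],
) -> List[str]:
    """Rank candidates by their summed pairwise preference scores.

    `prefer(a, b)` is a hypothetical stand-in for a pairwise reward model:
    it returns a positive number when `a` is preferred over `b`.
    """
    scores = [0.0] * len(candidates)
    for i, a in enumerate(candidates):
        for j, b in enumerate(candidates):
            if i != j:
                scores[i] += prefer(a, b)
    # Highest total score first.
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]


def toy_prefer(a: str, b: str) -> float:
    # Toy comparator for illustration only: prefers the longer answer.
    return float(len(a) - len(b))


ranked = rank_by_pairwise_wins(["ok", "a fuller answer", "short"], toy_prefer)
best_of_n = ranked[0]  # Best-N sampling keeps only the top-ranked candidate
```

Comparing every pair costs a number of model calls that grows quadratically with the number of candidates, which is usually acceptable for the small N typical of Best-N sampling.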

Ag Nli DeTS Sentence Similarity V1

This model is trained using the Cross-Encoder class from SentenceTransformers to predict the semantic similarity score between two sentences.

Transformers Supports Multiple Languages

All MiniLM L6 V2 Ct2 Int8

This is a sentence embedding model based on the MiniLM architecture, capable of mapping text to a 384-dimensional vector space, suitable for semantic search and text similarity tasks.

Text Embedding English
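To make the semantic search use case concrete, here is a minimal sketch of ranking documents by cosine similarity between embedding vectors. The vectors are hand-written 3-dimensional toys rather than real model output (the model itself would produce 384-dimensional vectors), and the `cosine` and `search` helpers are hypothetical names introduced for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def search(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    top_k: int = 2,
) -> List[int]:
    """Return the indices of the top_k documents most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)),
        reverse=True,
    )
    return [i for _, i in scored[:top_k]]


# Toy embeddings standing in for model output.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]
top_hits = search(query, docs)
```

In practice the query and document vectors would come from the same embedding model, so that distances between them are comparable.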

Binarization Segformer B3

A document image binarization model fine-tuned from SegFormer-B3 that performs well on DIBCO evaluation metrics.

Image Segmentation

Reward Model Deberta V3 Large V2

This reward model is trained to predict which generated answer humans would prefer for a given question. It is suitable for QA evaluation, RLHF reward scoring, and toxic answer detection.

Large Language Model

Transformers English

Sbert All MiniLM L6 With Pooler

An ONNX model based on sentence-transformers that maps text to a 384-dimensional vector space, suitable for semantic search and clustering tasks.

Text Embedding English

Bert Base Cased Qa Evaluator

A QA pair evaluation model based on BERT-base-cased that determines the semantic relevance between questions and answers.

Question Answering System


© 2025 AIbase