# Transformer architecture

**Sundial Base 128m** (thuml) · License: Apache-2.0 · Downloads: 214 · Likes: 5
Sundial is a series of generative time series foundation models capable of zero-shot inference for both deterministic and probabilistic forecasting.
Tags: Climate Model, Safetensors

**Ast Finetuned Audioset 10 10 0.4593 ONNX** (onnx-community) · Downloads: 684 · Likes: 1
This is the ONNX version of the AST (Audio Spectrogram Transformer) model, designed specifically for audio classification tasks and fine-tuned on the AudioSet dataset.
Tags: Audio Classification, Transformers

**Falcon E 3B Instruct** (tiiuae) · License: Other · Downloads: 225 · Likes: 22
Falcon-E-3B-Instruct is an efficient language model based on a 1.58-bit architecture, optimized for edge devices, with strong inference capability and low memory usage.
Tags: Large Language Model, Transformers

**Orpheus TTS MediaSpeech** (kadirnar) · Downloads: 21 · Likes: 2
An Arabic model trained on the MediaSpeech dataset; its specific uses and capabilities are not yet documented in detail.
Tags: Large Language Model, Transformers, Arabic

**Unt 8b** (omar07ibrahim) · License: Apache-2.0 · Downloads: 33 · Likes: 2
The Camel Model is a text generation model based on the Transformer architecture, supporting Azerbaijani and trained using reinforcement learning.
Tags: Large Language Model, Transformers, Other

**Bidi Eng Pol** (allegro) · Downloads: 185 · Likes: 1
A Transformer-based machine translation model for bidirectional translation between English and Polish.
Tags: Machine Translation, Transformers, Supports Multiple Languages

**Vit Large Patch14 Dinov2.lvd142m** (pcuenq) · License: Apache-2.0 · Downloads: 18 · Likes: 0
A Vision Transformer (ViT) image feature model, pre-trained on the LVD-142M dataset using the self-supervised DINOv2 method.
Tags: Image Classification, Transformers

**Vit Liveness Detection V1.0** (nguyenkhoa) · License: Apache-2.0 · Downloads: 176 · Likes: 1
A face liveness detection model built with the Transformers library that achieves strong performance on its evaluation set.
Tags: Face-related, Transformers

**MOMENT 1 Base** (AutonLab) · License: MIT · Downloads: 4,975 · Likes: 3
MOMENT is a family of general-purpose foundation models for time series analysis, supporting tasks such as forecasting, classification, and anomaly detection, usable out of the box or after fine-tuning.
Tags: Materials Science, Transformers

**Speecht5 Finetuned Emirhan Tr** (emirhanbilgic) · License: MIT · Downloads: 22 · Likes: 1
A Turkish text-to-speech model fine-tuned from Microsoft SpeechT5, capable of generating high-quality Turkish speech.
Tags: Speech Synthesis, TensorBoard, Other

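A minimal sketch of how such a SpeechT5 fine-tune is typically driven with the transformers API. The repo id below is assumed from this listing, and the CMU ARCTIC x-vectors are used as a stand-in speaker embedding:

```python
# Sketch: Turkish TTS with a SpeechT5 fine-tune (repo id assumed from the listing).
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

model_id = "emirhanbilgic/speecht5_finetuned_emirhan_tr"  # assumed repo id
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# SpeechT5 needs a speaker embedding; the CMU ARCTIC x-vectors are a common stand-in.
xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

inputs = processor(text="Merhaba, bugün hava çok güzel.", return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("speech_tr.wav", speech.numpy(), samplerate=16000)
```
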
**Swahili English Translation** (Bildad) · License: MIT · Downloads: 98 · Likes: 2
A Transformer model developed specifically for bidirectional translation between Swahili and English, fine-tuned on 210,000 sentence pairs.
Tags: Machine Translation, Transformers

**Birna Bert** (buetnlpbio) · Downloads: 364 · Likes: 1
A Transformer encoder model based on the BERT architecture, designed specifically for generating RNA sequence embeddings.
Tags: Text Embedding, Transformers

**Dictalm2 It Qa Fine Tune** (618AI) · License: Apache-2.0 · Downloads: 2,900 · Likes: 6
A fine-tuned version of Dicta-IL's dictalm2.0-instruct model, designed for generating Hebrew question-answer pairs.
Tags: Question Answering System, Transformers, Other

**Real3d** (hwjiang) · License: MIT · Downloads: 22 · Likes: 19
Real3D is a 2D-to-3D Transformer model based on the TripoSR architecture, extended to real-world images through unsupervised self-training and automatic data filtering.
Tags: 3D Vision

**Codontransformer** (adibvafa) · License: Apache-2.0 · Downloads: 1,327 · Likes: 7
A codon optimization tool that converts protein sequences into DNA sequences optimized for a target organism.
Tags: Protein Model, Transformers

**Medsam Breast Cancer** (MichaelSoloveitchik) · Downloads: 61 · Likes: 0
An image segmentation model based on the Transformers library, used for segmentation tasks in vision applications.
Tags: Image Segmentation, Transformers, Other

**Segformer B3 Fashion** (sayeed99) · License: Other · Downloads: 75.65k · Likes: 21
A fashion-item image segmentation model based on the SegFormer architecture, designed to identify and segment clothing and accessories.
Tags: Image Segmentation, Transformers

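A minimal sketch of running such a SegFormer checkpoint for semantic segmentation with the standard transformers interface; the repo id and image file name are assumptions:

```python
# Sketch: clothing segmentation with a SegFormer checkpoint (repo id assumed from the listing).
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

model_id = "sayeed99/segformer-b3-fashion"  # assumed repo id
processor = SegformerImageProcessor.from_pretrained(model_id)
model = SegformerForSemanticSegmentation.from_pretrained(model_id)

image = Image.open("outfit.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, H/4, W/4)

# Upsample to the original resolution and take the per-pixel argmax.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
segmentation = upsampled.argmax(dim=1)[0]
print(segmentation.shape, segmentation.unique())
```
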
**Pllava 7b** (ermu2001) · License: Apache-2.0 · Downloads: 109 · Likes: 13
PLLaVA is an open-source video-language chatbot obtained by fine-tuning a large image-language model on video instruction-following data; it is intended for research on multimodal large models and chatbots.
Tags: Text-to-Video, Transformers

**Trocr Base Spanish** (qantev) · License: MIT · Downloads: 170 · Likes: 5
Base version of the TrOCR model for Spanish printed text, based on the Transformer architecture and fine-tuned on a custom dataset.
Tags: Text Recognition, Transformers, Supports Multiple Languages

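A minimal sketch of the usual TrOCR inference loop (vision encoder plus text decoder); the repo id and input image are assumptions:

```python
# Sketch: printed-text OCR with a TrOCR checkpoint (repo id assumed from the listing).
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

model_id = "qantev/trocr-base-spanish"  # assumed repo id
processor = TrOCRProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)

image = Image.open("line_of_text.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
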
**Granite Timeseries Patchtst** (ibm-granite) · License: Apache-2.0 · Downloads: 1,505 · Likes: 11
PatchTST is a Transformer-based model designed for long-term time series forecasting, using subsequence patching and channel independence to improve prediction accuracy.
Tags: Climate Model, Transformers

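A rough sketch of loading a PatchTST forecaster through the transformers PatchTST classes and running a single forward pass on dummy data; the repo id is assumed from this listing, and the model card should be checked for the actual context length, prediction length, and channel count:

```python
# Sketch: one forward pass through a PatchTST forecaster (repo id assumed from the listing).
import torch
from transformers import PatchTSTForPrediction

model_id = "ibm-granite/granite-timeseries-patchtst"  # assumed repo id
model = PatchTSTForPrediction.from_pretrained(model_id)

context_length = model.config.context_length
num_channels = model.config.num_input_channels

# Dummy batch of past observations: (batch_size, context_length, num_channels).
past_values = torch.randn(1, context_length, num_channels)
with torch.no_grad():
    outputs = model(past_values=past_values)
print(type(outputs))  # the output object carries the model's forecasts
```
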
**Dpt Beit Large 512** (Intel) · License: MIT · Downloads: 2,794 · Likes: 8
A monocular depth estimation model based on the BEiT Transformer, capable of inferring fine-grained depth from a single image.
Tags: 3D Vision, Transformers

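A minimal sketch of monocular depth estimation via the high-level transformers pipeline; the input image name is an assumption:

```python
# Sketch: single-image depth estimation with the depth-estimation pipeline.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-beit-large-512")
result = depth_estimator(Image.open("room.jpg").convert("RGB"))

# result["depth"] is a PIL image of the predicted depth map;
# result["predicted_depth"] holds the raw tensor.
result["depth"].save("room_depth.png")
```
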
**Llm Jp 13b Instruct Full Jaster Dolly Oasst V1.0** (llm-jp) · License: Apache-2.0 · Downloads: 750 · Likes: 8
A large-scale language model developed by the Japanese LLM-jp project, supporting text generation in Japanese and English.
Tags: Large Language Model, Transformers, Supports Multiple Languages

**Gpt2 Demo** (demo-leaderboard) · License: Other · Downloads: 19.21k · Likes: 1
GPT-2 is a self-supervised pre-trained language model based on the Transformer architecture that excels at text generation tasks.
Tags: Large Language Model, Transformers

**Bge Base En V1.5 Ct2** (winstxnhdw) · License: MIT · Downloads: 30 · Likes: 0
BGE Base English v1.5 is a Transformer-based sentence embedding model, designed for extracting sentence features and computing sentence similarity.
Tags: Text Embedding, Transformers, English

**Discogs Maest 10s Pw 129e** (mtg-upf) · Downloads: 33 · Likes: 0
MAEST is a family of Transformer models based on PaSST, focused on music analysis and particularly strong at music genre classification.
Tags: Audio Classification, Transformers

**Dogs Breed Classification Using Vision Transformers** (AmitMidday) · License: Openrail · Downloads: 27 · Likes: 1
An image classification model for dog breed recognition, supporting English and released under an open license.
Tags: Image Classification, Transformers, English

**Hubert Base Audioset** (ALM) · Downloads: 345 · Likes: 2
An audio representation model based on the HuBERT architecture, pre-trained on the complete AudioSet dataset and suitable for general audio tasks.
Tags: Audio Classification, Transformers

**Dinov2 Large** (facebook) · License: Apache-2.0 · Downloads: 558.78k · Likes: 79
A Vision Transformer trained with the DINOv2 method, extracting robust visual features from massive image data through self-supervised learning.
Tags: Image Classification, Transformers

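A minimal sketch of extracting an image embedding with this checkpoint through the standard transformers auto classes; the input image name is an assumption:

```python
# Sketch: global image features with DINOv2 (facebook/dinov2-large as named in the listing).
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-large")
model = AutoModel.from_pretrained("facebook/dinov2-large")

image = Image.open("cat.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The pooled CLS embedding serves as a compact global descriptor of the image.
embedding = outputs.pooler_output  # shape (1, 1024) for the large model
print(embedding.shape)
```
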
**Segformer B0 Finetuned Segments Sidewalk 2** (thesisabc) · Downloads: 16 · Likes: 0
A SegFormer semantic segmentation model fine-tuned on the Segments.ai sidewalk-semantic dataset, suitable for sidewalk scene analysis.
Tags: Image Segmentation, Transformers

**Trocr Base Printed Fr** (agomberto) · License: MIT · Downloads: 110 · Likes: 2
A Transformer-based OCR model for French printed text, filling the lack of a French variant among TrOCR models.
Tags: Image-to-Text, Transformers, French

**Japanese Hubert Base** (rinna) · License: Apache-2.0 · Downloads: 4,550 · Likes: 68
A Japanese HuBERT base model trained by rinna Co., Ltd. on approximately 19,000 hours of the Japanese speech corpus ReazonSpeech v1.
Tags: Speech Recognition, Transformers, Japanese

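A minimal sketch of extracting frame-level speech representations with this HuBERT checkpoint; the audio file name is an assumption, and the input should be 16 kHz mono:

```python
# Sketch: speech feature extraction with rinna/japanese-hubert-base (as named in the listing).
import torch
import soundfile as sf
from transformers import AutoFeatureExtractor, HubertModel

model_id = "rinna/japanese-hubert-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id)

waveform, sample_rate = sf.read("speech_ja.wav")  # expects 16 kHz mono audio
inputs = feature_extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (1, frames, 768) for the base model
print(hidden_states.shape)
```
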
**Trocr Processor** (anaghasavit) · Downloads: 18 · Likes: 3
TrOCR is a Transformer-based optical character recognition model designed for handwritten text recognition, fine-tuned on the IAM handwriting database.
Tags: Image-to-Text, Transformers

**Plant Disease Classification2** (ayerr) · Downloads: 40 · Likes: 1
An image classification model based on the Transformers library for identifying and classifying plant diseases.
Tags: Image Classification, Transformers

**Trocr Base Ckb** (razhan) · Downloads: 19 · Likes: 0
An OCR model based on the Transformer architecture, designed for recognizing Central Kurdish text and trained on synthetic data.
Tags: Text Recognition, Transformers

**Pythia 160m** (EleutherAI) · License: Apache-2.0 · Downloads: 163.75k · Likes: 31
Pythia-160M is a language model developed by EleutherAI for interpretability research. It is the 160M-parameter member of the Pythia suite, based on the Transformer architecture and trained on the Pile dataset.
Tags: Large Language Model, Transformers, English

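A minimal sketch of greedy text generation with this checkpoint via the transformers auto classes; the prompt is an arbitrary example:

```python
# Sketch: text generation with EleutherAI/pythia-160m (as named in the listing).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a dataset of", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
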
**BLEURT 20 D12** (lucadiliello) · Downloads: 2.6M · Likes: 1
A PyTorch implementation of the BLEURT model, used for text evaluation tasks in natural language processing.
Tags: Large Language Model, Transformers

**Segformer Finetuned Segments Cmp Facade** (Xpitfire) · License: MIT · Downloads: 379 · Likes: 1
A building facade semantic segmentation model based on the SegFormer architecture, capable of recognizing 12 types of architectural elements.
Tags: Image Segmentation, Transformers, English

**Oneformer Ade20k Swin Tiny** (shi-labs) · License: MIT · Downloads: 12.96k · Likes: 16
The first multi-task universal image segmentation framework, supporting semantic, instance, and panoptic segmentation with a single model.
Tags: Image Segmentation, Transformers

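A minimal sketch of semantic segmentation with this OneFormer checkpoint; the same model also accepts "instance" and "panoptic" task tokens, and the input image name is an assumption:

```python
# Sketch: semantic segmentation with shi-labs/oneformer_ade20k_swin_tiny (as named in the listing).
import torch
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

model_id = "shi-labs/oneformer_ade20k_swin_tiny"
processor = OneFormerProcessor.from_pretrained(model_id)
model = OneFormerForUniversalSegmentation.from_pretrained(model_id)

image = Image.open("street.jpg").convert("RGB")
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process into a (height, width) map of ADE20K class ids.
semantic_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(semantic_map.shape)
```
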
**Scinertopic** (RJuro) · License: MIT · Downloads: 71 · Likes: 7
A scientific term recognition model based on SciBERT, supporting NER-enhanced topic modeling.
Tags: Sequence Labeling, Transformers

**Gpt2 Small** (ComCom) · License: MIT · Downloads: 1,032 · Likes: 3
GPT-2 is an autoregressive language model based on the Transformer architecture, pre-trained on a large-scale English corpus through self-supervised learning; it excels at text generation tasks.
Tags: Large Language Model, Transformers, English