# Synthetic Data Training

## SmolVLM 500M Anime Caption v0.2
Apache-2.0 · Image-to-Text · English · by Andres77872 · 17 downloads · 0 likes

A vision-language model specialized in describing anime-style images, fine-tuned from SmolVLM-500M-Base.

## Cockatiel 8B
Video-to-Text · Transformers · by Fr0zencr4nE · 19 downloads · 0 likes

A video caption generation model based on VILA-v1.5-8B that produces detailed, human-preference-aligned captions for input videos.

## PoseLess 3B
Apache-2.0 · Multimodal Fusion · Transformers · by homebrewltd · 98 downloads · 7 likes

PoseLess is a robotic hand control framework that maps 2D images directly to joint angles via projected representations, eliminating the need for explicit pose estimation.

## PoseLess 3B
Apache-2.0 · Pose Estimation · Transformers · by Menlo · 65 downloads · 10 likes

Poseless-3B is a vision-language model (VLM)-based robotic hand control framework that maps 2D images directly to joint angles without explicit pose estimation.

## GLiNER BioMed Bi Large v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 56 downloads · 1 like

GLiNER-BioMed is an efficient open NER model suite built on the GLiNER framework and specialized for the biomedical domain, recognizing a wide range of biomedical entity types.

## GLiNER BioMed Bi Base v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 25 downloads · 1 like

The base-sized bi-encoder variant of the GLiNER-BioMed suite, likewise specialized for biomedical named entity recognition across multiple entity types.

## GLiNER BioMed Large v1.0
Apache-2.0 · Sequence Labeling · English · by Ihor · 163 downloads · 6 likes

The large variant of the GLiNER-BioMed suite, achieving state-of-the-art zero-shot and few-shot performance on biomedical entity recognition tasks.

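All three GLiNER-BioMed checkpoints expose the standard GLiNER interface. A minimal sketch, assuming the `gliner` Python package is installed and inferring the large checkpoint's repository ID from the listing:

```python
# Zero-shot biomedical NER with GLiNER-BioMed (sketch).
# Repository ID inferred from the listing; adjust if it differs.
from gliner import GLiNER

model = GLiNER.from_pretrained("Ihor/gliner-biomed-large-v1.0")

text = "Mutations in BRCA1 are associated with an elevated risk of breast cancer."
labels = ["gene", "disease", "chemical"]  # entity types are free-form strings

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["text"]} -> {ent["label"]} ({ent["score"]:.2f})')
```

Because the label set is passed at inference time, the same checkpoint can be pointed at new entity types without retraining.
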
## Asagi 8B
Apache-2.0 · Image-to-Text · Transformers · Japanese · by MIL-UT · 58 downloads · 4 likes

Asagi-8B is a large-scale Japanese vision-language model (VLM) trained on extensive Japanese datasets that integrate diverse data sources.

## Slam Scaled
MIT · Audio Generation · Transformers · by slprl · 792 downloads · 6 likes

A high-quality speech language model trained on a single GPU within 24 hours, fine-tuned from Qwen2.5-0.5B and using HuBERT tokens as its vocabulary.

## ModernBERT Large Bias Type Classifier
MIT · Text Classification · Transformers · English · by cirimus · 424 downloads · 2 likes

A text classification model fine-tuned from ModernBERT-large to detect and classify various types of bias in text.

## Asagi 14B
Apache-2.0 · Image-to-Text · Transformers · Japanese · by MIL-UT · 83 downloads · 9 likes

Asagi-14B is a large-scale Japanese vision-language model (VLM) trained on a wide range of Japanese datasets that integrate diverse data sources.

## FLUX.1 Dev ControlNet Upscaler
Other · Image Enhancement · by R1000 · 106 downloads · 3 likes

A ControlNet model developed by the Jasper research team for upscaling low-resolution images.

## Multilingual Sentiment Analysis
Text Classification · Transformers · Multilingual · by tabularisai · 162.07k downloads · 145 likes

A multilingual sentiment analysis model fine-tuned from DistilBERT, supporting 21 languages and suited to scenarios such as social media and customer feedback analysis.

## Euclid ConvNeXt XXLarge 120524
Apache-2.0 · Text-to-Image · Transformers · English · by euclid-multimodal · 22 downloads · 4 likes

A multimodal large language model trained specifically to strengthen low-level geometric perception, improving geometric analysis through high-fidelity synthetic visual descriptions.

## mStyleDistance
MIT · Text Embedding · by StyleDistance · 207 downloads · 2 likes

mStyleDistance is a multilingual style embedding model that embeds texts with similar writing styles close together and texts with different styles far apart, independent of content and language.

## Pegasus-X Base Synthsumm Open 16k
Apache-2.0 · Text Generation · Transformers · English · by BEE-spoke-data · 115 downloads · 2 likes

A text summarization model fine-tuned from pegasus-x-base on synthetic data, excelling at long-document summarization.

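As a pegasus-x-base fine-tune it should load through the standard transformers summarization pipeline; a minimal sketch, with the repository ID guessed from the listing:

```python
# Long-document summarization with a PEGASUS-X fine-tune (sketch).
# Repository ID guessed from the listing; substitute the actual ID if it differs.
from transformers import pipeline

summarizer = pipeline("summarization", model="BEE-spoke-data/pegasus-x-base-synthsumm_open-16k")

with open("report.txt") as f:      # any long document
    long_text = f.read()

result = summarizer(long_text, max_length=128, min_length=32, truncation=True)
print(result[0]["summary_text"])
```
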
## FLUX.1 Dev ControlNet Upscaler
Other · Image Enhancement · by jasperai · 11.16k downloads · 710 likes

A ControlNet model developed by the Jasper research team for upscaling low-resolution images.

## Reflection Llama 3.1 70B
Large Language Model · Transformers · by mattshumer · 199 downloads · 1,712 likes

Reflection Llama-3.1 70B is an open-source large language model trained with "reflection tuning", a technique intended to let the model detect errors in its own reasoning and correct course.

## Depth Anything V2 Metric Indoor Large HF
3D Vision · Transformers · by depth-anything · 47.99k downloads · 9 likes

A Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset and compatible with the transformers library.

## Depth Anything V2 Metric Indoor Base HF
3D Vision · Transformers · by depth-anything · 9,056 downloads · 1 like

The base-sized Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset.

## Depth Anything V2 Metric Indoor Small HF
3D Vision · Transformers · by depth-anything · 750 downloads · 2 likes

The small Depth Anything V2 fine-tune for indoor metric depth estimation, trained on the synthetic Hypersim dataset and compatible with the transformers library.

## Depth Anything V2 Metric Outdoor Small HF
3D Vision · Transformers · by depth-anything · 459 downloads · 1 like

The small Depth Anything V2 fine-tune for outdoor metric depth estimation, trained on the synthetic Virtual KITTI dataset.

## Depth Anything V2 Metric Outdoor Base HF
3D Vision · Transformers · by depth-anything · 436 downloads · 0 likes

The base-sized Depth Anything V2 fine-tune for outdoor metric depth estimation, trained on the synthetic Virtual KITTI dataset and compatible with the transformers library.

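All five metric checkpoints above share the same transformers interface, so one depth-estimation pipeline call covers them; a minimal sketch, with the indoor-large repository ID inferred from the listing:

```python
# Indoor metric depth estimation with Depth Anything V2 (sketch).
# Checkpoint ID inferred from the listing; swap in another variant as needed.
from transformers import pipeline
from PIL import Image

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf")

image = Image.open("living_room.jpg")          # any indoor photo
result = pipe(image)

result["depth"].save("living_room_depth.png")  # depth map rendered as an image
# result["predicted_depth"] holds the raw per-pixel depth tensor.
```
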
## Robust Sentiment Analysis
Apache-2.0 · Text Classification · Transformers · English · by tabularisai · 2,632 downloads · 14 likes

A sentiment analysis model fine-tuned from distilbert/distilbert-base-uncased on purely synthetic data, classifying text into five sentiment classes.

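A minimal usage sketch with the transformers text-classification pipeline, repository ID inferred from the listing:

```python
# Five-class sentiment prediction (sketch).
# Repository ID inferred from the listing; the label set is the model's own.
from transformers import pipeline

classifier = pipeline("text-classification", model="tabularisai/robust-sentiment-analysis")

print(classifier("The battery life is fantastic, but the screen scratches easily."))
# -> a list like [{'label': ..., 'score': ...}] with one of the five sentiment classes
```
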
## StyleDistance
MIT · Text Embedding · English · by StyleDistance · 492 downloads · 4 likes

StyleDistance is a style embedding model that embeds texts with similar writing styles close together and texts with different styles far apart, independent of content.

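Both StyleDistance models are embedding models; assuming they are packaged in sentence-transformers format (an assumption, not confirmed by the listing), comparing styles would look like:

```python
# Comparing writing styles with StyleDistance (sketch).
# Assumes sentence-transformers packaging; repository ID inferred from the listing.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("StyleDistance/styledistance")

texts = [
    "u wont BELIEVE what happened lol",    # informal style
    "omg the game last nite was crazy!!",  # informal style, different topic
    "The committee convened at noon.",     # formal style
]
embeddings = model.encode(texts)
print(cos_sim(embeddings, embeddings))  # style-similar pairs should score higher
```
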
## Gemma 2 9B IT SPPO Iter3
Large Language Model · Transformers · English · by UCLA-AGI · 6,704 downloads · 125 likes

An 8.9-billion-parameter language model from the third iteration of Self-Play Preference Optimization (SPPO), starting from google/gemma-2-9b-it and fine-tuned on the UltraFeedback dataset.

## Qwen2 1.5B Summarize
Apache-2.0 · Text Generation · Transformers · English · by thepowerfuldeez · 228 downloads · 1 like

A specialized summarization model fine-tuned for two rounds from Qwen2-1.5B-Instruct.

## TrOCR Base Ru
Apache-2.0 · Text Recognition · Transformers · Multilingual · by sherstpasha99 · 30 downloads · 0 likes

TrOCR-Ru is an optical character recognition model fine-tuned from microsoft/trocr-base-handwritten on synthetic Russian and English datasets, focused on image-to-text tasks.

## Merlinite 7B Lab
Apache-2.0 · Large Language Model · Transformers · by instructlab · 285 downloads · 22 likes

Merlinite 7B is a language model built on Mistral-7B-v0.1 and trained with the LAB alignment method developed by IBM Research; it performs strongly across multiple benchmarks.

## RoBERTa Base Zeroshot v2.0 C
MIT · Text Classification · Transformers · English · by MoritzLaurer · 3,188 downloads · 4 likes

A zero-shot text classification model based on the RoBERTa architecture that requires no task-specific training data, runs on either GPU or CPU, and was trained entirely on commercially friendly data.

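A minimal sketch with the transformers zero-shot-classification pipeline, repository ID inferred from the listing; candidate labels are supplied at inference time rather than learned from training data:

```python
# Zero-shot text classification (sketch); no task-specific training data required.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="MoritzLaurer/roberta-base-zeroshot-v2.0-c")

result = classifier(
    "The central bank raised interest rates by 50 basis points.",
    candidate_labels=["finance", "sports", "technology"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```
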
## Zephyr 7B Gemma v0.1
Other · Large Language Model · Transformers · by HuggingFaceH4 · 502 downloads · 124 likes

Zephyr 7B Gemma is a language model fine-tuned from google/gemma-7b with Direct Preference Optimization (DPO) on publicly available synthetic datasets, intended to serve as a helpful assistant.

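Like most chat fine-tunes, it can be driven through the text-generation pipeline with a chat-style message list (supported in recent transformers releases); a minimal sketch:

```python
# Chat-style generation with Zephyr 7B Gemma (sketch).
# Requires enough GPU memory for a 7B model; device_map="auto" needs accelerate.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-gemma-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
out = pipe(messages, max_new_tokens=80)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```
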
## TrOCR Base Ru
Apache-2.0 · Text Recognition · Transformers · Multilingual · by raxtemur · 977 downloads · 26 likes

A Russian and English OCR model fine-tuned from microsoft/trocr-base-handwritten, specializing in recognizing handwritten and printed text.

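TrOCR fine-tunes load through the transformers VisionEncoderDecoderModel API; a minimal sketch, with the repository ID inferred from the listing:

```python
# Russian/English handwriting OCR with a TrOCR fine-tune (sketch).
# Repository ID inferred from the listing.
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("raxtemur/trocr-base-ru")
model = VisionEncoderDecoderModel.from_pretrained("raxtemur/trocr-base-ru")

image = Image.open("handwritten_line.png").convert("RGB")  # single text lines work best
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)

print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```
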
## OpenMath Mistral 7B v0.1 HF
Apache-2.0 · Large Language Model · Transformers · Multilingual · by nvidia · 22 downloads · 31 likes

The OpenMath models solve mathematical problems by combining textual reasoning with code blocks executed by a Python interpreter; this model is fine-tuned from Mistral-7B-v0.1.

## Ko-DePlot
Apache-2.0 · Image-to-Text · Transformers · Multilingual · by nuua · 252 downloads · 5 likes

ko-deplot is a Korean visual question answering model based on Google's Pix2Struct architecture, fine-tuned from the DePlot model and supporting chart-image question answering in Korean and English.

## Orca 2 13B
Other · Large Language Model · Transformers · by microsoft · 11.10k downloads · 666 likes

Orca 2 is a research-oriented language model developed by Microsoft, focusing on enhancing the reasoning capabilities of small language models.

## Orca 2 7B
Other · Large Language Model · Transformers · by microsoft · 120.21k downloads · 219 likes

Orca 2 is a research-oriented language model developed by Microsoft, focusing on enhancing the reasoning capabilities of small language models; this version is fine-tuned from LLaMA 2.

## DonutLicenses3v3
MIT · Text Recognition · Transformers · English · by felipebandeira · 54 downloads · 5 likes

A model for extracting structured information from EU driver's license images and returning the result in JSON format.

## TrOCR Small Korean
Apache-2.0 · Image-to-Text · Korean · by team-lucid · 342 downloads · 17 likes

TrOCR is a Korean image-to-text model with a vision encoder-decoder architecture, using DeiT as the image encoder and RoBERTa as the text decoder.

## Pythia 2.8B Deduped Synthetic Instruct
Apache-2.0 · Large Language Model · Transformers · English · by lambdalabs · 46 downloads · 6 likes

An instruction-following model fine-tuned from the deduplicated Pythia-2.8B and optimized on a synthetic instruction dataset.

## OCTFusion Exp1 HKDB Synthetic
Image Classification · Transformers · by g30rv17ys · 33 downloads · 0 likes

OCTFusion is a PyTorch-based image classification model that achieved 100% accuracy on synthetic data.