# Large-scale pre-training

**Qwen3 8B Base** · Apache-2.0 · unsloth · 5,403 downloads · 1 like
Qwen3-8B-Base is the latest generation of the Tongyi Qianwen (Qwen) model series, with 8.2 billion parameters and support for 119 languages, suitable for a wide range of natural language processing tasks.
Tags: Large Language Model, Transformers

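A minimal sketch of loading a base (non-chat) checkpoint like this for plain text completion with transformers; the repo id `Qwen/Qwen3-8B-Base` is an assumption, so substitute the checkpoint you actually use:

```python
# Minimal text-completion sketch for a base (non-chat) causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B-Base"  # assumed repo id; swap in the checkpoint you use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # an 8B model in bf16 needs roughly 16 GB of memory
    device_map="auto",
)

inputs = tokenizer("Large-scale pre-training works because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
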
**LHM** · Apache-2.0 · 3DAIGC · 22 downloads · 21 likes
LHM is a feed-forward model that reconstructs an animatable 3D human from a single image within seconds. Trained with image reconstruction losses on a large-scale video dataset, it generalizes well across diverse real-world scenarios.
Tags: 3D Vision, English

**Izanami Wav2vec2 Large** · Other · imprt · 89 downloads · 1 like
A Japanese wav2vec 2.0 Large model pre-trained on large-scale Japanese TV broadcast audio.
Tags: Speech Recognition, Japanese

**Kushinada Hubert Large** · Apache-2.0 · imprt · 1,041 downloads · 2 likes
A Japanese HuBERT Large model pre-trained on 62,215 hours of Japanese TV broadcast audio for speech feature extraction.
Tags: Speech Recognition, Japanese

**Kushinada Hubert Base** · Apache-2.0 · imprt · 1,922 downloads · 1 like
A Japanese HuBERT Base model pre-trained on 62,215 hours of Japanese TV broadcast audio for speech feature extraction.
Tags: Speech Recognition, Japanese

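Both Kushinada checkpoints are plain HuBERT encoders, so frame-level features can be pulled out with the standard transformers classes. A minimal sketch, assuming the repo id `imprt/kushinada-hubert-base` and a local `speech.wav`:

```python
# Frame-level speech feature extraction with a HuBERT checkpoint.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, HubertModel

model_id = "imprt/kushinada-hubert-base"  # assumed repo id
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = HubertModel.from_pretrained(model_id).eval()

waveform, sr = torchaudio.load("speech.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16_000)  # HuBERT expects 16 kHz

inputs = feature_extractor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)
print(features.shape)
```
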
**Videomaev2 Base** · OpenGVLab · 3,565 downloads · 5 likes
VideoMAEv2-Base is a self-supervised video feature extraction model that employs a dual masking mechanism, pre-trained on the UnlabeledHybrid-1M dataset.
Tags: Video Processing

**Sam2 Hiera Large.fb R1024 2pt1** · Apache-2.0 · timm · 31 downloads · 0 likes
A SAM2 model built on the HieraDet image encoder, focused on efficient image feature extraction.
Tags: Image Segmentation, Transformers

**Eva02 Enormous Patch14 Clip 224.laion2b** · MIT · timm · 38 downloads · 0 likes
EVA-CLIP is a vision-language model based on the CLIP architecture, supporting zero-shot image classification.
Tags: Text-to-Image

**Vit Huge Patch14 Clip 224.metaclip 2pt5b** · timm · 3,173 downloads · 0 likes
A dual-framework compatible vision-language model trained on the MetaCLIP-2.5B dataset, supporting zero-shot image classification.
Tags: Image Classification

**Vit Large Patch14 Clip 224.metaclip 2pt5b** · timm · 2,648 downloads · 0 likes
A dual-framework compatible vision model trained on the MetaCLIP-2.5B dataset, supporting zero-shot image classification.
Tags: Image Classification

**Vit Large Patch14 Clip 224.metaclip 400m** · timm · 294 downloads · 0 likes
A Vision Transformer model trained on the MetaCLIP-400M dataset, supporting zero-shot image classification.
Tags: Image Classification

**Vit Base Patch16 Clip 224.metaclip 2pt5b** · timm · 889 downloads · 1 like
A dual-framework compatible vision model trained on the MetaCLIP-2.5B dataset, supporting both the OpenCLIP and timm frameworks.
Tags: Image Classification

**Vit Base Patch32 Clip 224.metaclip 2pt5b** · timm · 5,571 downloads · 0 likes
A Vision Transformer model trained on the MetaCLIP-2.5B dataset, compatible with both the open_clip and timm frameworks.
Tags: Image Classification

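The MetaCLIP checkpoints above, like the other CLIP-style timm entries in this list, follow the same zero-shot classification recipe. A minimal open_clip sketch, assuming the hub id `hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b` and a local `cat.jpg`:

```python
# Zero-shot image classification with a CLIP-style checkpoint via open_clip.
import torch
import open_clip
from PIL import Image

tag = "hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b"  # assumed hub id
model, _, preprocess = open_clip.create_model_and_transforms(tag)
tokenizer = open_clip.get_tokenizer(tag)
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(probs)  # probability assigned to each caption
```
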
**Structtable InternVL2 1B** · Apache-2.0 · U4R · 1,833 downloads · 37 likes
A multimodal table recognition model based on InternVL2-1B that converts table images to LaTeX, HTML, or Markdown.
Tags: Image-to-Text, Safetensors, Multilingual

**Eurollm 1.7B** · Apache-2.0 · utter-project · 3,444 downloads · 65 likes
EuroLLM-1.7B is the first pre-trained model in the EuroLLM series, able to understand and generate text in many European and other related languages.
Tags: Large Language Model, Transformers, Multilingual

**Retnet 1.3B 100B** · MIT · fla-hub · 57 downloads · 1 like
A text generation model built on the RetNet (Retentive Network) architecture and trained on the SlimPajama-627B dataset.
Tags: Large Language Model, Safetensors, Multilingual

**Internvit 6B 224px** · MIT · OpenGVLab · 160 downloads · 25 likes
InternViT-6B-224px is a foundation vision model focused on image feature extraction, with about 5.9 billion parameters and support for 224x224-pixel image inputs.
Tags: Image Classification, Transformers

**Vit Bigg 14 CLIPA Datacomp1b** · Apache-2.0 · UCSC-VLAA · 623 downloads · 4 likes
A CLIPA-v2 model for zero-shot image classification, achieving efficient visual representation learning through contrastive image-text training.
Tags: Text-to-Image

**Vit Bigg 14 CLIPA 336 Datacomp1b** · Apache-2.0 · UCSC-VLAA · 259 downloads · 4 likes
A CLIPA-v2 model, an efficient contrastive image-text model focused on zero-shot image classification.
Tags: Text-to-Image

**Vit H 14 CLIPA Datacomp1b** · Apache-2.0 · UCSC-VLAA · 65 downloads · 1 like
A CLIPA-v2 model, an efficient contrastive vision-language model designed for zero-shot image classification.
Tags: Text-to-Image

**Metaclip L14 400m** · facebook · 325 downloads · 3 likes
MetaCLIP is a vision-language model trained on CommonCrawl data to build a shared image-text embedding space.
Tags: Text-to-Image, Transformers

**Metaclip B16 400m** · facebook · 51 downloads · 1 like
MetaCLIP is a vision-language model trained on CommonCrawl data to build a shared image-text embedding space.
Tags: Text-to-Image, Transformers

**Metaclip B32 Fullcc2.5b** · facebook · 413 downloads · 7 likes
A MetaCLIP model trained on 2.5 billion image-text pairs from CommonCrawl (CC) to construct a shared image-text embedding space.
Tags: Text-to-Image, Transformers

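The facebook MetaCLIP repos carry the Transformers tag, so they also load with the stock transformers CLIP classes. A minimal sketch, assuming the repo id `facebook/metaclip-b32-fullcc2.5b` and a local `cat.jpg`:

```python
# Zero-shot classification with a MetaCLIP checkpoint via transformers.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "facebook/metaclip-b32-fullcc2.5b"  # assumed repo id
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=Image.open("cat.jpg"),
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity as probabilities
print(probs)
```
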
**Unsup Simcse Ja Large** · cl-nagoya · 59 downloads · 1 like
An unsupervised SimCSE model for Japanese, designed to produce high-quality Japanese sentence embeddings.
Tags: Text Embedding, Transformers, Japanese

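A minimal embedding sketch, assuming the repo id `cl-nagoya/unsup-simcse-ja-large` and the [CLS]-pooling convention common to SimCSE models:

```python
# Sentence embeddings from a SimCSE-style encoder via [CLS] pooling.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "cl-nagoya/unsup-simcse-ja-large"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

sentences = ["今日は良い天気ですね。", "散歩に行きましょう。"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state[:, 0]  # [CLS] token embedding

# Cosine similarity between the two sentences
sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(sim.item())
```
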
**Nucleotide Transformer V2 50m Multi Species** · InstaDeepAI · 18.72k downloads · 3 likes
The Nucleotide Transformer models are foundation language models pre-trained on whole-genome DNA sequences, integrating data from over 3,200 human genomes and 850 diverse species.
Tags: Molecular Model, Transformers

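A minimal sketch of extracting mean-pooled DNA sequence embeddings; the repo id and the `trust_remote_code=True` flag are assumptions based on how the v2 checkpoints are typically distributed:

```python
# Mean-pooled DNA sequence embeddings from a Nucleotide Transformer checkpoint.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "InstaDeepAI/nucleotide-transformer-v2-50m-multi-species"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True).eval()

sequences = ["ATTCCGATTCCGATTCCG", "ATTTCTCTCTCTCTCTGAGATCGATCGATCGAT"]
inputs = tokenizer(sequences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]

# Mean-pool over tokens, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```
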
**Stt En Fastconformer Ctc Large** · nvidia · 1,001 downloads · 12 likes
A large automatic speech recognition (ASR) model based on the FastConformer architecture, designed to transcribe English speech to text.
Tags: Speech Recognition, English

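This model ships through NVIDIA NeMo rather than transformers. A minimal transcription sketch, assuming the NeMo model name `stt_en_fastconformer_ctc_large`; the exact `transcribe()` signature varies slightly across NeMo versions:

```python
# English speech-to-text with a NeMo FastConformer-CTC model.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_fastconformer_ctc_large"
)
transcripts = asr_model.transcribe(["speech.wav"])  # 16 kHz mono WAV
print(transcripts[0])
```
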
**Sam Vit Large** · Apache-2.0 · facebook · 455.43k downloads · 28 likes
SAM is a vision model that generates high-quality object masks from input points or bounding boxes, with zero-shot transfer capability.
Tags: Image Segmentation, Transformers, Other

**Sam Vit Base** · Apache-2.0 · facebook · 635.09k downloads · 137 likes
SAM is a vision model that generates high-quality object masks from input prompts such as points or boxes, supporting zero-shot segmentation.
Tags: Image Segmentation, Transformers, Other

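The SAM checkpoints in this list (base, large, and huge) share one prompt-driven API in transformers. A minimal point-prompt sketch using `facebook/sam-vit-base` and a local `photo.jpg`:

```python
# Point-prompted mask prediction with SAM via transformers.
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

model_id = "facebook/sam-vit-base"
model = SamModel.from_pretrained(model_id).eval()
processor = SamProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")
input_points = [[[450, 600]]]  # one (x, y) prompt on the object of interest

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Upscale the predicted masks back to the original image resolution
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)
```
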
**Mgpt 13B** · MIT · ai-forever · 4,742 downloads · 49 likes
mGPT 13B is a multilingual language model supporting 61 languages across 25 language families, trained on 600 GB of text.
Tags: Large Language Model, Transformers, Multilingual

**Sam Vit Huge** · Apache-2.0 · facebook · 324.78k downloads · 163 likes
SAM is a vision model that generates high-quality object masks from input prompts, supporting zero-shot transfer to new tasks.
Tags: Image Segmentation, Transformers, Other

**Nucleotide Transformer 2.5b Multi Species** · InstaDeepAI · 2,714 downloads · 38 likes
A DNA sequence analysis model pre-trained on genomes from 850 species, supporting downstream tasks such as molecular phenotype prediction.
Tags: Molecular Model, Transformers

**FRED T5 Large** · Apache-2.0 · ai-forever · 998 downloads · 25 likes
A Russian pre-trained language model based on the T5 architecture, trained with a UL2-style mixture of 7 denoisers and supporting a variety of text generation tasks.
Tags: Large Language Model, Transformers, Other

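A minimal generation sketch; the `<LM>` denoiser prefix and GPT2-style tokenizer follow the published FRED-T5 usage, while the repo id `ai-forever/FRED-T5-large` is an assumption:

```python
# Prompting FRED-T5 with one of its denoiser prefixes.
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

model_id = "ai-forever/FRED-T5-large"  # assumed repo id
tokenizer = GPT2Tokenizer.from_pretrained(model_id, eos_token="</s>")
model = T5ForConditionalGeneration.from_pretrained(model_id).eval()

# '<LM>' selects the plain language-modelling denoiser; the span-corruption
# denoisers are selected with other prefixes such as '<SC1>'.
text = "<LM>Однажды утром"
input_ids = torch.tensor([tokenizer.encode(text)])
outputs = model.generate(input_ids, max_new_tokens=32, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
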
**Deberta V1 Base** · Apache-2.0 · deepvk · 160 downloads · 8 likes
DeBERTa-base is a pre-trained bidirectional encoder for Russian, mainly used for Russian text processing tasks.
Tags: Large Language Model, Transformers, Multilingual

**CLIP Convnext Large D.laion2b S26b B102k Augreg** · MIT · laion · 80.74k downloads · 5 likes
A large-scale ConvNeXt-Large CLIP model trained on the LAION-2B dataset, supporting zero-shot image classification and image-text retrieval.
Tags: Text-to-Image, TensorBoard

**T5 Efficient Gc4 All German Small El32** · MIT · GermanT5 · 52 downloads · 4 likes
A T5 model trained on the large-scale cleaned German Common Crawl corpus (GC4), specialized for German natural language processing tasks.
Tags: Large Language Model, Transformers, German

**FRED T5 1.7B** · Apache-2.0 · ai-forever · 1,671 downloads · 77 likes
A Russian pre-trained language model based on the T5 architecture with 1.7 billion parameters, trained with a UL2-style mixture of 7 denoising tasks.
Tags: Large Language Model, Transformers, Other

**Ruscibert** · Apache-2.0 · ai-forever · 1,044 downloads · 7 likes
A Russian BERT model for scientific text, trained jointly by the Sber AI team and the MLSA Lab of the AI Institute at Moscow State University.
Tags: Large Language Model, Transformers, Other

**Bit 50** · Apache-2.0 · google · 9,766 downloads · 4 likes
BiT is a simple recipe for scaling up pre-training of ResNet-like architectures, yielding significant improvements in transfer learning.
Tags: Image Classification, Transformers, Other

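A minimal classification sketch with the transformers BiT port, assuming the repo id `google/bit-50` and a local `photo.jpg`:

```python
# Image classification with a BiT checkpoint via transformers.
import torch
from PIL import Image
from transformers import AutoImageProcessor, BitForImageClassification

model_id = "google/bit-50"  # assumed repo id
processor = AutoImageProcessor.from_pretrained(model_id)
model = BitForImageClassification.from_pretrained(model_id).eval()

inputs = processor(Image.open("photo.jpg").convert("RGB"), return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the top logit back to its class name
label = model.config.id2label[logits.argmax(-1).item()]
print(label)
```
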
**Gpt2 Small** · MIT · ComCom · 1,032 downloads · 3 likes
GPT-2 is an autoregressive Transformer language model pre-trained on a large English corpus via self-supervised learning, and it excels at text generation.
Tags: Large Language Model, Transformers, English

**Roberta Large NER** · 51la5 · 60.39k downloads · 48 likes
A named entity recognition model based on XLM-RoBERTa-large, fine-tuned on the English CoNLL-2003 dataset.
Tags: Sequence Labeling, Multilingual

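A minimal tagging sketch via the transformers pipeline API, assuming the repo id `51la5/roberta-large-NER`:

```python
# Named entity recognition with a fine-tuned checkpoint through the pipeline API.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="51la5/roberta-large-NER",  # assumed repo id
    aggregation_strategy="simple",    # merge sub-word pieces into whole entities
)
print(ner("Ada Lovelace worked with Charles Babbage in London."))
```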