# Autoregressive Generation
## Janus Pro 7B
Athagi · MIT · 15 downloads · 1 like

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It handles multimodal tasks with a single unified Transformer architecture by decoupling the visual encoding paths.

Tags: Text-to-Image · Transformers
## Yi Ko 6B
beomi · Apache-2.0 · 3,183 downloads · 37 likes

Yi-Ko-6B is an advanced version of the 01-ai/Yi model, further pre-trained with an extended vocabulary on a Korean/English corpus, and supports bilingual text generation in Korean and English.

Tags: Large Language Model · Transformers · Supports Multiple Languages
## Goliath 120b
alpindale · 620 downloads · 238 likes

Goliath 120B is an autoregressive causal language model created by merging two fine-tuned Llama-2 70B models, supporting conversational tasks.

Tags: Large Language Model · Transformers · English
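Model merging combines the weights of two trained checkpoints into one model without further training. Recipes vary — Goliath 120B reportedly stacks layer ranges from its two parent models rather than averaging them — so the sketch below shows only the simplest generic approach, element-wise parameter averaging, on toy weight dictionaries (all names hypothetical):

```python
# Generic sketch of merging two models by averaging their parameters.
# Real merges operate on checkpoint tensors; this toy version averages
# matching entries of two plain weight dictionaries.

def merge_average(a: dict[str, list[float]],
                  b: dict[str, list[float]]) -> dict[str, list[float]]:
    """Element-wise average of two weight dicts with identical structure."""
    assert a.keys() == b.keys(), "models must share the same parameter names"
    return {name: [(x + y) / 2 for x, y in zip(a[name], b[name])]
            for name in a}

# Hypothetical two-parameter "models":
m1 = {"layer0.weight": [1.0, 2.0], "layer1.weight": [0.0, 4.0]}
m2 = {"layer0.weight": [3.0, 2.0], "layer1.weight": [2.0, 0.0]}

print(merge_average(m1, m2))
# {'layer0.weight': [2.0, 2.0], 'layer1.weight': [1.0, 2.0]}
```

Layer-stacking merges instead concatenate contiguous blocks of layers from each parent, which is how a pair of 70B models can yield a 120B result.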
## Molgen 7b
zjunlp · Apache-2.0 · 150 downloads · 8 likes

A large-scale molecular generation model based on the SELFIES molecular language, capable of de novo molecular generation or completing partial molecular structures.

Tags: Molecular Model · Transformers
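SELFIES represents a molecule as a string of bracketed symbols, which a molecular language model can generate autoregressively one symbol at a time. A minimal tokenizer sketch that splits such a string into its symbols (the example string is illustrative, not a specific molecule):

```python
import re

# SELFIES strings are sequences of bracketed symbols, e.g. "[C][=C][F]".
# Molecular language models generate them one symbol at a time; this
# sketch just splits a string into its symbol tokens.

def tokenize_selfies(s: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbol tokens."""
    tokens = re.findall(r"\[[^\]]*\]", s)
    # Every character must belong to some [...] symbol:
    assert "".join(tokens) == s, "input is not a well-formed SELFIES string"
    return tokens

print(tokenize_selfies("[C][=C][C][=C][Ring1][Branch1]"))
# ['[C]', '[=C]', '[C]', '[=C]', '[Ring1]', '[Branch1]']
```

Because every SELFIES string decodes to a syntactically valid molecule, a model can sample symbols freely without producing invalid structures — the property that makes the representation attractive for de novo generation.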
## Musicgen Medium
facebook · 1.5M downloads · 118 likes

MusicGen is a text-to-music model that generates high-quality music samples from text descriptions or audio prompts, using a 1.5-billion-parameter autoregressive Transformer architecture.

Tags: Audio Generation · Transformers
## Decapoda Research Llama 7B Hf
baffo32 · Other · 12.29k downloads · 63 likes

LLaMA is an efficient foundational language model developed by Meta AI, available in sizes from 7B to 65B parameters. Built on the Transformer architecture, it is suited to a wide range of natural language processing tasks.

Tags: Large Language Model · Transformers
## Donut Proto
naver-clova-ix · MIT · 30 downloads · 7 likes

Donut is an OCR-free document understanding Transformer that combines a visual encoder with a text decoder for image-to-text conversion.

Tags: Image-to-Text · Transformers
## Donut Base
naver-clova-ix · MIT · 50.34k downloads · 207 likes

Donut is an OCR-free document understanding Transformer composed of a visual encoder (Swin Transformer) and a text decoder (BART).

Tags: Image-to-Text · Transformers
## Assignment1 Maria
Classroom-workshop · MIT · 23 downloads · 0 likes

s2t-small-librispeech-asr is a speech-to-text (S2T) model for automatic speech recognition (ASR), based on a sequence-to-sequence Transformer architecture.

Tags: Speech Recognition · Transformers · English
## Fr Boris
Cedille · MIT · 3,085 downloads · 39 likes

Boris is an autoregressive language model based on the GPT-J architecture with 6 billion parameters, specializing in French text processing.

Tags: Large Language Model · Transformers · French