Fairseq Dense 125M
This is a Hugging Face transformers-compatible conversion of the 125M-parameter dense model from Artetxe et al.'s paper
Release date: 3/2/2022
Model Overview
This model is a converted version of the original dense 125M-parameter model from Artetxe et al.'s paper 'Efficient Large Scale Language Modeling with Mixtures of Experts', and is suited to large-scale language modeling tasks.
Model Features
Large-scale language modeling
Focuses on efficient large-scale language modeling tasks
Hugging Face compatible
Converted to be compatible with the Hugging Face transformers library
Dense parameter structure
Uses a dense parameter structure rather than a Mixture of Experts (MoE) architecture
Model Capabilities
Text generation
Language understanding
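As a minimal sketch of how a transformers-compatible conversion like this can be loaded and used for text generation. The hub ID below is an assumption, not confirmed by this page; substitute the actual repository name of the converted checkpoint.

```python
# Minimal text-generation sketch using the transformers library.
# The hub ID is assumed (hypothetical); replace it with the real
# repository name of this converted 125M dense model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KoboldAI/fairseq-dense-125M"  # assumed hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("Efficient language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```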
Use Cases
Natural language processing
Open large language model evaluation
Evaluated on the Hugging Face Open LLM Leaderboard
Average score of 26.0 across the leaderboard benchmarks