
Cerebras-GPT 111M

Developed by Cerebras
A 111M-parameter model in the Cerebras-GPT series. It adopts a GPT-3-style architecture, was trained on The Pile dataset, and follows Chinchilla scaling laws for compute-optimal training.
Downloads: 5,975
Release Date: 3/17/2023

Model Overview

This is a 111M-parameter causal language model from the Cerebras-GPT series, designed for text generation tasks. It uses a standard Transformer decoder architecture and was trained on the Andromeda AI supercomputer.
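
Below is a minimal usage sketch with the Hugging Face transformers library. It assumes the model's published Hub id cerebras/Cerebras-GPT-111M; the prompt and generation settings are illustrative, not recommended values:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hub id as published by Cerebras (assumed here).
MODEL_ID = "cerebras/Cerebras-GPT-111M"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Generative AI is "
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; sampling parameters are illustrative.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```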

Model Features

Compute-Optimal Training
Follows Chinchilla scaling laws, training on roughly 20 tokens per model parameter for compute-efficient training (see the sketch after this list)
Hardware Optimization
Trained on Cerebras CS-2 wafer-scale systems using weight streaming technology for efficient scaling
Open Architecture
Adopts a standard GPT-style Transformer architecture that is easy to study and build on
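
To make the compute-optimal claim concrete, here is a back-of-the-envelope sketch of the token budget implied by the 20-tokens-per-parameter rule for this 111M model; the numbers follow directly from the ratio stated above:

```python
# Chinchilla-style rule of thumb: ~20 training tokens per parameter.
PARAMS = 111e6          # model size: 111M parameters
TOKENS_PER_PARAM = 20   # ratio the Cerebras-GPT series follows

token_budget = PARAMS * TOKENS_PER_PARAM
print(f"Implied training budget: {token_budget / 1e9:.2f}B tokens")
# Output: Implied training budget: 2.22B tokens
```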

Model Capabilities

English Text Generation
Causal Language Modeling
Zero-Shot Learning
Few-Shot Learning (see the prompting sketch below)
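
Zero- and few-shot use comes down to prompt construction: the model continues whatever pattern the prompt establishes. The sketch below builds a few-shot prompt; the sentiment task, examples, and template wording are illustrative assumptions, not part of this model card:

```python
# Build a few-shot prompt: labeled examples first, then the new query.
# The task and all example texts here are hypothetical.
examples = [
    ("The movie was fantastic.", "positive"),
    ("I wasted two hours of my life.", "negative"),
]
query = "The plot kept me hooked until the end."

prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\n\nReview: {query}\nSentiment:"

# Feed `prompt` to model.generate() as in the overview sketch; the model
# should continue with a label that matches the established pattern.
print(prompt)
```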

Use Cases

Text Generation
Content Continuation
Generate coherent continuations of a given text fragment
Q&A Systems
Generate answers grounded in a supplied context (see the sketch at the end of this section)
Education & Research
Language Model Research
Used for studying LLM scaling laws and training methods
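
For context-grounded Q&A, one common pattern is to pack the context and question into a single prompt and decode greedily. The template and pipeline settings below are an assumed sketch, not an official recipe:

```python
from transformers import pipeline

# Assumes the published Hub id; see the overview sketch above.
generator = pipeline("text-generation", model="cerebras/Cerebras-GPT-111M")

context = (
    "Cerebras-GPT is a family of open, compute-optimal language models "
    "trained on The Pile."
)
question = "What dataset was Cerebras-GPT trained on?"
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"

# Greedy decoding keeps short factual answers deterministic.
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```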