O

Olmo 2 0425 1B Instruct GGUF

Developed by unsloth
OLMo 2 1B Instruct Edition is a post-training variant of the OLMo-2-0425-1B-RLVR1 model, optimized through supervised fine-tuning, DPO training, and RLVR training to achieve state-of-the-art performance across multiple tasks.
Downloads 3,137
Release Time : 5/1/2025

Model Overview

An open language model primarily designed for English text generation tasks, optimized for instruction-following capabilities through multi-stage training.

Model Features

Multi-stage Training Optimization
Optimized through three stages: supervised fine-tuning, DPO training, and RLVR training to enhance instruction-following capabilities.
Open Model
Publicly available code, checkpoints, and training details to promote scientific research in language models.
Intermediate Checkpoints Available
Provides intermediate checkpoints during RLVR training for facilitating RL fine-tuning research.

Model Capabilities

Text Generation
Mathematical Problem Solving
Instruction Following
Dialogue Interaction

Use Cases

Education
Mathematical Problem Solving
Solving mathematical problems such as GSM8K
Achieved 68.3 points on GSM8K
Research
RL Fine-tuning Research
Utilizing intermediate checkpoints for reinforcement learning research
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase