
Ultralong Thinking

Developed by mergekit-community
An 8B-parameter language model merged with the SLERP method, combining the strengths of the DeepSeek-R1 and Nemotron-8B models
Downloads: 69
Release Time: 4/17/2025

Model Overview

This is a pre-trained language model produced with the mergekit tool. It uses Spherical Linear Interpolation (SLERP) to fuse the DeepSeek-R1 and Nemotron-8B models, with the aim of combining their complementary strengths.
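For intuition, the following is a minimal Python/NumPy sketch of SLERP between two weight tensors. It is an illustrative reimplementation, not mergekit's actual code; the function name and epsilon handling are choices of this sketch.

import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    t=0 returns w_a, t=1 returns w_b; intermediate values follow the
    great-circle arc between the normalized tensors rather than a
    straight line.
    """
    a = w_a.ravel().astype(np.float64)
    b = w_b.ravel().astype(np.float64)
    a_norm = a / (np.linalg.norm(a) + eps)
    b_norm = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_norm, b_norm), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two tensors
    if omega < eps:                 # nearly parallel: fall back to plain lerp
        return (1 - t) * w_a + t * w_b
    so = np.sin(omega)
    out = (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b
    return out.reshape(w_a.shape).astype(w_a.dtype)

Compared with plain linear interpolation, following the arc between the two tensors tends to better preserve weight magnitudes, which is the usual motivation for SLERP merges.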

Model Features

Model fusion advantages
Combines DeepSeek-R1's distilled knowledge with Nemotron-8B's ultra-long context processing capability
V-shaped mixing strategy
The interpolation factor follows a V-shaped curve across layer depth, so the input and output layers weight one source model more heavily while the middle layers favor the other (see the config sketch after this list)
Long-context support
Inherits the Nemotron model's ultra-long context window of up to 4M tokens
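The card does not publish the merge recipe, so the following is a hedged sketch of what a mergekit SLERP config with a V-shaped curve could look like, written as a small Python script that emits the YAML and calls mergekit's command-line entry point. The source checkpoints, layer count, and t values are all assumptions for illustration.

import subprocess, textwrap

# Hypothetical reconstruction of a SLERP merge config; the actual sources,
# layer count, and t-curve used for this model are assumptions.
config = textwrap.dedent("""\
    merge_method: slerp
    base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    slices:
      - sources:
          - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
            layer_range: [0, 32]
          - model: nvidia/Llama-3.1-8B-UltraLong-4M-Instruct
            layer_range: [0, 32]
    parameters:
      t:
        # V-shaped curve: endpoint layers stay close to the first model,
        # middle layers lean toward the second
        - filter: self_attn
          value: [0.0, 0.5, 1.0, 0.5, 0.0]
        - filter: mlp
          value: [0.0, 0.5, 1.0, 0.5, 0.0]
        - value: 0.5   # default for remaining tensors
    dtype: bfloat16
    """)

with open("merge-config.yaml", "w") as f:
    f.write(config)

# mergekit's CLI entry point (pip install mergekit)
subprocess.run(["mergekit-yaml", "merge-config.yaml", "./merged-model"], check=True)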

Model Capabilities

Text generation
Instruction following
Long-context understanding
Multi-turn dialogue

Use Cases

Dialogue systems
Intelligent assistant
Building intelligent assistants capable of handling complex multi-turn dialogues
Can draw on context of up to 4M tokens (a usage sketch follows this list)
Content generation
Long-form writing
Assisting in creating long articles or technical documents
Maintains long-distance contextual consistency
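As a concrete starting point, here is a minimal multi-turn chat sketch using the Hugging Face transformers library. The repository id is a guess based on the card's title and publisher and may differ; actually using anything near the 4M-token context would additionally require suitable hardware and an efficient attention implementation.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mergekit-community/Ultralong-Thinking"  # hypothetical repo id, may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

def chat(messages):
    # Build the prompt with the model's chat template and generate a reply.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512)
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

messages = [{"role": "user", "content": "Outline a three-step plan for summarizing a 500-page report."}]
reply = chat(messages)
print(reply)

# Multi-turn: append the reply and continue; earlier turns remain in context.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "Expand on step 2."})
print(chat(messages))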