MiniPLM-Qwen-200M
A 200M-parameter model based on the Qwen architecture, pretrained from scratch using the MiniPLM knowledge distillation framework
Downloads: 203
Release Date: 10/17/2024
Model Overview
MiniPLM-Qwen-200M is a lightweight language model trained with knowledge distillation, using Qwen1.5-1.8B as the teacher model. It offers efficient performance and good scalability.
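A minimal usage sketch with Hugging Face transformers. The repository id MiniLLM/MiniPLM-Qwen-200M and the generation settings are assumptions, not details confirmed by this page; check the model repository for the exact identifier.

```python
# Minimal text-generation sketch for MiniPLM-Qwen-200M.
# NOTE: the repo id below is an assumption; verify it on the model page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-Qwen-200M"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt and sample a short continuation.
inputs = tokenizer("The MiniPLM framework distills knowledge by", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```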
Model Features
Knowledge Distillation Training
Utilizes the MiniPLM knowledge distillation framework to learn from the Qwen1.5-1.8B teacher model, achieving efficient knowledge transfer
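The page does not spell out the training objective, so the sketch below shows a generic temperature-scaled KL distillation loss between teacher and student next-token distributions; it illustrates standard logit-level distillation, not necessarily the exact MiniPLM loss.

```python
# Generic teacher-student distillation loss for causal LMs.
# This is the standard KL formulation, NOT a verified MiniPLM objective.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-scaled KL(teacher || student) averaged over token positions."""
    t = temperature
    vocab = student_logits.size(-1)
    student_log_probs = F.log_softmax(student_logits.reshape(-1, vocab) / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.reshape(-1, vocab) / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy check with random logits: (batch=2, seq=8, vocab=100).
s = torch.randn(2, 8, 100)
t = torch.randn(2, 8, 100)
print(distillation_loss(s, t).item())
```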
Diverse Sampling Optimization
Employs a pretraining corpus optimized with diverse sampling to enhance training efficiency and model performance
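The sampling criterion is not described on this page. One plausible sketch, assuming documents are ranked by how much more likely the teacher model finds them than a small reference model, follows; the score_document and select_corpus helpers are hypothetical illustrations, not code from the MiniPLM release.

```python
# Hedged sketch of corpus selection driven by teacher/reference score gaps.
# Both helper functions are hypothetical, for illustration only.
import torch

@torch.no_grad()
def score_document(model, tokenizer, text):
    """Average per-token log-likelihood of `text` under a causal LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return -loss.item()

def select_corpus(teacher, reference, tokenizer, docs, keep_ratio=0.5):
    """Keep documents the teacher scores far higher than the reference model."""
    diffs = [
        score_document(teacher, tokenizer, d) - score_document(reference, tokenizer, d)
        for d in docs
    ]
    k = int(len(docs) * keep_ratio)
    ranked = sorted(range(len(docs)), key=lambda i: diffs[i], reverse=True)
    return [docs[i] for i in ranked[:k]]
```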
High Computational Efficiency
Outperforms conventional pretraining methods under the same computational budget, with good scalability
Model Capabilities
Text Generation
Language Understanding
Use Cases
Natural Language Processing
Text Generation Applications
Can be used to generate coherent and meaningful text content
Language Model Research
Serves as a research benchmark for lightweight language models