Mamba-ko-2.8B
Mamba-ko-2.8B is a state space model that has been further pretrained (continually trained) on korean_textbooks, a synthetically generated Korean dataset. It aims to leverage the strengths of state space models for Korean text generation tasks.
Features
- Advanced Architecture: Based on the Mamba state space model architecture, which shows promising performance on information-dense data such as language modeling.
- Korean-Focused: Further pretrained on the korean_textbooks dataset, making it more suitable for Korean language tasks.
Installation
pip install "causal_conv1d>=1.1.0" mamba-ssm==1.1.1
Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, TextStreamer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "kuotient/mamba-ko-2.8b"

# Load the tokenizer and the Mamba language model head in fp16.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = MambaLMHeadModel.from_pretrained(model_name, device=device, dtype=torch.float16)

# "The following are five examples of nutritious foods to give children."
prompt = "아이들한테 제공할 영양가 있는 음식 5가지의 예시는 다음과 같다."
tokens = tokenizer(prompt, return_tensors="pt")
input_ids = tokens.input_ids.to(device)

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)
out = model.generate(
    input_ids=input_ids,
    streamer=streamer,
    max_length=2000,
    temperature=0.7,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
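With the TextStreamer above, the completion is printed to stdout as it is produced. If you also want the full decoded string afterwards, a minimal sketch is shown below; note that, depending on the installed mamba-ssm version, generate returns either the token id tensor directly or a generation output object with a sequences field.

# Recover the full generated text from `out`, handling both return styles.
sequences = out if isinstance(out, torch.Tensor) else out.sequences
print(tokenizer.decode(sequences[0], skip_special_tokens=True))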
Documentation
What is Mamba?
Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It builds on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
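For intuition only, here is a minimal sketch of the underlying state space recurrence, h_t = A h_{t-1} + B x_t and y_t = C h_t, written as a plain PyTorch loop. The function name and shapes are made up for illustration; Mamba itself makes the parameters input-dependent ("selective") and evaluates the recurrence with a fused, hardware-aware scan rather than a Python loop.

import torch

def ssm_scan(x, A, B, C):
    # Toy sequential state space recurrence:
    #     h_t = A h_{t-1} + B x_t
    #     y_t = C h_t
    # This only illustrates the idea behind models like Mamba; it is not
    # the optimized selective-scan kernel the library actually uses.
    batch, length, _ = x.shape
    h = torch.zeros(batch, A.shape[0])
    ys = []
    for t in range(length):
        h = h @ A.T + x[:, t] @ B.T   # state update
        ys.append(h @ C.T)            # readout
    return torch.stack(ys, dim=1)

# Illustrative shapes: batch 1, sequence length 8, input dim 4, state size 16.
x = torch.randn(1, 8, 4)
y = ssm_scan(x, torch.eye(16) * 0.9, torch.randn(16, 4), torch.randn(4, 16))
print(y.shape)  # torch.Size([1, 8, 4])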
TODO
- Training with the korean_textbooks dataset - DONE
- More training with publicly available Korean corpora
- Instruct tuning
Model Details
Model Benchmark
KoBEST
| Model | boolq | copa | hellaswag | sentineg |
|---|---|---|---|---|
| kuotient/mamba-ko-2.8b | 0.6213 | 0.6150 | 0.4014 | 0.3383 |
| state_spaces/mamba-2.8b-slimpj | 0.3343 | 0.4867 | 0.3452 | 0.3547 |
| kuotient/mamba-ko-2.8b-old (2B trained only) | 0.4236 | 0.5896 | 0.4012 | 0.4348 |
| kuotient/mamba-ko-2.8b-old-instruct | 0.4041 | 0.6505 | 0.4906 | 0.3348 |
| EleutherAI/polyglot-ko-1.3b | 0.3552 | 0.7196 | 0.5247 | 0.6790 |
| maywell/TinyWand-SFT | 0.3455 | 0.6142 | 0.3944 | N/A |
| microsoft/phi-2 | 0.3343 | 0.4792 | 0.3235 | N/A |
| TinyLlama/TinyLlama-1.1B | 0.3343 | 0.4784 | 0.3396 | N/A |
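The card does not state the exact evaluation setup. If you want to produce KoBEST scores of this kind yourself, one common route is EleutherAI's lm-evaluation-harness, which ships KoBEST tasks and, in recent versions, a mamba_ssm backend. The sketch below assumes a 0.4.x harness with the mamba-ssm package from the installation step; task names, few-shot settings, and therefore the exact numbers may differ from the table above.

import lm_eval

# Reproduction sketch only; the harness version and few-shot configuration
# used for the table above are not specified on this card.
results = lm_eval.simple_evaluate(
    model="mamba_ssm",  # harness backend for mamba_ssm checkpoints
    model_args="pretrained=kuotient/mamba-ko-2.8b",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag", "kobest_sentineg"],
)
print(results["results"])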
Thanks
We would like to thank maywell for his great contributions to the Korean LLM community and for the motivation he provides.
License
This project is licensed under the Apache 2.0 license.
Usage Tip
If you're interested in building large-scale language models to solve a wide variety of problems in a wide variety of domains, you should consider joining Allganize. For a coffee chat or if you have any questions, please do not hesitate to contact us at kuotient.dev@gmail.com.
The author also thanks Allganize Korea for their generosity in providing resources for this personal project. Note that this project is not directly related to the company's goals or research.