Turkish Deepseek
A language model based on the DeepSeek architecture and trained on Turkish text, incorporating Multi-Head Latent Attention (MLA) and Mixture of Experts (MoE).
Downloads: 106
Release Date: 5/30/2025
Model Overview
A language model optimized for Turkish, using advanced MLA and MoE technologies, suitable for Turkish text generation tasks.
Model Features
Multi-Head Latent Attention (MLA)
Uses compressed key-value representations (rank 256), splitting keys into decoupled non-positional and rotary position-encoded components, which keeps memory usage low for long sequences
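A minimal PyTorch sketch of this idea follows. Only the rank-256 latent matches the card; the model width, head count, rotary dimension, and the causal mask being omitted are all illustrative assumptions, not the model's actual configuration.

```python
import torch
import torch.nn as nn


def apply_rope(x):
    """Apply rotary position embedding to x of shape (..., seq, dim)."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 10000.0 ** (-torch.arange(0, dim, 2).float() / dim)
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class MultiHeadLatentAttention(nn.Module):
    def __init__(self, d_model=1024, n_heads=8, kv_rank=256, rope_dim=32):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.rope_dim, self.nope_dim = rope_dim, self.head_dim - rope_dim
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        # Down-project hidden states to a rank-256 latent; this latent is what
        # gets cached, instead of full per-head keys and values.
        self.kv_down = nn.Linear(d_model, kv_rank, bias=False)
        # Up-project the latent into per-head non-positional keys and values.
        self.k_up = nn.Linear(kv_rank, n_heads * self.nope_dim, bias=False)
        self.v_up = nn.Linear(kv_rank, n_heads * self.head_dim, bias=False)
        # The positional key component is projected directly and shared across heads.
        self.k_rope = nn.Linear(d_model, rope_dim, bias=False)
        self.out_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model); causal mask omitted for brevity
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q_nope, q_rope = q.split([self.nope_dim, self.rope_dim], dim=-1)
        q = torch.cat([q_nope, apply_rope(q_rope)], dim=-1)
        latent = self.kv_down(x)  # (b, t, kv_rank): the compressed KV cache
        k_nope = self.k_up(latent).view(b, t, self.n_heads, self.nope_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k_rope = apply_rope(self.k_rope(x)).unsqueeze(1).expand(-1, self.n_heads, -1, -1)
        k = torch.cat([k_nope, k_rope], dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        return self.out_proj((attn @ v).transpose(1, 2).reshape(b, t, -1))
```

The memory saving comes from caching only the rank-256 latent per token rather than full per-head key and value tensors.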
Mixture of Experts (MoE)
Contains 4 routed experts and 2 shared experts, with the top 2 routed experts activated per token; sparse activation keeps the per-token compute cost low
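The routing can be sketched as below. The expert counts (4 routed, 2 shared, top-2 activation) follow the card; the layer widths and the softmax-then-top-k gating details are assumptions for illustration.

```python
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    """MoE sketch: 4 routed experts plus 2 always-on shared experts,
    with each token dispatched to its top-2 routed experts."""

    def __init__(self, d_model=1024, d_ff=2816, n_routed=4, n_shared=2, top_k=2):
        super().__init__()
        def make_expert():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                                 nn.Linear(d_ff, d_model))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.router = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                    # x: (batch, seq, d_model)
        flat = x.reshape(-1, x.shape[-1])    # route each token independently
        gate = torch.softmax(self.router(flat), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)           # top-2 routed experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gates
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.routed):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e     # tokens whose slot chose expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(flat[mask])
        for expert in self.shared:           # shared experts see every token
            out += expert(flat)
        return out.reshape_as(x)
```

Only 2 of the 4 routed expert FFNs run per token, so the active parameter count per forward pass is well below the total parameter count.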
Optimized Turkish language processing
Trained specifically on Turkish text using Turkish Wikipedia data, with a tokenizer vocabulary tailored to Turkish
YaRN-scaled Rotary Position Encoding
Uses frequency-scaled rotary position embeddings (YaRN), allowing the context window to extend beyond the length seen during training
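A sketch of YaRN-style frequency scaling follows. The original context length, scale factor, and the alpha/beta ramp bounds here are illustrative values, not the model's published configuration.

```python
import math
import torch


def yarn_inv_freq(dim, base=10000.0, orig_len=2048, scale=4.0, alpha=1.0, beta=32.0):
    """YaRN-style per-band RoPE scaling: bands that rotate many times over the
    original context keep their frequency, slow bands are divided by `scale`
    (position interpolation), and a linear ramp blends the region between."""
    inv_freq = base ** (-torch.arange(0, dim, 2).float() / dim)
    rotations = orig_len * inv_freq / (2 * math.pi)  # rotations per band
    gamma = ((rotations - alpha) / (beta - alpha)).clamp(0.0, 1.0)
    return inv_freq * (gamma + (1.0 - gamma) / scale)


def rope_cos_sin(seq_len, dim, **yarn_kwargs):
    """Build cos/sin tables with YaRN frequencies, including the attention
    temperature correction YaRN applies as a magnitude scale."""
    inv_freq = yarn_inv_freq(dim, **yarn_kwargs)
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    mscale = 0.1 * math.log(yarn_kwargs.get("scale", 1.0)) + 1.0
    return angles.cos() * mscale, angles.sin() * mscale


# Extend a model trained at 2048 tokens to 8192 (scale = 8192 / 2048 = 4)
cos, sin = rope_cos_sin(8192, dim=64, orig_len=2048, scale=4.0)
```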
Model Capabilities
Turkish text generation
Long sequence processing
Efficient memory usage
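For reference, a typical way to run Turkish text generation with a model like this through Hugging Face transformers is sketched below. The repo id is a placeholder, and the sampling settings are suggestions only; a custom architecture like this usually requires `trust_remote_code=True`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/turkish-deepseek"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "Türkiye'nin başkenti"  # "The capital of Turkey"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100,
                         do_sample=True, top_p=0.9, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```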
Use Cases
Text generation
Turkish content creation
Generate Turkish articles, stories, or other creative content
Turkish dialogue system
Build Turkish chatbots or dialogue assistants
Education
Turkish learning assistance
Help learners practice Turkish writing and grammar