Sanskrit-qwen-7B-Translate Open Source Model - Optimize Sanskrit Comprehension and Translation Abilities, Free Deployment!

Home

Sanskrit Qwen 7B Translate

Developed by diabolic6045

A Sanskrit-specific model fine-tuned based on Qwen2.5-7B, optimized for Sanskrit comprehension and translation

Large Language Model

Transformers

Open Source License:Apache-2.0 #Sanskrit translation #Low-resource optimization #Multi-task fine-tuning

Downloads 229

Release Time : 3/5/2025

Model Overview

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M, specifically trained for Sanskrit comprehension and translation tasks, enhancing its ability to process Sanskrit texts.

Model Features

Sanskrit optimization

Fine-tuned on custom Sanskrit datasets, specifically optimized for Sanskrit processing capabilities

Bilingual translation

Supports bidirectional translation tasks between Sanskrit and English

Efficient fine-tuning

Uses QLoRA technology for parameter-efficient fine-tuning, enhancing Sanskrit performance while preserving base model capabilities

Model Capabilities

Sanskrit text comprehension

Sanskrit text generation

Sanskrit-English translation

English-Sanskrit translation

Use Cases

Language translation

Sanskrit literature translation

Translate ancient Sanskrit literature into modern English

Bilingual content creation

Generate bilingual content in Sanskrit and English

Academic research

Sanskrit text analysis

Assist in Sanskrit linguistics and philology research

🚀 Sanskrit-qwen-7B-Translate

This model is a fine - tuned version of Qwen/Qwen2.5-7B-Instruct-1M, optimized for Sanskrit language tasks, enabling efficient Sanskrit text processing and translation.

🚀 Quick Start

This model is a fine - tuned version of Qwen/Qwen2.5-7B-Instruct-1M optimized for Sanskrit language tasks.

✨ Features

Model Description

This is a merged version of a fine - tuned Qwen 2.5 7B model, specifically trained for Sanskrit language understanding and translation tasks. The model has been trained on a custom Sanskrit dataset to enhance its capabilities in handling Sanskrit text.

Intended Uses & Limitations

Intended Uses

Sanskrit text understanding and generation
Sanskrit - English translation tasks
Sanskrit language processing

Limitations

Performance may vary based on the complexity of Sanskrit text
Model should be used within ethical and legal guidelines

📦 Installation

No installation steps are provided in the original document, so this section is skipped.

💻 Usage Examples

No code examples are provided in the original document, so this section is skipped.

📚 Documentation

Training Data

The model was trained on the diabolic6045/Sanskrit-llama dataset.

Training Procedure

Training Details

Base Model: Qwen/Qwen2.5-7B-Instruct-1M
Training Type: Fine - tuning
Hardware: Multi - GPU setup
Training Parameters:
- Learning Rate: 2e - 05
- Epochs: 1
- Batch Size: 2 (total)
- Optimizer: AdamW
- LR Scheduler: Cosine with warmup

Framework Versions

Transformers 4.49.0
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0

Axolotl Config

See axolotl config

axolotl version: 0.8.0.dev0


base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: diabolic6045/Sanskrit-llama
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

hub_model_id: Sanskrit-qwen-8B

wandb_project: संस्कृतम्-llama
wandb_entity: 
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model: 

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

#gpu_memory_limit: 20GiB
#lora_on_cpu: true         

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
   pad_token: <|end_of_text|>

🔧 Technical Details

The model is fine - tuned based on the Qwen 2.5 7B model. It uses a custom Sanskrit dataset for training and a multi - GPU setup for hardware support. The training parameters are carefully configured, such as the learning rate, epochs, and batch size, to achieve better performance in Sanskrit language tasks.

📄 License

This model is released under the Apache 2.0 license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご