🚀 Sanskrit-qwen-7B-Translate
This model is a fine - tuned version of Qwen/Qwen2.5-7B-Instruct-1M, optimized for Sanskrit language tasks, enabling efficient Sanskrit text processing and translation.
🚀 Quick Start
This model is a fine - tuned version of Qwen/Qwen2.5-7B-Instruct-1M optimized for Sanskrit language tasks.
✨ Features
Model Description
This is a merged version of a fine - tuned Qwen 2.5 7B model, specifically trained for Sanskrit language understanding and translation tasks. The model has been trained on a custom Sanskrit dataset to enhance its capabilities in handling Sanskrit text.
Intended Uses & Limitations
Intended Uses
- Sanskrit text understanding and generation
- Sanskrit - English translation tasks
- Sanskrit language processing
Limitations
- Performance may vary based on the complexity of Sanskrit text
- Model should be used within ethical and legal guidelines
📦 Installation
No installation steps are provided in the original document, so this section is skipped.
💻 Usage Examples
No code examples are provided in the original document, so this section is skipped.
📚 Documentation
Training Data
The model was trained on the diabolic6045/Sanskrit-llama dataset.
Training Procedure
Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct-1M
- Training Type: Fine - tuning
- Hardware: Multi - GPU setup
- Training Parameters:
- Learning Rate: 2e - 05
- Epochs: 1
- Batch Size: 2 (total)
- Optimizer: AdamW
- LR Scheduler: Cosine with warmup
Framework Versions
- Transformers 4.49.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
Axolotl Config

See axolotl config
axolotl version: 0.8.0.dev0
base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: diabolic6045/Sanskrit-llama
type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
hub_model_id: Sanskrit-qwen-8B
wandb_project: संस्कृतम्-llama
wandb_entity:
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
pad_token: <|end_of_text|>
🔧 Technical Details
The model is fine - tuned based on the Qwen 2.5 7B model. It uses a custom Sanskrit dataset for training and a multi - GPU setup for hardware support. The training parameters are carefully configured, such as the learning rate, epochs, and batch size, to achieve better performance in Sanskrit language tasks.
📄 License
This model is released under the Apache 2.0 license.