Self-BioRAG 7B OLAPH
A fine-tuned version of Minbyul/selfbiorag-7b-wo-kqa_golden-iter-dpo-step3-filtered, trained with Direct Preference Optimization (DPO) on the HuggingFace MedLFQA dataset (excluding the kqa_golden subset).
Downloads: 20
Release Date: 5/22/2024
Model Overview
This is a 7B-parameter language model specialized for medical-domain question-answering tasks. It was trained with Direct Preference Optimization (DPO), which optimizes response quality directly from preference pairs rather than through a separately trained reward model.
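The model can be loaded with the standard transformers API. The sketch below is a minimal inference example, not an official usage guide: the repo ID shown is the base checkpoint named above (this page does not give the fine-tuned model's exact Hugging Face ID), and the sample question is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: using the base checkpoint named above; substitute the
# fine-tuned model's actual repo ID if it differs.
model_id = "Minbyul/selfbiorag-7b-wo-kqa_golden-iter-dpo-step3-filtered"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model within a single ~16 GB GPU
    device_map="auto",
)

question = "What are the common side effects of metformin?"  # illustrative prompt
inputs = tokenizer(question, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```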
Model Features
Direct Preference Optimization
Fine-tuned with the DPO algorithm so that the model learns to favor high-quality responses over lower-quality alternatives (a training sketch follows this feature list)
Medical Domain Specialization
Trained on medical QA datasets, suitable for handling professional medical questions
Multi-GPU Training
Distributed training across 4 GPUs to improve training efficiency (launch command shown after this list)
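A minimal sketch of the DPO setup described above, using Hugging Face TRL's DPOTrainer. The dataset ID, column layout, and hyperparameters here are assumptions for illustration, not the authors' exact recipe; TRL expects preference data with prompt, chosen, and rejected fields.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base checkpoint named in the description above.
base_id = "Minbyul/selfbiorag-7b-wo-kqa_golden-iter-dpo-step3-filtered"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical dataset ID: a MedLFQA-derived preference set with
# "prompt"/"chosen"/"rejected" columns, kqa_golden excluded upstream.
train_dataset = load_dataset("your-org/medlfqa-preference-pairs", split="train")

config = DPOConfig(
    output_dir="selfbiorag-7b-olaph-dpo",
    beta=0.1,                        # strength of the KL penalty toward the reference model
    per_device_train_batch_size=2,   # illustrative; tune for your GPUs
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                 # reference model is created automatically when omitted
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```

For the 4-GPU setup mentioned above, a script like this would typically be launched with `accelerate launch --num_processes 4 train_dpo.py`.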
Model Capabilities
Medical question answering
Domain-specific text generation
Preference learning
Use Cases
Healthcare
Medical Knowledge QA System
Building an intelligent assistant capable of answering professional medical questions
Performs well on the MedLFQA benchmark
Medical Education Tool
QA system for medical student education and training