Qwen2.5-Aloe-Beta-7B: Open-Source Medical Large Language Model with State-of-the-Art Results on Multiple Medical Tasks

Qwen2.5 Aloe Beta 7B

Developed by HPAI-BSC

Qwen2.5-Aloe-Beta-7B is an open-source large medical language model that achieves state-of-the-art performance in multiple medical tasks. It is fine-tuned based on the Qwen2.5-7B architecture, and the training data covers 1.8 billion tokens of diverse medical tasks.

Large Language Model

Transformers

Language: English
Open Source License: Apache-2.0
Tags: #Medical Q&A #Multitasking #RAG Enhancement

Release Time : 12/9/2024

Model Overview

Aloe-Beta is the latest iteration of the Aloe series, designed specifically for the medical field. It supports various medical tasks such as text summarization, diagnosis explanation, and treatment recommendation, while maintaining capabilities in the general domain.

Model Features

Specialization in the Medical Field

Trained on 20 medical tasks to form a robust and versatile medical model with performance reaching the best level among similar models.

Retention of General Capabilities

Includes 20% general-domain data to avoid catastrophic forgetting and preserve cross-domain capabilities such as mathematics and programming.

Enhanced Safety Alignment

Includes medical preference datasets and red-teaming data to improve the model's safety and alignment.

Optimized for RAG Compatibility

Optimized specifically for retrieval-augmented generation systems, and the performance is close to that of top-tier closed-source models when combined with RAG.

Model Capabilities

Medical Q&A

Medical Text Summarization

Diagnostic Reasoning

Treatment Recommendation Generation

Medical Text Classification

Medical Explanation

General Text Generation

Code Generation

Mathematical Reasoning

Use Cases

Medical Assistance

Medical Knowledge Q&A

Answer medical-related questions from doctors or patients and provide professional explanations.

Reach the SOTA level in benchmarks such as MedQA.

Medical Literature Summarization

Automatically generate concise summaries of medical research literature.

Support long text processing (>8k tokens)

Medical Education

Medical Concept Explanation

Provide accessible explanations of complex medical concepts for medical students.

🚀 Aloe: A Family of Fine-tuned Open Healthcare LLMs

Aloe is a family of fine-tuned open healthcare LLMs. These models achieve state-of-the-art performance on several medical tasks. They are available in multiple sizes and trained on diverse medical data, making them robust and versatile for healthcare applications.

🚀 Quick Start

You can start using the Aloe model with the following code examples. There are two ways: using the Transformers pipeline and the AutoModelForCausalLM class.

💻 Usage Examples

Basic Usage (Transformers pipeline)

import transformers
import torch

model_id = "HPAI-BSC/Qwen2.5-Aloe-Beta-7B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center(BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello."},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|im_end|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05
)
print(outputs[0]["generated_text"][len(prompt):])

Advanced Usage (Transformers AutoModelForCausalLM)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HPAI-BSC/Qwen2.5-Aloe-Beta-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center(BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|im_end|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    repetition_penalty=1.05
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

✨ Features

Multiple Sizes: Aloe is available in four model sizes: 7B, 8B, 70B, and 72B.
Diverse Training: Trained on 20 medical tasks, resulting in a robust and versatile healthcare model.
High Performance: Evaluations show Aloe models to be among the best in their class. When combined with a RAG system, the 7B and 8B versions get close to the performance of closed models, and the 70B and 72B versions outperform them.
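
Because the RAG results above depend on how retrieved passages are handed to the model, here is a minimal sketch of assembling a RAG-style chat prompt. The helper names (`score_passage`, `build_rag_messages`), the word-overlap scoring, and the prompt wording are illustrative assumptions; they are not the API of the prompt_engine repository.

```python
def score_passage(question, passage):
    """Toy relevance score: number of distinct question words found in the passage."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    return len(question_words & passage_words)


def build_rag_messages(question, passages, top_k=2):
    """Return chat messages that carry the top_k highest-scoring passages as context."""
    ranked = sorted(passages, key=lambda p: score_passage(question, p), reverse=True)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(ranked[:top_k]))
    return [
        # Shortened, hypothetical system prompt for illustration only.
        {"role": "system", "content": "You are an expert medical assistant named Aloe."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


passages = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Aspirin is used to reduce fever and relieve mild pain.",
    "Type 2 diabetes is characterized by insulin resistance.",
]
messages = build_rag_messages(
    "What is the first-line treatment for type 2 diabetes?", passages
)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one passed to `apply_chat_template` in the usage examples, so it could be dropped in there. A real system would replace the word-overlap scorer with a proper retriever.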

📦 Installation

The original README provides no dedicated installation steps. The usage examples need the transformers and torch packages, plus accelerate, which Transformers requires when loading with device_map="auto":

pip install transformers torch accelerate

📚 Documentation

Model Details

Model Description

Property	Details
Developed by	HPAI
Model Type	Causal decoder-only transformer language model
Language(s) (NLP)	English (capable but not formally evaluated on other languages)
License	This model is based on Qwen2.5-7B which is released with Apache 2.0 license. All modifications are available with a CC BY 4.0 license, making the Aloe Beta models compatible with commercial use.
Base model	Qwen2.5-7B
Paper	(more coming soon)
RAG Repository	https://github.com/HPAI-BSC/prompt_engine

Model Performance

Aloe Beta has been tested on the most popular healthcare QA datasets, with and without the Medprompt inference technique. Results show competitive performance, achieving SOTA among models of the same size. It has also been evaluated on many different medical tasks and, in the general domain, on the OpenLLM Leaderboard benchmark, showing good results.
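
Medprompt combines several prompting steps, one of which is choice-shuffling ensembling: the answer options of a multiple-choice question are presented in several random orders and the most frequent answer wins. The sketch below illustrates only that step. `answer_fn` is a hypothetical stand-in for a model call, and `stub_answer_fn` is a toy replacement so the example runs without a model. This is not the evaluation code used for Aloe.

```python
import random
from collections import Counter


def choice_shuffle_vote(question, options, answer_fn, n_votes=5, seed=0):
    """Ask the same question with shuffled option orders and majority-vote the answer."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_votes):
        shuffled = options[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        picked_index = answer_fn(question, shuffled)
        votes[shuffled[picked_index]] += 1  # vote by option text, not by position
    return votes.most_common(1)[0][0]


def stub_answer_fn(question, options):
    """Hypothetical model stub: returns the index of the option mentioning Metformin."""
    return next(i for i, option in enumerate(options) if "Metformin" in option)


options = ["Aspirin", "Metformin", "Warfarin", "Amoxicillin"]
best = choice_shuffle_vote(
    "Which drug is a first-line treatment for type 2 diabetes?", options, stub_answer_fn
)
print(best)
```

Voting on the option text rather than its position is what makes the ensemble robust to position bias in the underlying model.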

Uses

Direct Use

We encourage the use of Aloe for research purposes, as a stepping stone to build better foundational models for healthcare. In production, Aloe should always be used under the supervision of a human expert.

Out-of-Scope Use

These models are not to be used for clinical practice, medical diagnosis, or any other form of direct or indirect healthcare advice. Models are prone to error and can produce toxic content. The use of Aloe models for activities harmful to individuals, such as spam, fraud, or impersonation, is strictly prohibited. Minors should not be left alone to interact with Aloe without supervision.

Bias, Risks, and Limitations

Aloe can produce toxic content under the appropriate prompts and includes multiple undesirable biases. Although efforts have been made to mitigate this, model safety cannot be fully guaranteed. We avoid using any personal data in training.

We identify at least three risk cases specific to healthcare LLMs:

Healthcare professional impersonation: Aloe could be used to increase the efficacy of such deceiving activities. Preventive actions include public literacy and legislation.
Medical decision-making without professional supervision: Aloe can facilitate self-delusion. Public literacy on self-diagnosis dangers and disclaimers are important defenses.
Access to information on dangerous substances or procedures: LLMs can centralize access to sensitive information. Model alignment helps but is insufficient due to jailbreaking methods.

Training Details

Supervised fine-tuning

SFT on top of Qwen2.5-7B using axolotl (https://github.com/axolotl-ai-cloud/axolotl). Hardware used for different model sizes:

7B: 32x NVIDIA Hopper H100 64GB of the Marenostrum 5.
8B: 32x NVIDIA Hopper H100 64GB of the Marenostrum 5.
70B: 64x NVIDIA Hopper H100 64GB of the Marenostrum 5.
72B: 92x NVIDIA Hopper H100 64GB of the Marenostrum 5.

Training Data

The training set consists of around 1.8B tokens, having 3 different types of data:

Medical domain datasets: Includes data from 20 different medical tasks, such as [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection), [HPAI-BSC/chain-of-diagnosis](https://huggingface.co/datasets/HPAI-BSC/chain-of-diagnosis), etc.
Synthetic data: Generated high-quality answers using Llama3.1-70B, including [HPAI-BSC/pubmedqa-cot-llama31](https://huggingface.co/datasets/HPAI-BSC/pubmedqa-cot-llama31), etc.
General data: It includes maths, STEM, code, function calling, and instructions with a very long context, like [HPAI-BSC/Aloe-Beta-General-Collection](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-General-Collection).

Training parameters

Epochs: 3
Sequence length: 16384
Optimizer: adamw_torch
Learning rate: 1e-5
Learning rate scheduler: cosine
Warmup steps: 100
Weight decay: 0
Gradient checkpointing: enabled
DeepSpeed ZeRO stage 3
Total batch size: 128
Batch size per device: 1
Gradient accumulation steps: 4
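
The batch-size figures above are consistent with the 32 GPUs listed for the 7B model: 32 devices × 1 sample per device × 4 accumulation steps = 128. The sketch below checks that arithmetic and illustrates a cosine schedule with linear warmup, which is what the "cosine" scheduler setting typically means in the Hugging Face training stack. The total number of steps (`total_steps=1000`) is a made-up value for illustration; the source does not state it.

```python
import math


def effective_batch_size(num_devices, per_device_batch, grad_accum_steps):
    """Samples contributing to each optimizer step across all devices."""
    return num_devices * per_device_batch * grad_accum_steps


def cosine_lr(step, total_steps, base_lr=1e-5, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# 7B SFT setup from the lists above: 32 GPUs, batch 1 per device, 4 accumulation steps.
sft_batch = effective_batch_size(32, 1, 4)
print(sft_batch)

# Hypothetical run length, used only to show the shape of the schedule.
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps))
```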

Model Merging

The trained model was merged with the Qwen2.5-7B-Instruct model using the DARE_TIES technique. [Mergekit](https://github.com/arcee-ai/mergekit) was used to conduct the merging.
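
To give an intuition for what a DARE_TIES merge does, here is a simplified sketch on plain Python lists. It assumes each model is expressed as a delta from a shared base, randomly drops and rescales delta entries (the DARE step), then keeps only the entries whose sign agrees with the elected sign per parameter and averages them (the TIES step). The `density` value and the equal weighting are assumptions; the source does not give the merge parameters, and this is not Mergekit's implementation.

```python
import random


def dare_ties_merge(base, finetuned_models, density=0.5, seed=0):
    """Simplified DARE + TIES merge of several models that share one base."""
    rng = random.Random(seed)

    # 1. Task vectors: difference between each fine-tuned model and the base.
    deltas = [[ft - b for ft, b in zip(model, base)] for model in finetuned_models]

    # 2. DARE: keep each entry with probability `density`, rescale survivors by 1/density.
    sparse = [
        [d / density if rng.random() < density else 0.0 for d in delta]
        for delta in deltas
    ]

    # 3. TIES: elect a sign per parameter, then average only the agreeing entries.
    merged = []
    for i, b in enumerate(base):
        column = [s[i] for s in sparse]
        elected_sign = 1.0 if sum(column) >= 0 else -1.0
        agreeing = [v for v in column if v * elected_sign > 0]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + update)
    return merged


base = [0.0, 1.0, 2.0]
model_a = [1.0, 1.0, 1.0]   # hypothetical fine-tuned weights
model_b = [3.0, 0.0, 3.0]   # hypothetical instruct weights

# density=1.0 disables dropping, so the result is deterministic.
print(dare_ties_merge(base, [model_a, model_b], density=1.0))
```

With `density=1.0` only the sign election matters; lower densities add the random sparsification that DARE relies on to reduce interference between the merged models.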

Model Alignment

The model is aligned using the Direct Preference Optimization (DPO) technique through a two-step process:

General DPO Alignment: Uses a dataset combining medical, general preference, and safety data. We used [HPAI-BSC/Aloe-Beta-DPO](https://huggingface.co/datasets/HPAI-BSC/Aloe-Beta-DPO). Trained iteratively for one epoch on each chunk with a learning rate of 2e-7.
Red-Teaming Alignment: Further fine-tunes the model to resist attacks. Dataset will be shared soon. Learning rate is set to 1e-7.

We used the OpenRLHF library and aligned the model using 16x NVIDIA Hopper H100 64GB of the Marenostrum 5. Common hyperparameters:

Sequence length: 4096
Optimizer: Fused adam
Total batch size: 128
Batch size per device: 1
Gradient accumulation steps: 8
Beta: 0.1
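
The Beta hyperparameter above scales the preference margin in the DPO objective. As a minimal sketch of the standard per-pair DPO loss (not OpenRLHF's implementation), the function below takes sequence log-probabilities of the chosen and rejected answers under the policy and the frozen reference model. The log-probability values in the example are made up for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain)))."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp        # how much the policy favors the chosen answer vs. the reference
    rejected_gain = policy_rejected_logp - ref_rejected_logp  # same for the rejected answer
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: margin is 0, so the loss is log(2).
loss_at_start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability towards the chosen answer: the loss drops.
loss_after_shift = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(loss_at_start, loss_after_shift)
```

A small beta such as 0.1 keeps the margin small for a given log-probability shift, which limits how strongly the policy is pushed away from the reference model.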

Evaluation

Testing Data, Factors & Metrics

Testing Data

[ACI-BENCH](https://github.com/wyim/aci-bench)
[MTS-Dialog](https://github.com/abachaa/MTS-Dialog)
MedText
[Medical Text classification](https://www.kaggle.com/datasets/chaitanyakck/medical-text/data)
[OLAPH](https://github.com/dmis-lab/OLAPH)
CareQA Open
MedDialog
MEDIQA QA
Meddialog Qsumm
Biored
[MIMIC-III](https://huggingface.co/datasets/dmacres/mimiciii-hospitalcourse-meta)
[Medical Prescription](https://huggingface.co/datasets/devlocalhost/prescription-full)
MedQA (USMLE)
MedMCQA
PubMedQA
MMLU-Medical
[MedQA-4-Option](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
[CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA)
[Open LLM Leaderboard 2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Metrics

Accuracy: suited to the evaluation of multiple-choice question-answering tasks.
Rouge1: the overlap of unigrams between the system output and the gold standard.
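
As a rough illustration of the two metrics, the sketch below computes accuracy over multiple-choice predictions and a ROUGE-1 score from unigram overlap. It assumes lowercase whitespace tokenization, which is simpler than what standard ROUGE packages do, so scores from a real evaluation harness may differ. The example inputs are made up.

```python
from collections import Counter


def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference label."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 with whitespace tokenization."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((cand_counts & ref_counts).values())  # clipped unigram matches
    precision = overlap / max(1, sum(cand_counts.values()))
    recall = overlap / max(1, sum(ref_counts.values()))
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


acc = accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"])
scores = rouge1("the patient has a fever", "the patient has fever")
print(acc, scores)
```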

Summary

Benchmark results indicate that the training of Aloe has boosted its performance above all other open models within the same model size. With the help of prompting techniques, the performance of Qwen2.5-Aloe-Beta-7B is significantly improved.

📄 License

This model is based on Qwen2.5-7B which is released with Apache 2.0 license. All modifications are available with a CC BY 4.0 license, making the Aloe Beta models compatible with commercial use.
