🚀 Aloe: A New Family of Healthcare LLMs
Aloe is a new family of healthcare LLMs. It competes well with previous open models in its range. By using model merging and advanced prompting strategies, it achieves state-of-the-art results at its size. It scores high in ethics and factuality metrics due to red teaming and alignment efforts. Complete training details, model merging configurations, and all training data will be shared, along with the prompting repository for inference. Aloe also comes with a healthcare-specific risk assessment for safe use and deployment.
🚀 Quick Start
Use the code below to get started with the model. You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.
💻 Usage Examples
Basic Usage
import transformers
import torch

model_id = "HPAI-BSC/Llama3-Aloe-8B-Alpha"

# Build a text-generation pipeline, loading the model in bfloat16 across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center (BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello."},
]

# Render the chat messages into the model's prompt format
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Stop generation at either the standard EOS token or Llama 3's end-of-turn token
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print only the newly generated text (strip the echoed prompt)
print(outputs[0]["generated_text"][len(prompt):])
Advanced Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HPAI-BSC/Llama3-Aloe-8B-Alpha"

# Load the tokenizer and the model in bfloat16 across available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert medical assistant named Aloe, developed by the High Performance Artificial Intelligence Group at Barcelona Supercomputing Center (BSC). You are to be a helpful, respectful, and honest assistant."},
    {"role": "user", "content": "Hello"},
]

# Tokenize the chat and move the input ids to the model's device
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either the standard EOS token or Llama 3's end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the tokens generated after the prompt
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
✨ Features
- Latest Versions Available: The ALOE BETA 8B and ALOE BETA 70B versions offer better overall performance, more thorough alignment and safety, and a license compatible with more uses.
- High Performance in Healthcare: Aloe is highly competitive with previous open models in its range and reaches state-of-the-art results at its size through model merging and advanced prompting strategies.
- Ethical and Factual: It scores high in metrics measuring ethics and factuality due to combined red teaming and alignment efforts.
- Transparent: Complete training details, model merging configurations, and all training data (including synthetically generated data) will be shared, along with the prompting repository for inference.
- Safe Use: Comes with a healthcare-specific risk assessment for safe use and deployment.
📦 Installation
The usage examples above require only the `transformers` and `torch` libraries, plus `accelerate` for `device_map="auto"` (e.g., `pip install transformers accelerate torch`).
📚 Documentation
Model Details
Model Description
Property | Details |
---|---|
Developed by | HPAI |
Model Type | Causal decoder-only transformer language model |
Language(s) (NLP) | English (mainly) |
License | This model is based on Meta Llama 3 8B and is governed by the Meta Llama 3 License. All modifications are available with a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. |
Finetuned from model | [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) |
Model Sources
- Repository: https://github.com/HPAI-BSC/prompt_engine (more coming soon)
- Paper: https://arxiv.org/abs/2405.01886 (more coming soon)
Model Performance
Aloe has been tested on popular healthcare QA datasets, with and without the Medprompt inference technique. Results show competitive performance, even against bigger models. Results using advanced prompting methods (aka Medprompt) are achieved through a [repo](https://github.com/HPAI-BSC/prompt_engine) made public with this work.
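The full Medprompt-style pipeline (few-shot example selection, chain-of-thought prompting, choice shuffling, and ensembling) lives in the prompt_engine repository linked above. As a rough illustration only, the sketch below shows the self-consistency voting idea at the core of such prompting: sample several chain-of-thought completions and take a majority vote over the extracted answers. The helper names (`build_prompt`, `extract_choice`, `medprompt_vote`) and the prompt wording are hypothetical; the `pipeline` object is the one created in the Basic Usage example.

# Minimal, illustrative sketch of Medprompt-style self-consistency voting.
# The real pipeline lives in https://github.com/HPAI-BSC/prompt_engine.
from collections import Counter
import re

def build_prompt(question, options):
    # Hypothetical helper: ask for step-by-step reasoning ending in "Answer: <letter>"
    letters = ["A", "B", "C", "D"]
    opts = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    messages = [
        {"role": "system", "content": "You are an expert medical assistant named Aloe."},
        {"role": "user", "content": f"{question}\n{opts}\nThink step by step, then finish with 'Answer: <letter>'."},
    ]
    return pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

def extract_choice(text):
    # Hypothetical helper: pull the final "Answer: X" letter out of the completion
    matches = re.findall(r"Answer:\s*([ABCD])", text)
    return matches[-1] if matches else None

def medprompt_vote(question, options, n_samples=5):
    prompt = build_prompt(question, options)
    votes = []
    for _ in range(n_samples):
        out = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.9)
        choice = extract_choice(out[0]["generated_text"][len(prompt):])
        if choice:
            votes.append(choice)
    # Majority vote over the sampled chain-of-thought answers
    return Counter(votes).most_common(1)[0][0] if votes else None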
Uses
Direct Use
We encourage the use of Aloe for research purposes, as a stepping stone to build better foundational models for healthcare.
Out-of-Scope Use
These models are not to be used for clinical practice, medical diagnosis, or any other form of direct or indirect healthcare advice. Models are prone to error and can produce toxic content. The use of Aloe models for activities harmful to individuals, such as spam, fraud, or impersonation, is prohibited.
Bias, Risks, and Limitations
We consider three risk cases:
- Healthcare professional impersonation: A model like Aloe could be used to increase the efficacy of such deceiving activities. Preventive actions include public literacy on the unreliability of digitised information and the importance of medical registration, and legislation enforcing AI-generated content disclaimers.
- Medical decision-making without professional supervision: Aloe can facilitate self-delusion and generate actionable answers. Public literacy on the dangers of self-diagnosis, along with disclaimers and warnings on the models' outputs, are main defences.
- Access to information on dangerous substances or procedures: LLMs can centralize access to such information. Model alignment can help, but jailbreaking methods still overcome it.
The table below shows the performance of Aloe on several AI safety tasks:
Recommendations
We avoid the use of all personal data in our training. Model safety cannot be guaranteed: Aloe can produce toxic content under the appropriate prompts. Minors should not interact with Aloe without supervision.
Training Details
Supervised fine-tuning on top of Llama 3 8B using medical and general domain datasets, model merging using the DARE-TIES process, and a two-stage DPO process for human preference alignment. More details coming soon.
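The exact merge configurations will be released with the rest of the training material. For intuition only, here is a toy PyTorch sketch of the DARE (Drop And REscale) step that DARE-TIES builds on: compute each fine-tuned checkpoint's task vector (its weights minus the base weights), randomly drop most entries, rescale the survivors, and add the combined delta back to the base model. The function name and the plain averaging of task vectors are simplifications; TIES-style sign-conflict resolution is omitted.

# Toy sketch of the DARE step used in DARE-TIES merging; the released merge
# configurations are authoritative, this only illustrates the idea.
import torch

def dare_merge(base_state, finetuned_states, drop_rate=0.9):
    """base_state / finetuned_states: dicts mapping parameter names to tensors."""
    merged = {}
    for name, base_w in base_state.items():
        delta_sum = torch.zeros_like(base_w, dtype=torch.float32)
        for ft_state in finetuned_states:
            delta = ft_state[name].float() - base_w.float()       # task vector
            mask = (torch.rand_like(delta) > drop_rate).float()   # random drop
            delta_sum += delta * mask / (1.0 - drop_rate)         # rescale survivors
        # Simple averaging of the rescaled task vectors (TIES adds sign-based
        # conflict resolution on top of this, omitted here for brevity)
        merged[name] = base_w + (delta_sum / len(finetuned_states)).to(base_w.dtype)
    return merged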
Training Data
- Medical domain datasets, including synthetic data generated using Mixtral-8x7B and Genstruct
- HPAI-BSC/pubmedqa-cot
- HPAI-BSC/medqa-cot
- HPAI-BSC/medmcqa-cot
- LDJnr/Capybara
- hkust-nlp/deita-10k-v0
- jondurbin/airoboros-3.2
- argilla/dpo-mix-7k
- nvidia/HelpSteer
- Custom preference data with adversarial prompts generated from Anthropic Harmless, Chen et al., and original prompts
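The preference datasets above feed the two-stage DPO alignment mentioned under Training Details. The released alignment code is authoritative; as a reference point only, the sketch below shows the core DPO objective, which rewards the policy for preferring the chosen response over the rejected one relative to a frozen reference model.

# Minimal sketch of the DPO (Direct Preference Optimization) loss; the actual
# two-stage alignment used to train Aloe will be released separately.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are per-example summed log-probabilities of the response tokens."""
    # Log-ratio of policy vs. reference for chosen and rejected responses
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected via a logistic loss
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example with dummy log-probabilities for a batch of two preference pairs
loss = dpo_loss(torch.tensor([-12.0, -30.0]), torch.tensor([-15.0, -28.0]),
                torch.tensor([-13.0, -29.0]), torch.tensor([-14.0, -29.0]))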
Evaluation
Testing Data, Factors & Metrics
Testing Data
- MedQA (USMLE)
- MedMCQA
- PubMedQA
- MMLU-Medical
- [MedQA-4-Option](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
- [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA)
Metrics
- Accuracy: suited to the evaluation of multiple-choice question-answering tasks (a minimal scoring sketch follows).
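The exact evaluation harness behind the reported numbers may differ, but a common way to score a multiple-choice question with a causal LM is to compare the log-likelihood the model assigns to each answer option and count the question as correct when the gold option scores highest. The helper names below are hypothetical.

# Illustrative multiple-choice scoring by per-option log-likelihood; the exact
# evaluation harness used for the reported results may differ.
import torch

@torch.no_grad()
def option_logprob(model, tokenizer, question, option):
    """Sum of log-probabilities the model assigns to the option tokens given the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, option_ids], dim=-1)
    logits = model(input_ids).logits
    # Logits at position i predict the token at position i + 1
    option_logits = logits[0, prompt_ids.shape[-1] - 1 : -1]
    logprobs = torch.log_softmax(option_logits, dim=-1)
    return logprobs.gather(1, option_ids[0].unsqueeze(-1)).sum().item()

def is_correct(model, tokenizer, question, options, gold_index):
    scores = [option_logprob(model, tokenizer, question, o) for o in options]
    return max(range(len(options)), key=scores.__getitem__) == gold_index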
Results

Summary
To compare Aloe with competitive open models, we use popular healthcare datasets and CareQA. We calculate the standard MultiMedQA score and the arithmetic mean across all datasets; the Medical MMLU score is calculated by averaging six medical subtasks (a small sketch of this aggregation follows the summary).
Benchmark results show that Aloe outperforms Llama3-8B-Instruct and larger models such as Meditron 70B. With prompting techniques, especially Medprompt, the performance of Llama3-Aloe-8B-Alpha is significantly improved.
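As a rough illustration of the aggregation described above (with hypothetical inputs, not the reported numbers): the Medical MMLU score is the mean over its six medical subtasks, and the overall score is the arithmetic mean across all benchmark datasets.

# Sketch of the score aggregation described in the summary; inputs are hypothetical,
# the actual benchmark numbers are reported in the results table and the paper.
def aggregate_scores(per_dataset_acc, mmlu_medical_subtask_acc):
    # Medical MMLU = mean accuracy over its six medical subtasks
    mmlu_medical = sum(mmlu_medical_subtask_acc.values()) / len(mmlu_medical_subtask_acc)
    all_scores = dict(per_dataset_acc, **{"MMLU-Medical": mmlu_medical})
    # Overall score = arithmetic mean across all datasets
    return sum(all_scores.values()) / len(all_scores)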
Environmental Impact
Property | Details |
---|---|
Hardware Type | 4xH100 |
Hours used | 7,000 |
Hardware Provider | Barcelona Supercomputing Center |
Compute Region | Spain |
Carbon Emitted | 439.25 kg |
Model Card Authors
[Ashwin Kumar Gururajan](https://huggingface.co/G-AshwinKumar)
Model Card Contact
[hpai@bsc.es](mailto:hpai@bsc.es)
Citations
If you use this repository in a published work, please cite the following paper as source:

@misc{gururajan2024aloe,
      title={Aloe: A Family of Fine-tuned Open Healthcare LLMs},
      author={Ashwin Kumar Gururajan and Enrique Lopez-Cuena and Jordi Bayarri-Planas and Adrian Tormos and Daniel Hinjos and Pablo Bernabeu-Perez and Anna Arias-Duart and Pablo Agustin Martin-Torres and Lucia Urcelay-Ganzabal and Marta Gonzalez-Mallo and Sergio Alvarez-Napagao and Eduard Ayguadé-Parra and Ulises Cortés and Dario Garcia-Gasulla},
      year={2024},
      eprint={2405.01886},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
📄 License
This model is based on Meta Llama 3 8B and is governed by the Meta Llama 3 License. All our modifications are available with a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

