🚀 Model Card for Meditron-70B-v1.0
Meditron is a suite of open-source medical Large Language Models (LLMs). Meditron-70B is adapted from Llama-2-70B for the medical domain and outperforms several models on multiple medical reasoning tasks.
Advisory Notice
While Meditron is designed to encode medical knowledge from sources of high-quality evidence, it is not yet adapted to deliver this knowledge appropriately, safely, or within professional actionable constraints. We recommend against deploying Meditron in medical applications without extensive use-case alignment, as well as additional testing, specifically including randomized controlled trials in real-world practice settings.
🚀 Quick Start
Meditron-70B is now available for further testing and assessment. It can be used in various medical-related scenarios. For more interactive prompting, refer to our deployment guide, which uses [FastChat](https://github.com/lm-sys/FastChat) with [vLLM](https://github.com/vllm-project/vllm).
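For quick local experimentation, a minimal sketch using the Hugging Face `transformers` library is shown below. It assumes the public checkpoint id `epfl-llm/meditron-70b` and enough GPU memory to host 70B parameters in bf16 (several A100-class GPUs); adjust to your hardware.

```python
# Minimal sketch: load Meditron-70B with Hugging Face transformers and generate text.
# Assumes the checkpoint id "epfl-llm/meditron-70b" and sufficient GPU memory
# (a 70B model in bf16 needs roughly 140 GB, i.e. several A100-class GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # match the bf16 training precision
    device_map="auto",           # shard weights across available GPUs
)

prompt = "What are the common symptoms of iron-deficiency anemia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```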
✨ Features
- Medical-Domain Adapted: Adapted from Llama-2-70B through continued pretraining on a curated medical corpus.
- High Performance: Outperforms Llama-2-70B, GPT-3.5 (text-davinci-003, 8-shot), and Flan-PaLM on multiple medical reasoning tasks.
- Diverse Use Cases: Suitable for medical exam question answering, differential diagnosis support, and disease and general health information queries.
📦 Installation
No dedicated installation steps are provided; see the Quick Start and Usage Examples sections for ways to load and serve the model.
💻 Usage Examples
Basic Usage
You can use this model to generate text in order to experiment with it and understand its capabilities. For example, apply in-context learning with k demonstrations (3 or 5, as in the paper) added to the prompt, as sketched below.
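The snippet below is an illustrative sketch of how such a k-shot prompt could be assembled; the template and the demonstration questions are made up for illustration and are not the exact prompt format used in the paper.

```python
# Sketch of k-shot in-context learning: prepend k worked demonstrations to the test
# question. The template below is illustrative, not the exact format from the paper.
def build_few_shot_prompt(demonstrations, question, options):
    """demonstrations: list of (question, options_dict, answer_letter) tuples."""
    blocks = []
    for demo_q, demo_opts, demo_ans in demonstrations:
        opts = "\n".join(f"({letter}) {text}" for letter, text in demo_opts.items())
        blocks.append(f"Question: {demo_q}\n{opts}\nAnswer: ({demo_ans})")
    opts = "\n".join(f"({letter}) {text}" for letter, text in options.items())
    blocks.append(f"Question: {question}\n{opts}\nAnswer: (")
    return "\n\n".join(blocks)

# One demonstration (k=1 here; the paper uses 3 or 5).
demos = [
    ("Which vitamin deficiency causes scurvy?",
     {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"}, "B"),
]
prompt = build_few_shot_prompt(
    demos,
    "Which electrolyte abnormality is most associated with peaked T waves on ECG?",
    {"A": "Hypokalemia", "B": "Hyperkalemia", "C": "Hyponatremia", "D": "Hypercalcemia"},
)
print(prompt)  # feed this string to the model and parse the generated option letter
```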
Advanced Usage
For downstream question-answering tasks, you can finetune the model using task-specific training sets. For serving, you can use a high-throughput and memory-efficient inference engine with a UI that supports chat and text generation; a sketch using one such engine follows.
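As one possible serving setup, the sketch below uses vLLM's offline batch API. `tensor_parallel_size=8` is an assumption matching a single 8-GPU node; FastChat can provide the chat UI on top of the same checkpoint (see the deployment guide linked above).

```python
# Sketch: batch inference with vLLM's offline API. tensor_parallel_size=8 assumes one
# 8-GPU node; adjust to your hardware. For an interactive chat UI, FastChat can serve
# the same checkpoint behind its web interface.
from vllm import LLM, SamplingParams

llm = LLM(model="epfl-llm/meditron-70b", tensor_parallel_size=8, dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=256)

prompts = [
    "List the first-line treatments for community-acquired pneumonia in adults.",
    "Explain the difference between type 1 and type 2 diabetes mellitus.",
]
for output in llm.generate(prompts, params):
    print(output.prompt)
    print(output.outputs[0].text)
```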
📚 Documentation
Model Details
Property | Details |
---|---|
Developed by | [EPFL LLM Team](https://huggingface.co/epfl-llm) |
Model Type | Causal decoder-only transformer language model |
Language(s) | English (mainly) |
Model License | [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt) |
Code License | APACHE 2.0 LICENSE |
Continue-pretrained from model | [Llama-2-70B](https://huggingface.co/meta-llama/Llama-2-70b) |
Context length | 4K tokens |
Input | Text-only data |
Output | Model generates text only |
Status | This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we enhance the model's performance. |
Knowledge Cutoff | August 2023 |
Repository | [epflLLM/meditron](https://github.com/epfLLM/meditron) |
Trainer | [epflLLM/Megatron-LLM](https://github.com/epfLLM/Megatron-LLM) |
Paper | MediTron-70B: Scaling Medical Pretraining for Large Language Models |
Uses
Direct Use
The model can be used to generate text for experimentation, but not for production or for work that may impact people.
Downstream Use
Meditron-70B and Meditron-7B are foundation models. They can be finetuned, instruction-tuned, or RLHF-tuned for specific downstream tasks. Two methods for downstream question-answering tasks are:
- Apply in-context learning with k demonstrations added to the prompt.
- Finetune the models using specific training sets (a lightweight sketch follows this list).
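As a lightweight illustration of the second option, the sketch below finetunes LoRA adapters with Hugging Face `Trainer` and `peft`. This is a convenience stand-in, not the Megatron-LLM finetuning setup used in the paper; the hyperparameters and the toy training example are placeholders.

```python
# Hedged sketch: supervised finetuning on a task-specific QA training set.
# The paper finetunes with Megatron-LLM; this sketch uses Hugging Face Trainer + LoRA
# (peft) as a lighter-weight stand-in, with illustrative hyperparameters.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "epfl-llm/meditron-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                             device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                                         target_modules=["q_proj", "v_proj"]))

# Toy example in a QA prompt/answer format (replace with the real training split).
examples = [{"text": "Question: Which nerve is compressed in carpal tunnel syndrome?\n"
                     "Answer: The median nerve."}]
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="meditron-qa-lora", per_device_train_batch_size=1,
                           num_train_epochs=1, bf16=True, logging_steps=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```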
Out - of - Scope Use
Do not use this model for natural language generation in a production environment, finetuned or otherwise.
Truthfulness, Helpfulness, Risk, and Bias
Truthfulness
An initial assessment of the Meditron models' truthfulness was performed against baseline and consumer-level medical models using TruthfulQA (multiple choice), focusing only on medically relevant categories.
Category | meditron-70b | llama-2-70b | med42-70b* | meditron-7b | llama-2-7b | PMC-llama-7b |
---|---|---|---|---|---|---|
Health | 81.8 | 69.1 | 83.6 | 27.3 | 16.4 | 3.6 |
Nutrition | 77.9 | 68.8 | 62.5 | 31.1 | 12.5 | 6.3 |
Psychology | 47.4 | 36.8 | 52.6 | 21.1 | 10.5 | 0.0 |
Science | 77.8 | 44.4 | 33.3 | 33.3 | 11.1 | 0.0 |
Avg | 71.2 | 54.8 | 58.0 | 28.3 | 12.6 | 2.5 |
For more details, see the paper.
Helpfulness, Risk, and Bias
A comprehensive qualitative generation report for Meditron-70B on queries designed by medical experts is provided in the paper, comparing it with Llama-2-70B and ChatGPT-3.5 (November 27, 2023 version).
Recommendations
⚠️ Important Note
Users should be aware of the risks, biases, and limitations of the model. Do not use it in production for natural language generation or professional health-related purposes without comprehensive testing.
Training Details
Training Data
Meditron’s domain-adaptive pretraining corpus, GAP-Replay, combines 48.1B tokens from four corpora:
- [Clinical Guidelines](https://huggingface.co/datasets/epfl-llm/guidelines): A new dataset of 46K internationally recognized clinical practice guidelines.
- Medical Paper Abstracts: 16.1M abstracts from closed-access PubMed and PubMed Central papers.
- Medical Papers: Full-text articles from 5M publicly available PubMed and PubMed Central papers.
- Replay Data: 400M tokens of general domain data from [RedPajama-v1](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T).

Training Procedure
The [Megatron-LLM](https://github.com/epfLLM/Megatron-LLM) library was used for training. The hardware consisted of 16 nodes of 8x NVIDIA A100 (80GB) SXM GPUs. A three-way parallelism scheme was used (the sketch after this list checks the arithmetic):
- Data Parallelism (DP) of 2
- Pipeline Parallelism (PP) of 8
- Tensor Parallelism (TP) of 8
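These parallelism degrees multiply out to the full device count used for training; a quick sanity check (simple arithmetic, not code from the training repository):

```python
# Sanity check: the parallelism degrees multiply to the total number of GPUs.
dp, pp, tp = 2, 8, 8
gpus_per_node = 8
total_gpus = dp * pp * tp            # 128 GPUs
nodes = total_gpus // gpus_per_node  # 16 nodes of 8x A100, as listed above
print(total_gpus, nodes)             # 128 16
```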
Training Hyperparameters
Parameter | Value |
---|---|
bf16 | true |
lr | 1.5e-4 |
eps | 1e-5 |
betas | [0.9, 0.95] |
clip_grad | 1 |
weight decay | 0.1 |
DP size | 2 |
TP size | 8 |
PP size | 8 |
seq length | 4096 |
lr scheduler | cosine |
min lr | 1e-6 |
warmup iteration | 2000 |
micro batch size | 2 |
global batch size | 512 |
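As a rough back-of-the-envelope check (an estimate, not a number reported in the paper), the batch settings above imply about 2.1M tokens per optimizer step, or roughly 23K steps for one pass over the 48.1B-token GAP-Replay corpus:

```python
# Rough estimate of what the batch settings imply (not figures from the paper).
seq_length = 4096
global_batch_size = 512
tokens_per_step = global_batch_size * seq_length  # 2,097,152 ≈ 2.1M tokens per step
corpus_tokens = 48.1e9                            # GAP-Replay size quoted above
approx_steps = corpus_tokens / tokens_per_step    # ≈ 23K optimizer steps per pass
print(tokens_per_step, round(approx_steps))
```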
Speeds, Sizes, Times
The model was trained in September and October 2023.
Property | Value |
---|---|
Model size | 70B |
Hidden dimension | 8192 |
Num. attention heads | 64 |
Num. layers | 80 |
The 70B model was trained on roughly 48 billion tokens at a throughput of about 40,200 tokens/second, with a bfloat16 model FLOPs utilization (MFU) of roughly 42.3%.
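These figures are mutually consistent under the common 6·N·(tokens/s) approximation of achieved model FLOPs and the A100's ~312 TFLOP/s dense bf16 peak; both are assumptions of this check rather than numbers from the paper.

```python
# Consistency check of the quoted throughput and MFU, using the common
# 6 * params * tokens/sec approximation for achieved model FLOPs and the
# A100's ~312 TFLOP/s dense bf16 peak.
params = 70e9
tokens_per_sec = 40_200
num_gpus = 128
peak_flops_per_gpu = 312e12

achieved_flops = 6 * params * tokens_per_sec       # ~1.69e16 FLOP/s
peak_flops = num_gpus * peak_flops_per_gpu         # ~3.99e16 FLOP/s
print(f"MFU ≈ {achieved_flops / peak_flops:.1%}")  # ≈ 42.3%
```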
Evaluation
Testing Data & Metrics
Testing Data
- MedQA (USMLE)
- MedMCQA
- PubMedQA
- MMLU-Medical
- [MedQA-4-Option](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
Metrics
- Accuracy: For multiple-choice question-answering tasks.
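Concretely, accuracy here is the fraction of questions whose predicted option matches the gold answer; a minimal illustration:

```python
# Accuracy for multiple-choice QA: fraction of questions whose predicted option
# letter matches the gold answer.
def accuracy(predictions, references):
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

print(accuracy(["B", "C", "A", "D"], ["B", "C", "D", "D"]))  # 0.75
```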
Results
Dataset | meditron-70b | llama-2-70b | med42-70b* | clinical-camel-70b* |
---|---|---|---|---|
MMLU-Medical | 77.6 | 77.9 | 74.5 | 65.7 |
PubMedQA | 81.6 | 80.0 | 61.2 | 67.0 |
MedMCQA | 66.0 | 62.6 | 59.2 | 46.7 |
MedQA | 64.4 | 61.5 | 59.1 | 50.8 |
MedQA-4-Option | 70.2 | 63.8 | 63.9 | 56.8 |
Avg | 72.0 | 69.2 | 63.6 | 57.4 |
Note: Models with * are already instruction-tuned and excluded from further finetuning.
Environmental Impact
- Hardware Type: 128 x NVIDIA A100 (80GB) SXM
- Total GPU hours: 42,496
📄 License
The model is under the [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt), and the code is under the APACHE 2.0 LICENSE.

