🚀 Model Card for Meditron-70B-v1.0
Meditron is a suite of open-source medical Large Language Models (LLMs). Meditron-70B is adapted from Llama-2-70B for the medical domain and outperforms several models on multiple medical reasoning tasks.
Advisory Notice
While Meditron is designed to encode medical knowledge from sources of high-quality evidence, it is not yet adapted to deliver this knowledge appropriately, safely, or within professional actionable constraints. We recommend against deploying Meditron in medical applications without extensive use-case alignment, as well as additional testing, specifically including randomized controlled trials in real-world practice settings.
🚀 Quick Start
Meditron-70B is now available for further testing and assessment. It can be used in various medical-related scenarios. For more interactive prompting, refer to our deployment guide, which uses [FastChat](https://github.com/lm-sys/FastChat) with [vLLM](https://github.com/vllm-project/vllm).
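For quick local experimentation, a minimal sketch using the Hugging Face `transformers` library is shown below. It assumes the public checkpoint id `epfl-llm/meditron-70b` and enough GPU memory to host 70B parameters in bf16 (several A100-class GPUs); adjust to your hardware.

```python
# Minimal sketch: load Meditron-70B with Hugging Face transformers and generate text.
# Assumes the checkpoint id "epfl-llm/meditron-70b" and sufficient GPU memory
# (a 70B model in bf16 needs roughly 140 GB, i.e. several A100-class GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # match the bf16 training precision
    device_map="auto",           # shard weights across available GPUs
)

prompt = "What are the common symptoms of iron-deficiency anemia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```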
✨ Features
- Medical-Domain Adapted: Adapted from Llama-2-70B through continued pretraining on a curated medical corpus.
- High Performance: Outperforms Llama-2-70B, GPT-3.5 (text-davinci-003, 8-shot), and Flan-PaLM on multiple medical reasoning tasks.
- Diverse Use Cases: Suitable for medical exam question answering, differential diagnosis support, and disease and general health information queries.
📦 Installation
No dedicated installation steps are provided; see the Quick Start and Usage Examples sections for ways to load and serve the model.
💻 Usage Examples
Basic Usage
You can use this model to generate text in order to experiment with it and understand its capabilities. For example, apply in-context learning with k demonstrations (3 or 5, as in the paper) added to the prompt, as sketched below.
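The snippet below is an illustrative sketch of how such a k-shot prompt could be assembled; the template and the demonstration questions are made up for illustration and are not the exact prompt format used in the paper.

```python
# Sketch of k-shot in-context learning: prepend k worked demonstrations to the test
# question. The template below is illustrative, not the exact format from the paper.
def build_few_shot_prompt(demonstrations, question, options):
    """demonstrations: list of (question, options_dict, answer_letter) tuples."""
    blocks = []
    for demo_q, demo_opts, demo_ans in demonstrations:
        opts = "\n".join(f"({letter}) {text}" for letter, text in demo_opts.items())
        blocks.append(f"Question: {demo_q}\n{opts}\nAnswer: ({demo_ans})")
    opts = "\n".join(f"({letter}) {text}" for letter, text in options.items())
    blocks.append(f"Question: {question}\n{opts}\nAnswer: (")
    return "\n\n".join(blocks)

# One demonstration (k=1 here; the paper uses 3 or 5).
demos = [
    ("Which vitamin deficiency causes scurvy?",
     {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"}, "B"),
]
prompt = build_few_shot_prompt(
    demos,
    "Which electrolyte abnormality is most associated with peaked T waves on ECG?",
    {"A": "Hypokalemia", "B": "Hyperkalemia", "C": "Hyponatremia", "D": "Hypercalcemia"},
)
print(prompt)  # feed this string to the model and parse the generated option letter
```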
Advanced Usage
For downstream question-answering tasks, you can finetune the model using task-specific training sets. For serving, you can use a high-throughput and memory-efficient inference engine with a UI that supports chat and text generation; a sketch using one such engine follows.
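As one possible serving setup, the sketch below uses vLLM's offline batch API. `tensor_parallel_size=8` is an assumption matching a single 8-GPU node; FastChat can provide the chat UI on top of the same checkpoint (see the deployment guide linked above).

```python
# Sketch: batch inference with vLLM's offline API. tensor_parallel_size=8 assumes one
# 8-GPU node; adjust to your hardware. For an interactive chat UI, FastChat can serve
# the same checkpoint behind its web interface.
from vllm import LLM, SamplingParams

llm = LLM(model="epfl-llm/meditron-70b", tensor_parallel_size=8, dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=256)

prompts = [
    "List the first-line treatments for community-acquired pneumonia in adults.",
    "Explain the difference between type 1 and type 2 diabetes mellitus.",
]
for output in llm.generate(prompts, params):
    print(output.prompt)
    print(output.outputs[0].text)
```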
📚 Documentation
Model Details
Property | Details |
---|---|
Developed by | [EPFL LLM Team](https://huggingface.co/epfl-llm) |
Model Type | Causal decoder-only transformer language model |
Language(s) | English (mainly) |
Model License | [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt) |
Code License | APACHE 2.0 LICENSE |
Continue-pretrained from model | [Llama-2-70B](https://huggingface.co/meta-llama/Llama-2-70b) |
Context length | 4K tokens |
Input | Text-only data |
Output | Model generates text only |
Status | This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we enhance the model's performance. |
Knowledge Cutoff | August 2023 |
Repository | [epflLLM/meditron](https://github.com/epfLLM/meditron) |
Trainer | [epflLLM/Megatron-LLM](https://github.com/epfLLM/Megatron-LLM) |
Paper | MediTron-70B: Scaling Medical Pretraining for Large Language Models |
Uses
Direct Use
The model can be used to generate text for experimentation, but not for production or for work that may impact people.
Downstream Use
Meditron-70B and Meditron-7B are foundation models. They can be finetuned, instruction-tuned, or RLHF-tuned for specific downstream tasks. Two methods for downstream question-answering tasks are:
- Apply in-context learning with k demonstrations added to the prompt.
- Finetune the models using specific training sets (a lightweight sketch follows this list).
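As a lightweight illustration of the second option, the sketch below finetunes LoRA adapters with Hugging Face `Trainer` and `peft`. This is a convenience stand-in, not the Megatron-LLM finetuning setup used in the paper; the hyperparameters and the toy training example are placeholders.

```python
# Hedged sketch: supervised finetuning on a task-specific QA training set.
# The paper finetunes with Megatron-LLM; this sketch uses Hugging Face Trainer + LoRA
# (peft) as a lighter-weight stand-in, with illustrative hyperparameters.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "epfl-llm/meditron-70b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16,
                                             device_map="auto")
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                                         target_modules=["q_proj", "v_proj"]))

# Toy example in a QA prompt/answer format (replace with the real training split).
examples = [{"text": "Question: Which nerve is compressed in carpal tunnel syndrome?\n"
                     "Answer: The median nerve."}]
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="meditron-qa-lora", per_device_train_batch_size=1,
                           num_train_epochs=1, bf16=True, logging_steps=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```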
Out - of - Scope Use
Do not use this model for natural language generation in a production environment, finetuned or otherwise.
Truthfulness, Helpfulness, Risk, and Bias
Truthfulness
An initial assessment of the Meditron models' truthfulness was performed against baseline and consumer-level medical models using TruthfulQA (multiple choice), focusing only on medically relevant categories.
Category | meditron-70b | llama-2-70b | med42-70b* | meditron-7b | llama-2-7b | PMC-llama-7b |
---|---|---|---|---|---|---|
Health | 81.8 | 69.1 | 83.6 | 27.3 | 16.4 | 3.6 |
Nutrition | 77.9 | 68.8 | 62.5 | 31.1 | 12.5 | 6.3 |
Psychology | 47.4 | 36.8 | 52.6 | 21.1 | 10.5 | 0.0 |
Science | 77.8 | 44.4 | 33.3 | 33.3 | 11.1 | 0.0 |
Avg | 71.2 | 54.8 | 58.0 | 28.3 | 12.6 | 2.5 |
For more details, see the paper.
Helpfulness, Risk, and Bias
A comprehensive qualitative generation report for Meditron-70B on queries designed by medical experts is provided in the paper, comparing it with Llama-2-70B and ChatGPT-3.5 (November 27, 2023 version).
Recommendations
⚠️ Important Note
Users should be aware of the risks, biases, and limitations of the model. Do not use it in production for natural language generation or professional health-related purposes without comprehensive testing.
Training Details
Training Data
Meditron’s domain-adaptive pretraining corpus, GAP-Replay, combines 48.1B tokens from four corpora:
- [Clinical Guidelines](https://huggingface.co/datasets/epfl-llm/guidelines): A new dataset of 46K internationally recognized clinical practice guidelines.
- Medical Paper Abstracts: 16.1M abstracts from closed-access PubMed and PubMed Central papers.
- Medical Papers: Full-text articles from 5M publicly available PubMed and PubMed Central papers.
- Replay Data: 400M tokens of general domain data from [RedPajama-v1](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T).

Training Procedure
The [Megatron-LLM](https://github.com/epfLLM/Megatron-LLM) library was used for training. The hardware consisted of 16 nodes of 8x NVIDIA A100 (80GB) SXM GPUs. A three-way parallelism scheme was used (the sketch after this list checks the arithmetic):
- Data Parallelism (DP) of 2
- Pipeline Parallelism (PP) of 8
- Tensor Parallelism (TP) of 8
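These parallelism degrees multiply out to the full device count used for training; a quick sanity check (simple arithmetic, not code from the training repository):

```python
# Sanity check: the parallelism degrees multiply to the total number of GPUs.
dp, pp, tp = 2, 8, 8
gpus_per_node = 8
total_gpus = dp * pp * tp            # 128 GPUs
nodes = total_gpus // gpus_per_node  # 16 nodes of 8x A100, as listed above
print(total_gpus, nodes)             # 128 16
```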
Training Hyperparameters
Parameter | Value |
---|---|
bf16 | true |
lr | 1.5e-4 |
eps | 1e-5 |
betas | [0.9, 0.95] |
clip_grad | 1 |
weight decay | 0.1 |
DP size | 2 |
TP size | 8 |
PP size | 8 |
seq length | 4096 |
lr scheduler | cosine |
min lr | 1e-6 |
warmup iteration | 2000 |
micro batch size | 2 |
global batch size | 512 |
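As a rough back-of-the-envelope check (an estimate, not a number reported in the paper), the batch settings above imply about 2.1M tokens per optimizer step, or roughly 23K steps for one pass over the 48.1B-token GAP-Replay corpus:

```python
# Rough estimate of what the batch settings imply (not figures from the paper).
seq_length = 4096
global_batch_size = 512
tokens_per_step = global_batch_size * seq_length  # 2,097,152 ≈ 2.1M tokens per step
corpus_tokens = 48.1e9                            # GAP-Replay size quoted above
approx_steps = corpus_tokens / tokens_per_step    # ≈ 23K optimizer steps per pass
print(tokens_per_step, round(approx_steps))
```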
Speeds, Sizes, Times
The model was trained in September and October 2023.
Property | Value |
---|---|
Model size | 70B |
Hidden dimension | 8192 |
Num. attention heads | 64 |
Num. layers | 80 |
The 70B model was trained on roughly 48 billion tokens at a throughput of about 40,200 tokens/second, with a bfloat16 model FLOPs utilization (MFU) of roughly 42.3%.
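These figures are mutually consistent under the common 6·N·(tokens/s) approximation of achieved model FLOPs and the A100's ~312 TFLOP/s dense bf16 peak; both are assumptions of this check rather than numbers from the paper.

```python
# Consistency check of the quoted throughput and MFU, using the common
# 6 * params * tokens/sec approximation for achieved model FLOPs and the
# A100's ~312 TFLOP/s dense bf16 peak.
params = 70e9
tokens_per_sec = 40_200
num_gpus = 128
peak_flops_per_gpu = 312e12

achieved_flops = 6 * params * tokens_per_sec       # ~1.69e16 FLOP/s
peak_flops = num_gpus * peak_flops_per_gpu         # ~3.99e16 FLOP/s
print(f"MFU ≈ {achieved_flops / peak_flops:.1%}")  # ≈ 42.3%
```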
Evaluation
Testing Data & Metrics
Testing Data
- MedQA (USMLE)
- MedMCQA
- PubMedQA
- MMLU-Medical
- [MedQA-4-Option](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options)
Metrics
- Accuracy: For multiple-choice question-answering tasks.
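Concretely, accuracy here is the fraction of questions whose predicted option matches the gold answer; a minimal illustration:

```python
# Accuracy for multiple-choice QA: fraction of questions whose predicted option
# letter matches the gold answer.
def accuracy(predictions, references):
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

print(accuracy(["B", "C", "A", "D"], ["B", "C", "D", "D"]))  # 0.75
```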
Results
Dataset | meditron-70b | llama-2-70b | med42-70b* | clinical-camel-70b* |
---|---|---|---|---|
MMLU-Medical | 77.6 | 77.9 | 74.5 | 65.7 |
PubMedQA | 81.6 | 80.0 | 61.2 | 67.0 |
MedMCQA | 66.0 | 62.6 | 59.2 | 46.7 |
MedQA | 64.4 | 61.5 | 59.1 | 50.8 |
MedQA-4-Option | 70.2 | 63.8 | 63.9 | 56.8 |
Avg | 72.0 | 69.2 | 63.6 | 57.4 |
Note: Models with * are already instruction-tuned and excluded from further finetuning.
Environmental Impact
- Hardware Type: 128 x NVIDIA A100 (80GB) SXM
- Total GPU hours: 42,496
📄 License
The model is under the [LLAMA 2 COMMUNITY LICENSE AGREEMENT](https://huggingface.co/meta-llama/Llama-2-70b/raw/main/LICENSE.txt), and the code is under the APACHE 2.0 LICENSE.

