# Med42 - Clinical Large Language Model
Med42 is an open-access clinical large language model (LLM) developed by M42. It aims to expand access to medical knowledge. Built on LLaMA-2 with 70 billion parameters, this generative AI system offers high-quality answers to medical questions.
## Quick Start
**Update: Version 2 of Med42 released!**
Please find the models here: [Med42-v2-70B](https://huggingface.co/m42-health/Llama3-Med42-70B) and [Med42-v2-8B](https://huggingface.co/m42-health/Llama3-Med42-8B)
### Access Med42 on Hugging Face

This is a form to enable access to Med42 on Hugging Face. Please read the [Med42 License](https://huggingface.co/spaces/m42-health/License) and accept our license terms and acceptable use policy before submitting this form. Requests will be processed by the M42 Team within 2 working days.
**Required Information**

| Field | Type |
|-------|------|
| Full name | text |
| Country | text |
| Affiliation | text |
| I certify the details provided above are correct and that I have read and agreed to the Med42 License agreement | checkbox |
## Features
- Medical Knowledge Expansion: Med42 helps expand access to medical knowledge, providing high-quality answers to medical questions.
- Competitive Performance: Achieves competitive results on various medical benchmarks, such as MedQA, MedMCQA, and more.
## Installation
To use Med42, you need to follow these steps:
- Read the [Med42 License](https://huggingface.co/spaces/m42-health/License) and accept the license terms.
- Request access to download the model weights (and tokenizer). Once access is granted, authenticate with the Hugging Face Hub as in the sketch below.
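After your request is approved, you can log in to the Hugging Face Hub so the gated weights can be downloaded. This is a minimal sketch; the token value is a placeholder for your own access token:

```python
# Log in to the Hugging Face Hub before downloading the gated Med42 weights.
# "hf_xxx" is a placeholder; use a personal access token from your Hugging Face account settings.
from huggingface_hub import login

login(token="hf_xxx")
```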
## Usage Examples
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "m42-health/med42-70b"

# Load the model across the available GPUs, together with its tokenizer.
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "What are the symptoms of diabetes?"
prompt_template = f'''
<|system|>: You are a helpful medical assistant created by M42 Health in the UAE.
<|prompter|>:{prompt}
<|assistant|>:
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(
    inputs=input_ids, temperature=0.7, do_sample=True,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0]))
```
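The full-precision 70B checkpoint requires substantial GPU memory. As an optional sketch (not part of the official instructions, and quantization may affect output quality), the model can also be loaded in 4-bit via the `bitsandbytes` integration in `transformers`:

```python
# Optional: 4-bit loading to reduce GPU memory usage (requires the bitsandbytes package).
# Settings here are illustrative; adjust them to your hardware and quality requirements.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name_or_path = "m42-health/med42-70b"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
```

Generation then works exactly as in the basic example above.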
## Documentation
### Model Details

### Intended Use
Med42 is available for further testing and assessment as an AI assistant to enhance clinical decision-making and to improve access to LLMs for healthcare use. Potential use cases include:
- Medical question answering
- Patient record summarization (see the sketch after this list)
- Aiding medical diagnosis
- General health Q&A
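As an illustration of the summarization use case, the prompt format from the usage example can be adapted as follows. This is only a sketch; the clinical note below is invented for demonstration purposes:

```python
# Illustrative only: reuse the Med42 prompt format for patient record summarization.
# The note text is a made-up example, not real patient data.
note = (
    "58-year-old male with hypertension and type 2 diabetes presents with "
    "three days of productive cough, low-grade fever, and fatigue."
)
prompt_template = f'''
<|system|>: You are a helpful medical assistant created by M42 Health in the UAE.
<|prompter|>: Summarize the following patient note in two sentences:
{note}
<|assistant|>:
'''
# Tokenize and generate as shown in the basic usage example.
```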
### Hardware and Software
The training process was performed on the Condor Galaxy 1 (CG-1) supercomputer platform.
### Evaluation Results
Med42 achieves competitive performance on various medical benchmarks, including MedQA, MedMCQA, PubMedQA, HeadQA, and Measuring Massive Multitask Language Understanding (MMLU) clinical topics.
| Dataset | Med42 | ClinicalCamel-70B | GPT-3.5 | GPT-4.0 | Med-PaLM-2 (5-shot)* |
|---|---|---|---|---|---|
| MMLU Clinical Knowledge | 74.3 | 69.8 | 69.8 | 86.0 | 88.3 |
| MMLU College Biology | 84.0 | 79.2 | 72.2 | 95.1 | 94.4 |
| MMLU College Medicine | 68.8 | 67.0 | 61.3 | 76.9 | 80.9 |
| MMLU Medical Genetics | 86.0 | 69.0 | 70.0 | 91.0 | 90.0 |
| MMLU Professional Medicine | 79.8 | 71.3 | 70.2 | 93.0 | 95.2 |
| MMLU Anatomy | 67.4 | 62.2 | 56.3 | 80.0 | 77.8 |
| MedMCQA | 60.9 | 47.0 | 50.1 | 69.5 | 71.3 |
| MedQA | 61.5 | 53.4 | 50.8 | 78.9 | 79.7 |
| USMLE Self-Assessment | 71.7 | - | 49.1 | 83.8 | - |
| USMLE Sample Exam | 72.0 | 54.3 | 56.9 | 84.3 | - |
\*We note that 0-shot performance is not reported for Med-PaLM 2. Further details can be found at https://github.com/m42health/med42.
Key performance metrics:
- Med42 achieves a 72% accuracy on the US Medical Licensing Examination (USMLE) sample exam, surpassing the prior state of the art among openly available medical LLMs.
- Scores 61.5% on the MedQA dataset (compared to 50.8% for GPT-3.5).
- Consistently outperforms GPT-3.5 on MMLU clinical topics.
### Limitations & Safe Use
- Med42 is not ready for real clinical use. Extensive human evaluation is still ongoing and is required to ensure safety.
- Potential for generating incorrect or harmful information.
- Risk of perpetuating biases in training data.
## Important Note
Use this model responsibly! Do not rely on it for medical usage without rigorous safety testing.
## License
The use of this model is governed by the M42 Health license. Please read the [Med42 License](https://huggingface.co/spaces/m42-health/License).
## Accessing Med42 and Reporting Issues
## Citation
Our paper has been published at AAAI 2024 Spring Symposium - Clinical Foundation Models and is available on arXiv: https://arxiv.org/abs/2404.14779
```bibtex
@article{christophe2024med42,
  title={Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches},
  author={Clément Christophe and Praveen K Kanithi and Prateek Munjal and Tathagata Raha and Nasir Hayat and Ronnie Rajan and Ahmed Al-Mahrooqi and Avani Gupta and Muhammad Umar Salman and Gurpreet Gosal and Bhargav Kanakiya and Charles Chen and Natalia Vassilieva and Boulbaba Ben Amor and Marco AF Pimentel and Shadab Khan},
  year={2024},
  eprint={2404.14779},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```