Meerkat-7B (Version 1.0)
Meerkat-7B-v1.0 is an instruction-tuned medical AI system trained on high-quality synthetic data to solve complex medical problems. It is the first 7B-parameter model to surpass the 60% passing threshold of the USMLE.
Quick Start
The input query should always end with "ASSISTANT:" as shown below.
query = "USER: What should I do when I get cold? ASSISTANT:"
You can run the model with the apply_chat_template function as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # cuda or cpu
checkpoint = "dmis-lab/meerkat-7b-v1.0"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,  # optional: load in bfloat16 to reduce GPU memory usage
)
# Multi-turn dialogue example
messages = [
{"role": "system", "content": "You are a helpful doctor or healthcare professional. Guide the conversation to provide useful, complete, and scientifically-grounded answers to user questions. You have the option to compose a concise, single-turn conversation if the user's input is comprehensive to provide accurate answers. However, if essential details are missing, you should engage in a multi-turn dialogue, asking follow-up questions to gather a thorough medical history and records.\n\n"},
{"role": "user", "content": "Hello, doctor. I'm really concerned about my 10-year-old son. We recently discovered a painless mass in his left testicle, so we brought him to the pediatrician."},
{"role": "assistant", "content": "I understand your concern. Let's gather some more information. Has your son experienced any other symptoms along with the mass?"},
{"role": "user", "content": "Other than the mass, my son hasn't shown any symptoms. He's been his usual self, playing and eating normally."}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
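Note that decoded[0] contains the full conversation, prompt included. A minimal sketch for recovering only the newly generated reply (this post-processing step is our addition, not part of the original example):
# Slice off the prompt tokens so only the model's new reply is decoded.
new_tokens = generated_ids[:, model_inputs.shape[1]:]
reply = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(reply)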
Features
- Meerkat-7B-v1.0 is an instruction-tuned medical AI system that surpasses the 60% passing threshold of the USMLE among all 7B-parameter models for the first time.
- It is trained on a new synthetic dataset containing high-quality chain-of-thought reasoning paths from 18 medical textbooks, together with diverse instruction-following datasets, equipping it with high-level medical reasoning capabilities.
Documentation
Prompt Details
USMLE or Clinical Cases
When solving USMLE-style questions or dealing with complex clinical cases, use the following system message:
messages = [
{"role": "system", "content": "The following is a multiple-choice question about medical knowledge. Solve this in a step-by-step fashion, starting by summarizing the available information. Output a single option from the given options as the final answer. You are strongly required to follow the specified output format; conclude your response with the phrase \"the answer is ([option_id]) [answer_string]\".\n\n"},
{"role": "user", "content": "Two weeks after undergoing an emergency cardiac catherization with stenting for unstable angina pectoris, a 61-year-old man has decreased urinary output and malaise. He has type 2 diabetes mellitus and osteoarthritis of the hips. Prior to admission, his medications were insulin and naproxen. He was also started on aspirin, clopidogrel, and metoprolol after the coronary intervention. His temperature is 38\u00b0C (100.4\u00b0F), pulse is 93/min, and blood pressure is 125/85 mm Hg. Examination shows mottled, reticulated purplish discoloration of the feet. Laboratory studies show:\nHemoglobin count 14 g/dL\nLeukocyte count 16,400/mm3\nSegmented neutrophils 56%\nEosinophils 11%\nLymphocytes 31%\nMonocytes 2%\nPlatelet count 260,000/mm3\nErythrocyte sedimentation rate 68 mm/h\nSerum\nUrea nitrogen 25 mg/dL\nCreatinine 4.2 mg/dL\nRenal biopsy shows intravascular spindle-shaped vacuoles. Which of the following is the most likely cause of this patient's symptoms?\" (A) Renal papillary necrosis (B) Cholesterol embolization (C) Eosinophilic granulomatosis with polyangiitis (D) Polyarteritis nodosa"},
]
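The system message above requires the response to end with the phrase "the answer is ([option_id]) [answer_string]", so the predicted option can be recovered with a simple pattern match. A minimal sketch (the helper name and regex are ours, assuming options labeled A-D):
import re

def extract_answer(response):
    # Match the required closing phrase, e.g. "the answer is (B) Cholesterol embolization".
    match = re.search(r"the answer is \(([A-D])\)\s*(.+)", response, re.IGNORECASE)
    return (match.group(1), match.group(2).strip()) if match else None

print(extract_answer("... so the answer is (B) Cholesterol embolization"))  # ('B', 'Cholesterol embolization')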
Multiple-choice Exams
For other multiple-choice exams, use the following simple system message:
messages = [
{"role": "system", "content": "Answer the multiple-choice question about medical knowledge.\n\n"},
{"role": "user", "content": "In a Robertsonian translocation fusion occurs at the: (A) telomeres. (B) centromeres. (C) histones. (D) ends of the long arms."},
]
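These messages can be fed through the same generation code shown in Quick Start. For exam-style evaluation, greedy decoding is a common choice when reproducible answers are wanted (a suggestion on our part; the authors' sampling settings may differ):
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
generated_ids = model.generate(encodeds, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(generated_ids)[0])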
Other Use Cases
Our model was trained using the AlpaCare instruction dataset (52K examples) to enhance its generalization capabilities. You can design and test your own prompts and share your thoughts with us.
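For example, a free-form instruction can reuse the same chat format; the prompt below is a hypothetical illustration, not one prescribed by the authors:
messages = [
    {"role": "system", "content": "You are a helpful doctor or healthcare professional. Provide a clear, scientifically-grounded answer to the user's question.\n\n"},  # hypothetical system message
    {"role": "user", "content": "Summarize the first-line lifestyle changes for managing mild hypertension."},
]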
Technical Details
Model Architecture
Our model is based on Mistral-7B-v0.1, chosen for its accuracy and runtime efficiency.
Training Data
Our data is available at this repository.
Evaluation
We tested the models on seven medical benchmarks: MedQA, USMLE sample test, Medbullets-4, Medbullets-5, MedMCQA, MMLU-Medical, and JAMA Clinical Challenge.
Model | Average | MedQA | USMLE | Medbullets-4 | Medbullets-5 | MedMCQA | MMLU-Medical | JAMA
---|---|---|---|---|---|---|---|---
GPT-4 | 75.2 | 81.4 | 86.6 | 68.8 | 63.3 | 72.4 | 87.1 | 67.1
GPT-3.5 | 54.1 | 53.6 | 58.5 | 51.0 | 47.4 | 51.0 | 67.3 | 50.1
MediTron-70B (Ensemble, 5 runs) | - | 70.2 | - | - | - | 66.0 | 78.0 | -
Open-source (7B) | | | | | | | |
MediTron-7B | 50.8 | 50.2 | 44.6 | 51.1 | 45.5 | 57.9 | 56.7 | 49.3
BioMistral-7B | 54.4 | 54.3 | 51.4 | 52.3 | 48.7 | 61.1 | 64.6 | 48.6
Meerkat-7B | 62.4 | 70.6 | 70.3 | 58.7 | 52.9 | 60.6 | 70.5 | 53.1
Meerkat-7B (Ensemble, 5 runs) | 64.2 | 74.3 | 71.4 | 61.0 | 55.3 | 60.7 | 72.4 | 54.0
The scores in MMLU-Medical were calculated as the average accuracies across six medical-related subjects in the original MMLU benchmark; the per-subject results are presented below.
Model | Average | Clinical Knowledge | Medical Genetics | Anatomy | Professional Medicine | College Biology | College Medicine
---|---|---|---|---|---|---|---
GPT-4 | 87.1 | 86.4 | 92.0 | 80.0 | 93.8 | 93.8 | 76.3
GPT-3.5 | 67.3 | 68.7 | 68.0 | 60.7 | 69.9 | 72.9 | 63.6
MediTron-70B (Ensemble, 5 runs) | 78.0 | 75.5 | 85.9 | 69.4 | 82.3 | 86.7 | 68.0
Open-source (7B) | | | | | | |
MediTron-7B | 56.7 | 57.7 | 63.8 | 56.9 | 56.0 | 57.1 | 48.9
BioMistral-7B | 64.6 | 59.9 | 64.0 | 56.5 | 60.4 | 59.0 | 54.7
Meerkat-7B | 70.5 | 71.6 | 74.8 | 63.2 | 77.3 | 70.8 | 65.2
Meerkat-7B (Ensemble, 5 runs) | 72.4 | 74.1 | 79.4 | 64.1 | 78.8 | 75.8 | 62.4
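As a sanity check on the averaging described above, a reported MMLU-Medical score can be reproduced from its six per-subject accuracies (a minimal sketch using the Meerkat-7B row of the table):
# Meerkat-7B per-subject MMLU-Medical accuracies, read from the table above.
subjects = [71.6, 74.8, 63.2, 77.3, 70.8, 65.2]
print(round(sum(subjects) / len(subjects), 1))  # 70.5, matching the reported average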
License
This project is licensed under the CC-BY-NC-4.0 license.
Reference
Please use the following BibTeX entry to cite our paper:
@article{kim2025small,
title={Small language models learn enhanced reasoning skills from medical textbooks},
author={Kim, Hyunjae and Hwang, Hyeon and Lee, Jiwoo and Park, Sihyeon and Kim, Dain and Lee, Taewhoo and Yoon, Chanwoong and Sohn, Jiwoong and Park, Jungwoo and Reykhart, Olga and Fetherston, Thomas and Choi, Donghee and Kwak, Soo Heon and Chen, Qingyu and Kang, Jaewoo},
journal={npj Digital Medicine},
volume={8},
number={1},
pages={240},
year={2025},
publisher={Nature Publishing Group UK London}
}
Contact
If you have any questions, feel free to email hyunjae-kim@korea.ac.kr or hyunjae.kim@yale.edu.

