Lingshu - SOTA Multimodal Large Language Models for Medical Domain
Lingshu is a state-of-the-art multimodal large language model designed for the medical domain. It excels in medical VQA tasks and report generation, offering high-performance solutions for unified multimodal medical understanding and reasoning.
Website
7B Model
32B Model
MedEvalKit
Technical Report
Important Note
Although the weights, code, and demos are released openly, as with other pre-trained language models, and despite our best efforts in red teaming, safety fine-tuning, and enforcement, our models carry potential risks, including but not limited to inaccurate, misleading, or potentially harmful generation.
Developers and stakeholders should perform their own red teaming and provide related security measures before deployment, and they must abide by and comply with local governance and regulations.
In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos.
⨠Features
- Lingshu models achieve SOTA on most medical multimodal/textual QA and report generation tasks for 7B and 32 model sizes.
- Lingshu-32B outperforms GPT-4.1 and Claude Sonnet 4 in most multimodal QA and report generation tasks.
- Lingshu supports more than 12 medical imaging modalities, including X-Ray, CT Scan, MRI, Microscopy, Ultrasound, Histopathology, Dermoscopy, Fundus, OCT, Digital Photography, Endoscopy, and PET.
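
The comparison against GPT-4.1 and Claude Sonnet 4 can be checked by hand for the multimodal QA benchmarks. The following is a minimal sketch, not part of the Lingshu codebase: the scores are copied from the Medical Multimodal VQA table in the Evaluation section, and the names `BENCHMARKS`, `SCORES`, and `wins` are illustrative assumptions.

```python
# Scores copied from the Medical Multimodal VQA table in the Evaluation section.
# The variable and function names here are illustrative, not part of any Lingshu API.
BENCHMARKS = ["MMMU-Med", "VQA-RAD", "SLAKE", "PathVQA",
              "PMC-VQA", "OmniMedVQA", "MedXpertQA"]

SCORES = {
    "GPT-4.1":         [75.2, 65.0, 72.2, 55.5, 55.2, 75.5, 45.2],
    "Claude Sonnet 4": [74.6, 67.6, 70.6, 54.2, 54.4, 65.5, 43.3],
    "Lingshu-32B":     [62.3, 76.5, 89.2, 65.9, 57.9, 83.4, 30.9],
}


def wins(model, baseline):
    """Return the benchmarks on which `model` scores strictly higher than `baseline`."""
    return [
        name
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
        if m > b
    ]


for baseline in ("GPT-4.1", "Claude Sonnet 4"):
    won = wins("Lingshu-32B", baseline)
    print(f"Lingshu-32B vs {baseline}: ahead on {len(won)}/{len(BENCHMARKS)} benchmarks")
```

On these numbers Lingshu-32B is ahead on five of the seven multimodal QA benchmarks against each of the two proprietary models, and behind on MMMU-Med and MedXpertQA.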
Release
Documentation
This repository contains the model of the paper Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning. We also release a comprehensive medical evaluation toolkit in MedEvalKit, which supports fast evaluation of major multimodal and textual medical tasks.
Evaluation
Medical Multimodal VQA

| Models | MMMU-Med | VQA-RAD | SLAKE | PathVQA | PMC-VQA | OmniMedVQA | MedXpertQA | Avg. |
|---|---|---|---|---|---|---|---|---|
| Proprietary Models | | | | | | | | |
| GPT-4.1 | 75.2 | 65.0 | 72.2 | 55.5 | 55.2 | 75.5 | 45.2 | 63.4 |
| Claude Sonnet 4 | 74.6 | 67.6 | 70.6 | 54.2 | 54.4 | 65.5 | 43.3 | 61.5 |
| Gemini-2.5-Flash | 76.9 | 68.5 | 75.8 | 55.4 | 55.4 | 71.0 | 52.8 | 65.1 |
| Open-source Models (<10B) | | | | | | | | |
| BiomedGPT | 24.9 | 16.6 | 13.6 | 11.3 | 27.6 | 27.9 | - | - |
| Med-R1-2B | 34.8 | 39.0 | 54.5 | 15.3 | 47.4 | - | 21.1 | - |
| MedVLM-R1-2B | 35.2 | 48.6 | 56.0 | 32.5 | 47.6 | 77.7 | 20.4 | 45.4 |
| MedGemma-4B-IT | 43.7 | 72.5 | 76.4 | 48.8 | 49.9 | 69.8 | 22.3 | 54.8 |
| LLaVA-Med-7B | 29.3 | 53.7 | 48.0 | 38.8 | 30.5 | 44.3 | 20.3 | 37.8 |
| HuatuoGPT-V-7B | 47.3 | 67.0 | 67.8 | 48.0 | 53.3 | 74.2 | 21.6 | 54.2 |
| BioMediX2-8B | 39.8 | 49.2 | 57.7 | 37.0 | 43.5 | 63.3 | 21.8 | 44.6 |
| Qwen2.5VL-7B | 50.6 | 64.5 | 67.2 | 44.1 | 51.9 | 63.6 | 22.3 | 52.0 |
| InternVL2.5-8B | 53.5 | 59.4 | 69.0 | 42.1 | 51.3 | 81.3 | 21.7 | 54.0 |
| InternVL3-8B | 59.2 | 65.4 | 72.8 | 48.6 | 53.8 | 79.1 | 22.4 | 57.3 |
| Lingshu-7B | 54.0 | 67.9 | 83.1 | 61.9 | 56.3 | 82.9 | 26.7 | 61.8 |
| Open-source Models (>10B) | | | | | | | | |
| HealthGPT-14B | 49.6 | 65.0 | 66.1 | 56.7 | 56.4 | 75.2 | 24.7 | 56.2 |
| HuatuoGPT-V-34B | 51.8 | 61.4 | 69.5 | 44.4 | 56.6 | 74.0 | 22.1 | 54.3 |
| MedDr-40B | 49.3 | 65.2 | 66.4 | 53.5 | 13.9 | 64.3 | - | - |
| InternVL3-14B | 63.1 | 66.3 | 72.8 | 48.0 | 54.1 | 78.9 | 23.1 | 58.0 |
| Qwen2.5VL-32B | 59.6 | 71.8 | 71.2 | 41.9 | 54.5 | 68.2 | 25.2 | 56.1 |
| InternVL2.5-38B | 61.6 | 61.4 | 70.3 | 46.9 | 57.2 | 79.9 | 24.4 | 57.4 |
| InternVL3-38B | 65.2 | 65.4 | 72.7 | 51.0 | 56.6 | 79.8 | 25.2 | 59.4 |
| Lingshu-32B | 62.3 | 76.5 | 89.2 | 65.9 | 57.9 | 83.4 | 30.9 | 66.6 |

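
The Avg. column in the multimodal VQA results appears to be the unweighted mean of the seven benchmark scores, rounded to one decimal place. The sketch below checks that assumption against three rows copied from those results; `ROWS` and `mean_score` are hypothetical names used only for this illustration.

```python
# Each entry: model -> (seven benchmark scores in column order, reported Avg.).
# Values are copied from the Medical Multimodal VQA results; the names are illustrative.
ROWS = {
    "GPT-4.1":     ([75.2, 65.0, 72.2, 55.5, 55.2, 75.5, 45.2], 63.4),
    "Lingshu-7B":  ([54.0, 67.9, 83.1, 61.9, 56.3, 82.9, 26.7], 61.8),
    "Lingshu-32B": ([62.3, 76.5, 89.2, 65.9, 57.9, 83.4, 30.9], 66.6),
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for model, (scores, reported) in ROWS.items():
    computed = mean_score(scores)
    status = "matches" if computed == reported else "differs from"
    print(f"{model}: computed {computed}, {status} reported {reported}")
```

Rows with a missing benchmark score (shown as `-`) also report no average, which is consistent with the average being taken over all seven benchmarks.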
Medical Textual QA

| Models | MMLU-Med | PubMedQA | MedMCQA | MedQA | Medbullets | MedXpertQA | SuperGPQA-Med | Avg. |
|---|---|---|---|---|---|---|---|---|
| Proprietary Models | | | | | | | | |
| GPT-4.1 | 89.6 | 75.6 | 77.7 | 89.1 | 77.0 | 30.9 | 49.9 | 70.0 |
| Claude Sonnet 4 | 91.3 | 78.6 | 79.3 | 92.1 | 80.2 | 33.6 | 56.3 | 73.1 |
| Gemini-2.5-Flash | 84.2 | 73.8 | 73.6 | 91.2 | 77.6 | 35.6 | 53.3 | 69.9 |
| Open-source Models (<10B) | | | | | | | | |
| Med-R1-2B | 51.5 | 66.2 | 39.1 | 39.9 | 33.6 | 11.2 | 17.9 | 37.0 |
| MedVLM-R1-2B | 51.8 | 66.4 | 39.7 | 42.3 | 33.8 | 11.8 | 19.1 | 37.8 |
| MedGemma-4B-IT | 66.7 | 72.2 | 52.2 | 56.2 | 45.6 | 12.8 | 21.6 | 46.8 |
| LLaVA-Med-7B | 50.6 | 26.4 | 39.4 | 42.0 | 34.4 | 9.9 | 16.1 | 31.3 |
| HuatuoGPT-V-7B | 69.3 | 72.8 | 51.2 | 52.9 | 40.9 | 10.1 | 21.9 | 45.6 |
| BioMediX2-8B | 68.6 | 75.2 | 52.9 | 58.9 | 45.9 | 13.4 | 25.2 | 48.6 |
| Qwen2.5VL-7B | 73.4 | 76.4 | 52.6 | 57.3 | 42.1 | 12.8 | 26.3 | 48.7 |
| InternVL2.5-8B | 74.2 | 76.4 | 52.4 | 53.7 | 42.4 | 11.6 | 26.1 | 48.1 |
| InternVL3-8B | 77.5 | 75.4 | 57.7 | 62.1 | 48.5 | 13.1 | 31.2 | 52.2 |
| Lingshu-7B | 74.5 | 76.6 | 55.9 | 63.3 | 56.2 | 16.5 | 26.3 | 52.8 |
| Open-source Models (>10B) | | | | | | | | |
| HealthGPT-14B | 80.2 | 68.0 | 63.4 | 66.2 | 39.8 | 11.3 | 25.7 | 50.7 |
| HuatuoGPT-V-34B | 74.7 | 72.2 | 54.7 | 58.8 | 42.7 | 11.4 | 26.5 | 48.7 |
| MedDr-40B | 65.2 | 77.4 | 38.4 | 59.2 | 44.3 | 12.0 | 24.0 | 45.8 |
| InternVL3-14B | 81.7 | 77.2 | 62.0 | 70.1 | 49.5 | 14.1 | 37.9 | 56.1 |
| Qwen2.5VL-32B | 83.2 | 68.4 | 63.0 | 71.6 | 54.2 | 15.6 | 37.6 | 56.2 |
| InternVL2.5-38B | 84.6 | 74.2 | 65.9 | 74.4 | 55.0 | 14.7 | 39.9 | 58.4 |
| InternVL3-38B | 83.8 | 73.2 | 64.9 | 73.5 | 54.6 | 16.0 | 42.5 | 58.4 |
| Lingshu-32B | 84.7 | 77.8 | 66.1 | 74.7 | 65.4 | 22.7 | 41.1 | 61.8 |

Medical Report Generation

| Models | MIMIC-CXR (ROUGE-L) | MIMIC-CXR (CIDEr) | MIMIC-CXR (RaTE) | MIMIC-CXR (SembScore) | MIMIC-CXR (RadCliQ-v1⁻¹) | CheXpert Plus (ROUGE-L) | CheXpert Plus (CIDEr) | CheXpert Plus (RaTE) | CheXpert Plus (SembScore) | CheXpert Plus (RadCliQ-v1⁻¹) | IU-Xray (ROUGE-L) | IU-Xray (CIDEr) | IU-Xray (RaTE) | IU-Xray (SembScore) | IU-Xray (RadCliQ-v1⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Proprietary Models | | | | | | | | | | | | | | | |
| GPT-4.1 | 9.0 | 82.8 | 51.3 | 23.9 | 57.1 | 24.5 | 78.8 | 45.5 | 23.2 | 45.5 | 30.2 | 124.6 | 51.3 | 47.5 | 80.3 |
| Claude Sonnet 4 | 20.0 | 56.6 | 45.6 | 19.7 | 53.4 | 22.0 | 59.5 | 43.5 | 18.9 | 43.3 | 25.4 | 88.3 | 55.4 | 41.0 | 72.1 |
| Gemini-2.5-Flash | 25.4 | 80.7 | 50.3 | 29.7 | 59.4 | 23.6 | 72.2 | 44.3 | 27.4 | 44.0 | 33.5 | 129.3 | 55.6 | 50.9 | 91.6 |
| Open-source Models (<10B) | | | | | | | | | | | | | | | |
| Med-R1-2B | 19.3 | 35.4 | 40.6 | 14.8 | 42.4 | 18.6 | 37.1 | 38.5 | 17.8 | 37.6 | 16.1 | 38.3 | 41.4 | 12.5 | 43.6 |
| MedVLM-R1-2B | 20.3 | 40.1 | 41.6 | 14.2 | 48.3 | 20.9 | 43.5 | 38.9 | 15.5 | 40.9 | 22.7 | 61.1 | 46.1 | 22.7 | 54.3 |
| MedGemma-4B-IT | 25.6 | 81.0 | 52.4 | 29.2 | 62.9 | 27.1 | 79.0 | 47.2 | 29.3 | 46.6 | 30.8 | 103.6 | 57.0 | 46.8 | 86.7 |
| LLaVA-Med-7B | 15.0 | 43.4 | 12.8 | 18.3 | 52.9 | 18.4 | 45.5 | 38.8 | 23.5 | 44.0 | 18.8 | 68.2 | 40.9 | 16.0 | 58.1 |
| HuatuoGPT-V-7B | 23.4 | 69.5 | 48.9 | 20.0 | 48.2 | 21.3 | 64.7 | 44.2 | 19.3 | 39.4 | 29.6 | 104.3 | 52.9 | 40.7 | 63.6 |
| BioMediX2-8B | 20.0 | 52.8 | 44.4 | 17.7 | 53.0 | 18.1 | 47.9 | 40.8 | 21.6 | 43.3 | 19.6 | 58.8 | 40.1 | 11.6 | 53.8 |
| Qwen2.5VL-7B | 24.1 | 63.7 | 47.0 | 18.4 | 55.1 | 22.2 | 62.0 | 41.0 | 17.2 | 43.1 | 26.5 | 78.1 | 48.4 | 36.3 | 66.1 |
| InternVL2.5-8B | 23.2 | 61.8 | 47.0 | 21.0 | 56.2 | 20.6 | 58.5 | 43.1 | 19.7 | 42.7 | 24.8 | 75.4 | 51.1 | 36.7 | 67.0 |
| InternVL3-8B | 22.9 | 66.2 | 48.2 | 21.5 | 55.1 | 20.9 | 65.4 | 44.3 | 25.2 | 43.7 | 22.9 | 76.2 | 51.2 | 31.3 | 59.9 |
| Lingshu-7B | 30.8 | 109.4 | 52.1 | 30.0 | 69.2 | 26.5 | 79.0 | 45.4 | | | | | | | |

License
This project is licensed under the MIT License.