OpenBioNER-Base: Open Source Biomedical Model - Identify Unseen Entity Types Without Training


Developed by disi-unibo-nlp

OpenBioNER is a lightweight BERT model specifically designed for open-domain biomedical named entity recognition (NER). It can identify unseen entity types using only natural language descriptions of target entity types, without requiring retraining.

Sequence Labeling

PyTorch

English

Open Source License: MIT

#Biomedical Entity Recognition #Zero-shot Learning #Natural Language Description Driven

Downloads 210

Release Time: 4/25/2025

Model Overview

OpenBioNER is pre-trained on synthetic silver-standard data generated through self-supervision by large language models (LLMs). In zero-shot settings it outperforms LLM baselines such as the NER-specialized UniNER and the general-purpose GPT-4o, achieving up to 10% higher F1 scores across multiple biomedical benchmarks.

Model Features

Zero-shot Learning Capability

Identifies unseen entity types using only natural language descriptions of target entity types, without requiring retraining.

Lightweight Design

With only 110M parameters, it is about 4 times smaller than baselines such as GLiNER_large-v1 (459M parameters) while achieving a higher average F1 score.

High Performance

Achieves up to 10% higher F1 scores across multiple biomedical benchmarks, surpassing models like GPT-4o and UniNER.

Model Capabilities

Biomedical Named Entity Recognition

Zero-shot Learning

Multi-entity Type Recognition

Use Cases

Biomedical Research

Bacterial Name Recognition

Identifies bacterial names from biomedical literature, such as Streptococcus pneumoniae.

Achieves a zero-shot F1 score of 49.1% on the BC2GM dataset (a gene and protein mention benchmark rather than a bacteria-specific one).

Chemical Substance Recognition

Identifies chemical substance names from chemical literature.

Achieves an F1 score of 48.0% on the BC4CHEMD dataset.

Medical Information Extraction

Disease Name Recognition

Identifies disease names from clinical texts.

Achieves an F1 score of 58.5% on the NCBI Disease dataset.

🚀 OpenBioNER

OpenBioNER is a lightweight BERT-based model for open-domain biomedical NER. It can find unseen target entity types via natural language descriptions without retraining and performs strongly on a range of benchmarks.

🚀 Quick Start

OpenBioNER is a specialized model designed for open-domain biomedical NER. It can identify unseen target entity types using only their natural language descriptions, eliminating the need for retraining. Pretrained on synthetic silver annotations from LLM self-supervision, it outperforms many competing models in zero-shot settings.

✨ Features

Open-domain Adaptability: Can find unseen target entity types based on natural language descriptions without retraining.
High Performance: Outperforms specialized LLMs like UniNER and GPT-4o, achieving up to a 10% F1 score improvement in zero-shot settings across various biomedical benchmarks.
Lightweight: Uses up to 4x fewer parameters than smaller baselines like GLiNER while achieving better performance.

📦 Installation

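To use this model, install the IBM Zshot library from its main branch (required until the next release is published). The commands below use Jupyter/Colab notebook syntax; drop the leading ! when running them in a regular shell: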

!pip install git+https://github.com/IBM/zshot.git@main gliner --quiet
!python -m spacy download en_core_web_sm
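After installing, a quick check that the packages are visible to the interpreter can save a confusing import error later. The sketch below uses only the standard library; missing_packages is a hypothetical helper name, and the package list simply mirrors the install command above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


# Package names taken from the install command above.
required = ["zshot", "spacy", "gliner"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```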

💻 Usage Examples

Basic Usage

import spacy

from zshot import PipelineConfig, displacy
from zshot.linker import LinkerSMXM
from zshot.evaluation.metrics._seqeval._seqeval import Seqeval
from zshot.utils.data_models import Entity
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report

# Define the candidate entity types; each description tells the model what to look for.
entities = [
    Entity(
        name='BACTERIUM',
        description=(
            'A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes. '
            'Examples include species like Streptococcus pneumoniae and Streptomyces ahygroscopicus.'
        ),
        vocabulary=None,
    ),
]

nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(model_name="disi-unibo-nlp/openbioner-base"),
    entities=entities,
    device='cuda' # or 'cpu' if GPU not available
)
nlp.add_pipe("zshot", config=nlp_config, last=True)


sentence = "Impact of cofactor-binding loop mutations on thermotolerance and activity of E. coli transketolase"
doc = nlp(sentence)

displacy.render(doc, style="ent")
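The displacy call above only draws the entities. To consume predictions programmatically, one option is to read them from the Doc object. The following is a minimal sketch, assuming zshot writes its predictions to the standard spaCy doc.ents attribute (worth verifying against your installed zshot version). ents_to_records is a hypothetical helper, and fake_doc is a hand-built stand-in so the sketch runs without downloading the model; its single span is illustrative, not actual model output.

```python
from types import SimpleNamespace


def ents_to_records(doc):
    """Convert the entities on a spaCy-style Doc into plain dictionaries."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


sentence = (
    "Impact of cofactor-binding loop mutations on thermotolerance "
    "and activity of E. coli transketolase"
)

# Stand-in for nlp(sentence): one illustrative span, NOT real model output.
start = sentence.index("E. coli")
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(
            text="E. coli",
            label_="BACTERIUM",
            start_char=start,
            end_char=start + len("E. coli"),
        )
    ]
)

records = ents_to_records(fake_doc)
for record in records:
    print(record)
```

With the real pipeline, passing the doc returned by nlp(sentence) in place of fake_doc should work the same way, provided the assumption about doc.ents holds.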

📚 Documentation

Performance

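OpenBioNER achieves the highest average score across the eight benchmarks (52.9), ahead of GLiNER_large-v1 (51.9), UniNER (49.9), and GPT-4o (43.3). It is not the top model on every individual dataset: its largest margin is on JNLPBA-R, while other models lead on AnatEM, NCBI, BC5CDR, and MedMentions-R.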

Model	Size	AnatEM	NCBI	JNLPBA	BC2GM	BC4CHEMD	BC5CDR	JNLPBA-R	MedMentions-R	AVG
GPT-4o	-	38.7	50.0	41.9	37.3	36.4	66.4	26.6	49.1	43.3
UniNER	7B	25.1	60.4	48.1	46.2	47.9	68.0	50.2	53.4	49.9
GLiNER_large-v1	459M	33.3	61.9	57.1	47.9	43.1	66.4	51.9	53.4	51.9
OpenBioNER (Ours)	110M	35.2	58.5	57.1	49.1	48.0	60.4	63.9	50.9	52.9
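The AVG column can be reproduced from the per-dataset scores. The sketch below assumes AVG is the unweighted mean over the eight datasets, rounded half-up to one decimal place; that rounding rule is an assumption, though it matches every row of the table. Decimal is used instead of floats so that exact halves such as 51.875 round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table, in column order AnatEM ... MedMentions-R.
scores = {
    "GPT-4o": ["38.7", "50.0", "41.9", "37.3", "36.4", "66.4", "26.6", "49.1"],
    "UniNER": ["25.1", "60.4", "48.1", "46.2", "47.9", "68.0", "50.2", "53.4"],
    "GLiNER_large-v1": ["33.3", "61.9", "57.1", "47.9", "43.1", "66.4", "51.9", "53.4"],
    "OpenBioNER": ["35.2", "58.5", "57.1", "49.1", "48.0", "60.4", "63.9", "50.9"],
}


def average_score(values):
    """Unweighted mean of the scores, rounded half-up to one decimal place."""
    total = sum(Decimal(v) for v in values)
    mean = total / Decimal(len(values))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {model: average_score(vals) for model, vals in scores.items()}
for model, avg in averages.items():
    print(f"{model}: {avg}")
```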

⚠️ Important Note

Please note that running evaluations using the zshot library may lead to slightly different results on certain benchmarks compared to those reported in the paper (above). This discrepancy is due to differences in token alignment: zshot uses spaCy's character-based span matching, while our experiments use token-level alignment as handled by BERT-based NER pipelines. These differences can affect how entity spans are matched and evaluated, particularly in cases with subword tokenization or punctuation.
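To make the alignment issue concrete, here is a toy illustration of how the same character span can be scored differently depending on whether it must line up exactly with token boundaries. This is not the evaluation code used by zshot or by the paper: the whitespace tokenizer, the function names, and the example sentence are all simplifying assumptions. The mode names "strict" and "expand" are borrowed from the alignment modes of spaCy's Doc.char_span.

```python
def whitespace_tokens(text):
    """Split on whitespace and return (token, start, end) character offsets."""
    tokens, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def char_span_to_tokens(tokens, start, end, mode="strict"):
    """Map a character span to (first_token, last_token) indices.

    strict: both span boundaries must coincide with token boundaries, else None.
    expand: snap outward to every token that overlaps the span.
    """
    overlapping = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
    if not overlapping:
        return None
    first, last = overlapping[0], overlapping[-1]
    if mode == "strict" and (tokens[first][1] != start or tokens[last][2] != end):
        return None
    return first, last


text = "Infection by Streptococcus pneumoniae, a common pathogen."
tokens = whitespace_tokens(text)

# Character span of the mention, excluding the trailing comma.
start = text.index("Streptococcus")
end = start + len("Streptococcus pneumoniae")

strict = char_span_to_tokens(tokens, start, end, mode="strict")
expand = char_span_to_tokens(tokens, start, end, mode="expand")
print("strict:", strict)
print("expand:", expand)
```

Because the comma is glued to the last token, the strict mapping rejects the span while the expanding one accepts it (and silently includes the comma). A prediction counted as a hit under one scheme can therefore be a miss under the other.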

🧬 How to Write Effective Entity Type Descriptions

Entity type descriptions are crucial for improving generalization in OpenBioNER. Well-written descriptions help models disambiguate types, handle rare classes, and align with real-world usage across diverse datasets.

✅ Best Practices

Start with a clear definition: Briefly explain what the entity type is.
Include functions or context: Add what it does, its purpose, or where it appears.
List 3–5 concrete examples: Use domain-relevant examples (e.g., real diseases, proteins, or food items).
Mention subtypes or synonyms (optional): Helps capture lexical variation and rare mentions.
Keep it concise: 1–3 well-structured sentences are ideal.

⚠️ Common Mistakes to Avoid

Vague or overly generic descriptions
No examples
Just a list of terms
Redundant or circular wording
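Some of these guidelines can be checked mechanically before a description is handed to the model. The sketch below is a rough heuristic linter, not part of OpenBioNER or zshot: check_description, the cue-word patterns, and the 8-word threshold are all assumptions chosen for illustration. The naive sentence splitter will also miscount abbreviations such as "E. coli".

```python
import re

# Cue words suggesting that examples or a definition are present (heuristic).
EXAMPLE_CUES = re.compile(r"\b(?:examples?|such as|includ(?:e|es|ing))\b", re.IGNORECASE)
DEFINITION_CUES = re.compile(r"\b(?:is|are|refers to|denotes)\b", re.IGNORECASE)


def check_description(description):
    """Return a list of warnings for a candidate entity type description."""
    warnings = []
    text = description.strip()
    words = text.split()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    if len(words) < 8:
        warnings.append("too short, probably vague")
    if len(sentences) > 3:
        warnings.append("longer than 3 sentences")
    if not EXAMPLE_CUES.search(text):
        warnings.append("no examples")
    if not DEFINITION_CUES.search(text):
        warnings.append("no definition, may be just a list of terms")
    return warnings


good = (
    "A disease refers to an abnormal condition that impairs normal body function. "
    "It includes examples such as diabetes, asthma, and tuberculosis."
)
bad = "aspirin, ibuprofen, paracetamol"

print(check_description(good))
print(check_description(bad))
```

An empty list means no heuristic fired. It does not guarantee the description is good, and circular wording in particular still needs a human read.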

🧪 Template (Recommended Format)

A [TYPE] refers to [concise definition]. It includes examples such as [example1], [example2], and [example3].
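The template can be filled in programmatically when many entity types need descriptions. The following is a minimal sketch: build_description is a hypothetical helper, and the sample definition and examples are illustrative rather than taken from the paper.

```python
def build_description(entity_type, definition, examples):
    """Fill the recommended template with a definition and 3 to 5 examples."""
    if not 3 <= len(examples) <= 5:
        raise ValueError("provide between 3 and 5 examples")
    # Pick the indefinite article from the first letter (simple heuristic).
    article = "An" if entity_type[0].lower() in "aeiou" else "A"
    listed = ", ".join(examples[:-1]) + ", and " + examples[-1]
    return (
        f"{article} {entity_type} refers to {definition.rstrip('.')}. "
        f"It includes examples such as {listed}."
    )


description = build_description(
    "chemical",
    "a substance with a defined molecular composition",
    ["aspirin", "ethanol", "sodium chloride"],
)
print(description)
```

The resulting string can be passed as the description argument of an Entity in the usage example above.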

📄 License

This project is licensed under the MIT license.

👥 Authors

Alessio Cocchieri
[Giacomo Frisoni](https://huggingface.co/giacomo-frisoni)
Marcos Martinez Galindo
Gianluca Moro
Giuseppe Tagliavini
Francesco Candoli

📬 Contacts

For questions, collaborations, or feedback, feel free to reach out:
