OpenBioNER
OpenBioNER is a lightweight BERT-based model for open-domain Biomedical NER. It can find unseen target entity types via natural language descriptions without retraining and shows excellent performance in various benchmarks.
Quick Start
OpenBioNER is a specialized model designed for open-domain Biomedical NER. It can identify unseen target entity types using only their natural language descriptions, eliminating the need for retraining. Pretrained on synthetic silver annotations from LLM self-supervision, it outperforms many competing models in zero-shot settings.
Features
- Open-domain Adaptability: Finds unseen target entity types from natural language descriptions, without retraining.
- High Performance: Outperforms LLM-based models such as the specialized UniNER and GPT-4o, achieving up to a 10% F1 score improvement in zero-shot settings across various biomedical benchmarks.
- Lightweight: Uses about 4x fewer parameters than smaller baselines like GLiNER (110M vs. 459M for GLiNER_large-v1) while achieving better average performance.
Installation
To use this model, install the IBM Zshot library from its main branch (required until the next release). The leading "!" is notebook syntax; omit it when running the commands in a regular shell:
!pip install git+https://github.com/IBM/zshot.git@main gliner --quiet
!python -m spacy download en_core_web_sm
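After installation, a quick check that the packages are importable can save a confusing traceback later. The sketch below uses only the standard library; the import names in REQUIRED are assumptions inferred from the install commands above, and missing_packages is a hypothetical helper, not part of Zshot.

```python
import importlib.util

# Import names assumed from the install commands above (not confirmed by the source).
REQUIRED = ["zshot", "spacy", "gliner", "en_core_web_sm"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("All set." if not missing else f"Missing: {', '.join(missing)}")
```

If anything is reported as missing, rerun the corresponding install command before moving on to the usage example.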
Usage Examples
Basic Usage
import spacy
from zshot import PipelineConfig, displacy
from zshot.linker import LinkerSMXM
from zshot.utils.data_models import Entity
# Evaluation helpers; not used in this basic example.
from zshot.evaluation.metrics._seqeval._seqeval import Seqeval
from zshot.evaluation.zshot_evaluate import evaluate, prettify_evaluate_report

# Each target entity type is defined by a name and a natural language description.
entities = [
    Entity(
        name='BACTERIUM',
        description='A bacterium refers to a type of microorganism that can exist as a single cell and may cause infections or play a role in various biological processes. Examples include species like Streptococcus pneumoniae and Streptomyces ahygroscopicus.',
        vocabulary=None,
    ),
]

# Start from a blank English pipeline and add zshot with OpenBioNER as the linker.
nlp = spacy.blank("en")
nlp_config = PipelineConfig(
    linker=LinkerSMXM(model_name="disi-unibo-nlp/openbioner-base"),
    entities=entities,
    device='cuda',  # set to 'cpu' when no GPU is available
)
nlp.add_pipe("zshot", config=nlp_config, last=True)

sentence = "Impact of cofactor-binding loop mutations on thermotolerance and activity of E. coli transketolase"
doc = nlp(sentence)
# Render the recognized entities.
displacy.render(doc, style="ent")
Documentation
Performance
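OpenBioNER achieves the highest average score across the eight benchmarks (52.9), ahead of GLiNER_large-v1 (51.9), UniNER (49.9), and GPT-4o (43.3). It does not lead on every dataset: its largest margin is on JNLPBA-R (63.9), while each of the other models scores higher on some individual benchmarks, such as BC5CDR.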
| Model | Size | AnatEM | NCBI | JNLPBA | BC2GM | BC4CHEMD | BC5CDR | JNLPBA-R | MedMentions-R | AVG |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | - | 38.7 | 50.0 | 41.9 | 37.3 | 36.4 | 66.4 | 26.6 | 49.1 | 43.3 |
| UniNER | 7B | 25.1 | 60.4 | 48.1 | 46.2 | 47.9 | 68.0 | 50.2 | 53.4 | 49.9 |
| GLiNER_large-v1 | 459M | 33.3 | 61.9 | 57.1 | 47.9 | 43.1 | 66.4 | 51.9 | 53.4 | 51.9 |
| OpenBioNER (Ours) | 110M | 35.2 | 58.5 | 57.1 | 49.1 | 48.0 | 60.4 | 63.9 | 50.9 | 52.9 |
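The AVG column can be reproduced directly from the per-dataset scores. The following sketch copies the values from the table above and assumes AVG is the unweighted mean over the eight benchmarks, rounded to one decimal place; that assumption matches every reported average.

```python
# Scores copied from the table above, in column order:
# AnatEM, NCBI, JNLPBA, BC2GM, BC4CHEMD, BC5CDR, JNLPBA-R, MedMentions-R
SCORES = {
    "GPT-4o": [38.7, 50.0, 41.9, 37.3, 36.4, 66.4, 26.6, 49.1],
    "UniNER": [25.1, 60.4, 48.1, 46.2, 47.9, 68.0, 50.2, 53.4],
    "GLiNER_large-v1": [33.3, 61.9, 57.1, 47.9, 43.1, 66.4, 51.9, 53.4],
    "OpenBioNER": [35.2, 58.5, 57.1, 49.1, 48.0, 60.4, 63.9, 50.9],
}


def average(scores):
    """Unweighted mean, rounded to one decimal place (assumed to match the AVG column)."""
    return round(sum(scores) / len(scores), 1)


averages = {model: average(scores) for model, scores in SCORES.items()}
best = max(averages, key=averages.get)

for model, avg in averages.items():
    print(f"{model:18s} {avg}")
print("Highest average:", best)
```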
Important Note
Please note that running evaluations using the zshot library may lead to slightly different results on certain benchmarks compared to those reported in the paper (above). This discrepancy is due to differences in token alignment: zshot uses spaCy's character-based span matching, while our experiments use token-level alignment as handled by BERT-based NER pipelines. These differences can affect how entity spans are matched and evaluated, particularly in cases with subword tokenization or punctuation.
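To make the alignment issue concrete, here is a toy illustration in plain Python. It is not the code used by zshot, spaCy, or the paper; the two regex tokenizers, the example sentence, and the snap_to_tokens helper are all assumptions chosen for illustration. It shows how the same character span can be accepted under one tokenization and dropped under another when span edges must coincide with token boundaries (spaCy's Doc.char_span behaves this way in its default strict alignment mode).

```python
import re


def tokenize(text, pattern):
    """Return (start, end) character offsets for every regex match in the text."""
    return [match.span() for match in re.finditer(pattern, text)]


def snap_to_tokens(span, token_offsets):
    """Return the span if both edges fall on token boundaries, otherwise None."""
    start, end = span
    starts = {s for s, _ in token_offsets}
    ends = {e for _, e in token_offsets}
    return span if start in starts and end in ends else None


text = "Activity of E. coli-derived transketolase"
predicted = (12, 19)  # character offsets of "E. coli"

# Coarse tokenization: "coli-derived" stays a single token.
whitespace_tokens = tokenize(text, r"\S+")
# Finer tokenization: punctuation is split off, so "coli" is its own token.
fine_tokens = tokenize(text, r"\w+|[^\w\s]")

print(text[predicted[0]:predicted[1]])
print(snap_to_tokens(predicted, whitespace_tokens))  # span edge falls inside a token
print(snap_to_tokens(predicted, fine_tokens))        # span edges match token boundaries
```

A prediction that is discarded or shifted in this way counts as a miss under exact-match scoring, which is one plausible source of the small differences described above.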
How to Write Effective Entity Type Descriptions
Entity type descriptions are crucial for improving generalization in OpenBioNER. Well-written descriptions help models disambiguate types, handle rare classes, and align with real-world usage across diverse datasets.
Best Practices
- Start with a clear definition: Briefly explain what the entity type is.
- Include functions or context: Add what it does, its purpose, or where it appears.
- List 3 to 5 concrete examples: Use domain-relevant examples (e.g., real diseases, proteins, or food items).
- Mention subtypes or synonyms (optional): Helps capture lexical variation and rare mentions.
- Keep it concise: 1 to 3 well-structured sentences are ideal.
Common Mistakes to Avoid
- Vague or overly generic descriptions
- No examples
- Just a list of terms
- Redundant or circular wording
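The guidelines above can be turned into a rough automated check. The sketch below is only a heuristic: check_description, the cue lists, and the thresholds are hypothetical choices made for illustration, not part of OpenBioNER or Zshot, and the naive sentence splitter will overcount on abbreviations such as "E. coli".

```python
import re

# Hypothetical cue phrases; extend them for your own domain.
EXAMPLE_CUES = ("such as", "example", "e.g.", "including", "include")
DEFINITION_CUES = (" is ", " are ", " refers to ", " denotes ")


def check_description(type_name, description):
    """Return heuristic warnings for a description (an empty list means none found)."""
    text = " ".join(description.split())
    lowered = f" {text.lower()} "
    warnings = []

    # Rough sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if len(sentences) > 3:
        warnings.append("longer than 3 sentences")
    if len(text.split()) < 8:
        warnings.append("very short, possibly vague")
    if not any(cue in lowered for cue in EXAMPLE_CUES):
        warnings.append("no examples")
    if not any(cue in lowered for cue in DEFINITION_CUES):
        warnings.append("no definition, possibly just a list of terms")
    if lowered.count(type_name.lower()) > 1:
        warnings.append("type name repeated, possibly circular")
    return warnings


print(check_description("DISEASE", "Diabetes, asthma, influenza"))
```

An empty result does not guarantee a good description; it only means none of these simple patterns fired.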
Template (Recommended Format)
A [TYPE] refers to [concise definition]. It includes examples such as [example1], [example2], and [example3].
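When many entity types need descriptions, the template can be filled programmatically. This is a minimal sketch: build_description is a hypothetical helper, and the disease definition in the usage line is an illustrative placeholder rather than a description taken from the paper.

```python
def build_description(type_name, definition, examples):
    """Fill the recommended template with a definition and at least three examples."""
    if len(examples) < 3:
        raise ValueError("provide at least three examples")
    # Pick the article from the first letter of the type name (simple heuristic).
    article = "An" if type_name[0].lower() in "aeiou" else "A"
    *head, last = examples
    listed = f"{', '.join(head)}, and {last}"
    return (f"{article} {type_name} refers to {definition}. "
            f"It includes examples such as {listed}.")


description = build_description(
    "disease",
    "an abnormal condition that impairs normal body function",
    ["diabetes", "asthma", "influenza"],
)
print(description)
```

The resulting string can be passed as the description of an entity type in the usage example.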
License
This project is licensed under the MIT license.
Authors
Contacts
For questions, collaborations, or feedback, feel free to reach out: