Gliner-poly-small-v1.0 Open-source Named Entity Recognition Model

Developed by knowledgator

GLiNER is a flexible Named Entity Recognition (NER) model capable of identifying any entity type, providing a practical alternative to traditional NER models and large language models.

Sequence Labeling

PyTorch

License: Apache-2.0 (open source)

Tags: #Multi-type Entity Recognition #Zero-shot Learning #Dual-encoder Architecture

Downloads 18

Release Time : 8/19/2024

Model Overview

GLiNER is a Named Entity Recognition (NER) model that identifies any entity type through a bidirectional Transformer encoder. Compared to traditional NER models limited to predefined entities, GLiNER is more flexible while being lighter and more cost-effective than large language models (LLMs).

Model Features

Flexible Recognition of Any Entity Type

GLiNER can identify any entity type, not limited to predefined entity sets.

Dual-encoder Architecture

Utilizes a post-fusion dual-encoder architecture with a text encoder (DeBERTa v3 small) and an entity label encoder (BGE-small-en), enhancing the understanding of relationships between labels.

Efficient Inference

Inference is faster when entity label embeddings are precomputed, and there is no fixed limit on the number of entity types that can be recognized in a single pass.

Strong Generalization

Generalizes better to unseen entity types than the uni-encoder GLiNER, which makes it suitable for handling diverse entity types.

Model Capabilities

Named Entity Recognition

Multilingual Support

Efficient Inference

Flexible Entity Recognition

Use Cases

Information Extraction

Person and Event Extraction

Extract entities such as people, dates, awards, events, and teams from text.

In the usage example, the model identifies entities such as Cristiano Ronaldo dos Santos Aveiro (person), 5 February 1985 (date), and Ballon d'Or (award).

Social Media Analysis

Tweet Entity Recognition

Extract entities from social media tweets for trend analysis or content categorization.

In benchmark tests, the model achieved a score of 70.2% on the Broad Tweet Corpus dataset.

Biomedical Text Analysis

Medical Entity Recognition

Extract entities like diseases and drugs from biomedical literature.

In benchmark tests, the model achieved a score of 60.5% on the bc5cdr dataset.

🚀 GLiNER

GLiNER is a Named Entity Recognition (NER) model. It can identify any entity type using bidirectional transformer encoders (BERT-like). It offers a practical alternative to traditional NER models and Large Language Models (LLMs), providing a cost-effective solution for resource-constrained scenarios.

🚀 Quick Start

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using bidirectional transformer encoders (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are too large and costly for resource-constrained scenarios.

This particular version uses a bi-encoder architecture with post-fusion. The textual encoder is [DeBERTa v3 small](microsoft/deberta-v3-small), and the entity label encoder is a sentence transformer, [BGE-small-en](https://huggingface.co/BAAI/bge-small-en-v1.5).

This architecture brings several advantages over the uni-encoder GLiNER:

An unlimited number of entities can be recognized at a single time;
Faster inference if entity embeddings are preprocessed;
Better generalization to unseen entities.

The post-fusion strategy also improves on the classical bi-encoder by enabling better inter-label understanding.
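The bi-encoder idea can be illustrated with a toy sketch: label embeddings are computed once and reused, and each candidate span is scored against every label. The 3-dimensional vectors, the dot-product-plus-sigmoid scoring, and the function names below are illustrative assumptions, not GLiNER's actual internals; in particular, the post-fusion step is left out.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw similarity to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def score_spans(span_vectors, label_vectors, threshold=0.5):
    """Return (span, label, score) triples whose score reaches the threshold."""
    matches = []
    for span, s_vec in span_vectors.items():
        for label, l_vec in label_vectors.items():
            # Dot product between the span embedding and the label embedding.
            similarity = sum(a * b for a, b in zip(s_vec, l_vec))
            score = sigmoid(similarity)
            if score >= threshold:
                matches.append((span, label, round(score, 3)))
    return matches

# Hypothetical toy embeddings. In a real bi-encoder the label vectors come
# from the label encoder and can be cached between calls.
label_vectors = {"person": [2.0, 0.0, 0.0], "team": [0.0, 2.0, 0.0]}
span_vectors = {"Ronaldo": [1.5, -1.0, 0.2], "Al Nassr": [-1.0, 1.5, 0.1]}

matches = score_spans(span_vectors, label_vectors, threshold=0.5)
for span, label, score in matches:
    print(span, "=>", label, score)
```

Because the label vectors do not depend on the input text, adding more labels only adds more rows to compare against, which is the intuition behind the "unlimited number of entities" and "faster inference" points.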

✨ Features

Flexible Entity Recognition: Capable of identifying any entity type, not limited to predefined ones.
Cost-Effective: A practical alternative to LLMs for resource-constrained scenarios.
Bi-encoder Architecture: With post-fusion, offering multiple advantages such as unlimited entity recognition at once, faster inference with preprocessed embeddings, and better generalization.

📦 Installation

Install or update the gliner package:

pip install gliner -U

💻 Usage Examples

Basic Usage

Once you've installed the gliner package, you can import the GLiNER class. You can then load this model using GLiNER.from_pretrained and predict entities with predict_entities.

from gliner import GLiNER

# Load the pretrained bi-encoder checkpoint by its model ID.
model = GLiNER.from_pretrained("knowledgator/gliner-poly-small-v1.0")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 official senior career goals for club and country, making him the top goalscorer of all time.
"""

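# Entity types to extract; any label strings can be used.
labels = ["person", "award", "date", "competitions", "teams"]

# threshold sets the minimum confidence score for a prediction to be kept.
entities = model.predict_entities(text, labels, threshold=0.25)

# Each entity is a dict that includes the matched text and its label.
for entity in entities:
    print(entity["text"], "=>", entity["label"])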

Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
UEFA Champions Leagues => competitions
UEFA European Championship => competitions
UEFA Nations League => competitions
Champions League => competitions
European Championship => competitions
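For downstream processing it is often convenient to group the predicted entities by label. The sketch below assumes only what the example above shows, namely that each entity is a dict with "text" and "label" keys; the helper name group_by_label is hypothetical, and the sample list is a hand-copied subset of the printed output rather than a live model result.

```python
from collections import defaultdict

def group_by_label(entities):
    """Collect unique entity texts under their label, preserving order."""
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)

# Subset of the output shown above, written out by hand for illustration.
sample = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person"},
    {"text": "Al Nassr", "label": "teams"},
    {"text": "Portugal national team", "label": "teams"},
    {"text": "Ballon d'Or", "label": "award"},
]

grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(label, "=>", texts)
```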

Advanced Usage

If you have a large number of entity labels and want to pre-embed them, refer to the following code snippet:

labels = ["your entities"]
texts = ["your texts"]

# Embed the labels once so the embeddings can be reused across calls.
entity_embeddings = model.encode_labels(labels, batch_size=8)

# Predict on the batch of texts with the precomputed label embeddings.
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, labels)
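When the text collection is large, one possible pattern is to embed the labels a single time and then feed the texts through in chunks. The helper below is a sketch, not part of the gliner package: predict_in_chunks and chunk_size are hypothetical names, and it assumes that batch_predict_with_embeds returns one result per input text. It uses only the two model methods shown above.

```python
def predict_in_chunks(model, texts, labels, chunk_size=8):
    """Encode labels once, then predict chunk by chunk with the cached embeddings."""
    # Label embeddings are computed a single time and reused for every chunk.
    entity_embeddings = model.encode_labels(labels, batch_size=8)
    outputs = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        # Assumption: the call returns a list with one entry per text in the chunk.
        outputs.extend(
            model.batch_predict_with_embeds(chunk, entity_embeddings, labels)
        )
    return outputs
```

With a loaded model, it could be called as predict_in_chunks(model, texts, labels, chunk_size=16).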

📚 Documentation

Property	Details
Model Type	Named Entity Recognition (NER) model using bidirectional transformer encoders
Training Data	urchade/pile-mistral-v0.1, numind/NuNER, knowledgator/GLINER-multi-task-synthetic-data
Pipeline Tag	token-classification

The tables below show benchmarking results on various named entity recognition datasets:

Dataset	Score
ACE 2004	25.4%
ACE 2005	27.2%
AnatEM	17.7%
Broad Tweet Corpus	70.2%
CoNLL 2003	67.8%
FabNER	22.9%
FindVehicle	40.2%
GENIA_NER	47.7%
HarveyNER	15.5%
MultiNERD	64.5%
Ontonotes	28.7%
PolyglotNER	47.5%
TweetNER7	39.3%
WikiANN en	56.7%
WikiNeural	80.0%
bc2gm	56.2%
bc4chemd	48.7%
bc5cdr	60.5%
ncbi	53.5%
Average	45.8%

CrossNER_AI	48.9%
CrossNER_literature	64.0%
CrossNER_music	68.7%
CrossNER_politics	69.0%
CrossNER_science	62.7%
mit-movie	40.3%
mit-restaurant	36.2%
Average (zero-shot benchmark)	55.7%
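The two "Average" rows are consistent with a plain unweighted mean of the per-dataset scores, rounded to one decimal place. The short check below reproduces that arithmetic from the figures in the tables; treating the averages as unweighted means is an inference from the numbers, not something the source states.

```python
# Scores (in percent) copied from the first table.
general = {
    "ACE 2004": 25.4, "ACE 2005": 27.2, "AnatEM": 17.7,
    "Broad Tweet Corpus": 70.2, "CoNLL 2003": 67.8, "FabNER": 22.9,
    "FindVehicle": 40.2, "GENIA_NER": 47.7, "HarveyNER": 15.5,
    "MultiNERD": 64.5, "Ontonotes": 28.7, "PolyglotNER": 47.5,
    "TweetNER7": 39.3, "WikiANN en": 56.7, "WikiNeural": 80.0,
    "bc2gm": 56.2, "bc4chemd": 48.7, "bc5cdr": 60.5, "ncbi": 53.5,
}

# Scores (in percent) copied from the second table.
zero_shot = {
    "CrossNER_AI": 48.9, "CrossNER_literature": 64.0, "CrossNER_music": 68.7,
    "CrossNER_politics": 69.0, "CrossNER_science": 62.7,
    "mit-movie": 40.3, "mit-restaurant": 36.2,
}

def mean_score(scores):
    """Unweighted mean of the scores, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)

general_avg = mean_score(general)
zero_shot_avg = mean_score(zero_shot)
print(general_avg, zero_shot_avg)
```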

📄 License

This project is licensed under the Apache-2.0 license.

💡 Usage Tip

Connect with our community on Discord for news, support, and discussion about our models.
