🚀 NER4Legal_SRB
NER4Legal_SRB is a fine-tuned model for Named Entity Recognition (NER) in Serbian legal documents, leveraging a pre-trained BERT model to automate legal document processing tasks.
🚀 Quick Start
The NER4Legal_SRB model can be run on both CPU and GPU. You can use the following Python code to perform Named Entity Recognition:
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Use the GPU when one is available, otherwise run on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Note: `use_auth_token` is deprecated in recent transformers releases in favor of `token`.
tokenizer = AutoTokenizer.from_pretrained("kalusev/NER4Legal_SRB", use_auth_token=True)
model = AutoModelForTokenClassification.from_pretrained("kalusev/NER4Legal_SRB", use_auth_token=True).to(device)

# Mapping from class indices to BIO labels.
id_to_label = {
    0: 'O',
    1: 'B-COURT',
    2: 'B-DATE',
    3: 'B-DECISION',
    4: 'B-LAW',
    5: 'B-MONEY',
    6: 'B-OFFICIAL GAZZETE',
    7: 'B-PERSON',
    8: 'B-REFERENCE',
    9: 'I-COURT',
    10: 'I-LAW',
    11: 'I-MONEY',
    12: 'I-OFFICIAL GAZZETE',
    13: 'I-PERSON',
    14: 'I-REFERENCE'
}


def perform_ner(text):
    """
    Perform Named Entity Recognition on a single text with GPU memory fallback.

    Args:
        text (str): Input text.

    Returns:
        list: List of (token, predicted label) tuples, special tokens excluded.
    """
    try:
        # Send the inputs to whichever device the model currently lives on, so that
        # calls made after a CPU fallback do not hit a device mismatch.
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(model.device)
        with torch.no_grad():
            outputs = model(**inputs)
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=2).squeeze().tolist()
    except RuntimeError as e:
        if "CUDA out of memory" in str(e):
            print("Switching to CPU due to memory constraints.")
            inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
            with torch.no_grad():
                # model.cpu() moves the model in place; it stays on the CPU afterwards.
                outputs = model.cpu()(**inputs)
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=2).squeeze().tolist()
        else:
            raise

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
    labels = [id_to_label[pred] for pred in predictions]

    # Drop special tokens such as [CLS] and [SEP] from the output.
    results = [
        (token, label)
        for token, label in zip(tokens, labels)
        if token not in tokenizer.all_special_tokens
    ]
    return results


text = """Rešenjem Apelacionog suda u Novom Sadu, Gž1. 1901/10 od 12.05.2010. godine žalba tuženog je usvojena, a presuda Opštinskog suda u Novom Sadu, P. 5734/04 od 29.01.2009. godine, ukinuta i predmet upućen ovom sudu na ponovno suđenje."""

results = perform_ner(text)

print("Token | Predicted Label")
print("----------------------------------------")
for token, label in results:
    print(f"{token:<17} | {label}")
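The output of `perform_ner` is a list of subword tokens, each with its own BIO label. The sketch below shows one way to collapse such pairs into entity spans. It assumes WordPiece-style tokens in which continuation pieces start with `##`; the helper names `merge_wordpieces` and `group_entities` are illustrative and not part of the model or of transformers, and the sample pairs are hand-written rather than real model output.

```python
def merge_wordpieces(tokens):
    """Join tokens with spaces, gluing '##' continuation pieces to the previous piece."""
    text = ""
    for token in tokens:
        if token.startswith("##"):
            text += token[2:]
        elif text:
            text += " " + token
        else:
            text = token
    return text


def group_entities(pairs):
    """Collapse (token, BIO label) pairs into (entity_text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None

    def close_entity():
        nonlocal current_tokens, current_type
        if current_type is not None:
            entities.append((merge_wordpieces(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, label in pairs:
        # Assumption: a continuation piece belongs to the entity that is already open.
        if token.startswith("##") and current_type is not None:
            current_tokens.append(token)
            continue
        if label == "O":
            close_entity()
            continue
        prefix, entity_type = label.split("-", 1)
        # A B- tag, or an I- tag of a different type, starts a new span.
        if prefix == "B" or entity_type != current_type:
            close_entity()
            current_type = entity_type
        current_tokens.append(token)
    close_entity()
    return entities


# Hand-written sample in the same format that perform_ner returns.
sample = [
    ("Apelacion", "B-COURT"), ("##og", "I-COURT"), ("suda", "I-COURT"),
    ("je", "O"), ("P", "B-REFERENCE"), (".", "I-REFERENCE"), ("5734", "I-REFERENCE"),
]
for entity_text, entity_type in group_entities(sample):
    print(f"{entity_type:<12} | {entity_text}")
```

Each `B-` label opens a new span, so an entity that the tokenizer splits into several pieces all tagged `B-` (the label set has no `I-DATE` or `I-DECISION`) comes out as several spans. Merging those would need word boundaries, for example from the tokenizer's offset mapping.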
✨ Features
- Legal Document Processing: Designed specifically for Serbian legal documents, including public court rulings, to automate tasks such as document archiving, search, and retrieval.
- High Performance: Achieved a mean F1 score of 0.96 during cross-validation tests on the labeled dataset, demonstrating robustness and applicability to real-world scenarios.
- CPU and GPU Support: Can be run on both CPU and GPU, providing flexibility for different computing environments.
📦 Installation
The model is loaded through the transformers library, and the Quick Start example also requires PyTorch. You can install both via pip:
pip install transformers torch
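To confirm that the packages are visible to the Python interpreter you intend to use, a small check such as the following can help. It relies only on the standard library; the helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages that the Quick Start example imports.
for name in ("transformers", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```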
📚 Documentation
Model Description
NER4Legal_SRB is a fine-tuned Named Entity Recognition (NER) model for Serbian legal documents. It is based on the pre-trained [classla/bcms-bertic](https://huggingface.co/classla/bcms-bertic) BERT model. The model was developed as part of the conference paper "Named Entity Recognition for Serbian Legal Documents: Design, Methodology and Dataset Development", which will be published at the 15th International Conference on Information Society and Technology in 2025.
Abstract
Advancements in NLP and LLMs have led to research on document processing tools. This work presents an LLM-based NER solution for Serbian legal documents. It uses a pre-trained BERT model, develops a novel dataset, and discusses performance metrics. Cross-validation tests with a mean F1 score of 0.96 confirm the solution's applicability and robustness.
Base Model
The model is fine-tuned from the [classla/bcms-bertic](https://huggingface.co/classla/bcms-bertic) pre-trained BERT model, which is designed for BCMS (Bosnian, Croatian, Montenegrin, Serbian) languages.
Dataset
The model was fine-tuned on a manually labeled dataset of Serbian legal documents, including public court rulings. This dataset enables precise entity identification and classification in Serbian legal texts.
Performance Metrics
The model achieved a mean F1 score of 0.96 during cross-validation tests on the labeled dataset. For detailed evaluation information, please refer to the original conference paper.
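For readers unfamiliar with how a mean F1 score over cross-validation folds is obtained, here is a minimal sketch of one common definition: span-level F1 with exact match, averaged over folds. Whether the paper scores spans or tokens, how it averages across classes, and how many folds it uses are not stated in this document, so the functions `extract_spans`, `f1_score`, and `mean_f1` are assumptions for illustration only.

```python
def extract_spans(labels):
    """Return the set of (start, end, type) entity spans in a BIO label sequence."""
    spans, start, current = set(), None, None
    # A trailing "O" sentinel closes a span that runs to the end of the sequence.
    for i, label in enumerate(list(labels) + ["O"]):
        if label == "O":
            prefix, entity_type = "O", None
        else:
            prefix, entity_type = label.split("-", 1)
        # Close the open span unless this label continues it.
        if current is not None and (prefix != "I" or entity_type != current):
            spans.add((start, i, current))
            start, current = None, None
        # Open a span on B-, or on a stray I- with no span open.
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, entity_type
    return spans


def f1_score(gold, predicted):
    """Span-level F1: a predicted span counts only if boundaries and type both match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(predicted)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


def mean_f1(folds):
    """Average F1 over folds, each given as a (gold_labels, predicted_labels) pair."""
    scores = [f1_score(gold, predicted) for gold, predicted in folds]
    return sum(scores) / len(scores)
```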
🔧 Technical Details
The model builds on the pre-trained BERT architecture, which is well known for capturing semantic information from text. The pre-trained [classla/bcms-bertic](https://huggingface.co/classla/bcms-bertic) model was fine-tuned to identify and classify entities in Serbian legal texts, using a manually labeled dataset developed specifically for this task.
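Fine-tuning a BERT-style model for token classification usually requires mapping word-level annotations onto subword tokens. The sketch below illustrates one widely used scheme, in which only the first piece of each word keeps its label and the remaining pieces are masked out of the loss. This is an assumption for illustration, not a description of the authors' pipeline; `align_labels` is a hypothetical helper, and `word_ids` is expected in the format returned by `BatchEncoding.word_ids()` for fast tokenizers.

```python
def align_labels(word_labels, word_ids, label_to_id, ignore_index=-100):
    """Expand word-level labels to token level.

    word_labels: one BIO label per word.
    word_ids: for each token, the index of the word it came from,
              or None for special tokens.
    ignore_index: value skipped by the loss (-100 is the default
                  ignore_index of torch.nn.CrossEntropyLoss).
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)  # special token
        elif word_id != previous_word:
            aligned.append(label_to_id[word_labels[word_id]])  # first piece of a word
        else:
            aligned.append(ignore_index)  # continuation piece, masked out
        previous_word = word_id
    return aligned


# Three words; the first one is split into two pieces by the tokenizer.
label_to_id = {"O": 0, "B-COURT": 1, "I-COURT": 9}
print(align_labels(["B-COURT", "I-COURT", "O"], [None, 0, 0, 1, 2, None], label_to_id))
```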
📄 License
This model is released under the Apache-2.0 license.
If you would like to use this software, please consider citing the following publication:
- *Kalušev, V., Brkljač, B. (2025). Named entity recognition for Serbian legal documents: Design, methodology and dataset development. In Proceedings of the 15th International Conference on Information Society and Technology (ICIST), Kopaonik, Serbia, 9-12 March, 2025, Vol. -, ISBN -, accepted for publication.*
@inproceedings{KalusevNER2025,
author = {Kalu{\v{s}}ev, Vladimir and Brklja{\v{c}}, Branko},
booktitle = {15th International Conference on Information Society and Technology (ICIST)},
doi = {-},
month = mar,
pages = {1--16},
title = {Named entity recognition for Serbian legal documents: {D}esign, methodology and dataset development},
year = {2025}
}
@misc{kalušev2025namedentityrecognitionserbian,
title={Named entity recognition for Serbian legal documents: Design, methodology and dataset development},
author={Vladimir Kalušev and Branko Brkljač},
year={2025},
eprint={2502.10582},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.10582},
}
⚠️ Important Note
For detailed information about model evaluation and reported results, please consult the original conference paper.
💡 Usage Tip
If a "CUDA out of memory" error occurs, the Quick Start code catches it and reruns inference on the CPU. Inference on the CPU may be slower.
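One way to lower the memory needed per call, and to avoid losing text to `truncation=True` when a document exceeds the tokenizer's maximum length, is to split long inputs into overlapping windows and run inference on each window separately. The sketch below splits on whitespace; the helper `chunk_words` and its default sizes are arbitrary assumptions, and word counts only approximate token counts.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into windows of at most `max_words` words, sharing `overlap` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Small demonstration with tiny window sizes.
for chunk in chunk_words("a b c d e f g h i j", max_words=4, overlap=1):
    print(chunk)
```

Entities that fall inside the overlap can be reported twice, once per window, so results from neighboring windows need deduplication before they are combined.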