Norbert3-base: An Open-Source Norwegian Language Model - Supports Both Written Standards of Norwegian (Bokmål and Nynorsk)

Developed by ltg

NorBERT 3 is a next-generation Norwegian language model based on the BERT architecture, supporting both Bokmål and Nynorsk written Norwegian.

Large Language Model

Transformers

License: Apache-2.0

Tags: Norwegian BERT variant, Masked language modeling, Multi-dialect support

Release Time : 3/2/2023

Model Overview

NorBERT 3 is a Norwegian language model based on the BERT architecture, primarily used for natural language processing tasks such as text classification and named entity recognition.

Model Features

Support for both written standards

Supports both Bokmål and Nynorsk variants of Norwegian.

Multiple size versions

Offers various parameter sizes from ultra-small to large versions to accommodate different computational resource needs.

Custom wrapper

Requires loading a custom wrapper from `modeling_norbert.py` to support various natural language processing tasks.

Model Capabilities

Text classification

Named entity recognition

Question answering

Masked token prediction (fill-mask)

Language understanding

Use Cases

Natural language processing

Text classification

Used for classification tasks in Norwegian text, such as sentiment analysis and topic classification.

Named entity recognition

Identifies named entities in Norwegian text, such as person names, place names, and organization names.
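
Token-classification models emit one label per token, so entity mentions have to be assembled from those labels afterwards. The sketch below shows one common way to do this, assuming the labels follow a BIO scheme. The tag names (`B-PER`, `I-ORG`, and so on), the example sentence, and the `bio_to_spans` helper are illustrative assumptions; the label set of any fine-tuned NorBERT 3 model depends on its training data.

```python
def bio_to_spans(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: flush the previous one, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close any open entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical model output for one Norwegian sentence.
tokens = ["Erna", "Solberg", "besøkte", "Universitetet", "i", "Oslo", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
print(bio_to_spans(tokens, tags))
```

In this sketch an `I-` tag that does not continue an open entity of the same type is simply dropped; other conventions treat it as the start of a new entity.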

Question answering

Builds Norwegian question answering systems to respond to user queries.
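
Extractive question-answering heads typically produce a start score and an end score for every token, and the answer is the span whose combined score is highest. The following is a minimal sketch of that selection step in plain Python. The logits are made-up numbers, and `best_answer_span` and `max_answer_len` are hypothetical names used only for illustration, not part of the model's API.

```python
def best_answer_span(start_logits, end_logits, max_answer_len=15):
    """Return the (start, end) token indices with the highest summed score."""
    best_score, best_span = float("-inf"), (0, 0)
    for start, start_score in enumerate(start_logits):
        # Only consider spans that end at or after `start` and stay short.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    return best_span


# Toy scores for a five-token context.
start_logits = [0.1, 0.2, 3.0, 0.5, 0.1]
end_logits = [0.0, 0.1, 0.4, 2.5, 0.3]
print(best_answer_span(start_logits, end_logits))
```

The returned indices would then be mapped back to the corresponding tokens of the context to produce the answer text.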

🚀 NorBERT 3 base

The official release of a new-generation Norwegian language model described in the paper NorBench — A Benchmark for Norwegian Language Models.

🚀 Quick Start

NorBERT 3 base is a new-generation Norwegian language model. For details about the model, see the paper NorBench — A Benchmark for Norwegian Language Models.

✨ Features

Language Coverage: Tagged with the language codes 'no' (Norwegian), 'nb' (Bokmål), and 'nn' (Nynorsk).
Multiple Model Sizes: Offers various model sizes to meet different requirements.
Generative Siblings: Has corresponding generative NorT5 models.

📦 Installation

This model currently needs a custom wrapper from `modeling_norbert.py`, so it must be loaded with `trust_remote_code=True`.

💻 Usage Examples

Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# trust_remote_code=True is required because the model uses a custom wrapper.
tokenizer = AutoTokenizer.from_pretrained("ltg/norbert3-base")
model = AutoModelForMaskedLM.from_pretrained("ltg/norbert3-base", trust_remote_code=True)

# Tokenize a sentence containing one masked position.
mask_id = tokenizer.convert_tokens_to_ids("[MASK]")
input_text = tokenizer("Nå ønsker de seg en[MASK] bolig.", return_tensors="pt")
output_p = model(**input_text)

# Keep the original ids everywhere except at [MASK], where the top prediction is used.
output_text = torch.where(input_text.input_ids == mask_id, output_p.logits.argmax(-1), input_text.input_ids)

# should output: '[CLS] Nå ønsker de seg en ny bolig.[SEP]'
print(tokenizer.decode(output_text[0].tolist()))
```
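
The `torch.where` call in the example keeps every input token id unchanged except at `[MASK]` positions, where it substitutes the id with the highest logit. The same selection logic can be sketched in plain Python on toy data. The token ids, the five-entry vocabulary, and the `fill_masks` helper below are illustrative assumptions and have nothing to do with the real NorBERT 3 vocabulary.

```python
def fill_masks(input_ids, logits, mask_id):
    """Replace each mask position with the index of its highest logit."""
    filled = []
    for token_id, scores in zip(input_ids, logits):
        if token_id == mask_id:
            # argmax over the vocabulary dimension for this position
            filled.append(max(range(len(scores)), key=scores.__getitem__))
        else:
            # non-masked positions keep their original id
            filled.append(token_id)
    return filled


# Toy example: a vocabulary of 5 ids, where id 4 plays the role of [MASK].
input_ids = [1, 3, 4, 2]
logits = [
    [0.1, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.8, 0.1],
    [0.2, 0.1, 2.5, 0.3, 0.0],  # masked position: id 2 scores highest
    [0.0, 0.0, 0.7, 0.1, 0.2],
]
print(fill_masks(input_ids, logits, mask_id=4))  # prints [1, 3, 2, 2]
```

Note that the logits at non-masked positions are ignored entirely, which is why the last position keeps id 2 from the input rather than being re-predicted.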

Advanced Usage

The following classes are currently implemented: AutoModel, AutoModelForMaskedLM, AutoModelForSequenceClassification, AutoModelForTokenClassification, AutoModelForQuestionAnswering and AutoModelForMultipleChoice.
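
One way to choose among these classes is a small lookup keyed by task, sketched below. The task names in the dictionary and the `load_norbert3` helper are hypothetical conveniences invented for this example; only the `transformers` Auto class names, the model id, and `trust_remote_code=True` come from this page.

```python
# Auto classes listed in this section, keyed by task names chosen for this sketch.
TASK_TO_AUTO_CLASS = {
    "encoder": "AutoModel",
    "fill-mask": "AutoModelForMaskedLM",
    "sequence-classification": "AutoModelForSequenceClassification",
    "token-classification": "AutoModelForTokenClassification",
    "question-answering": "AutoModelForQuestionAnswering",
    "multiple-choice": "AutoModelForMultipleChoice",
}


def load_norbert3(task, model_name="ltg/norbert3-base", **kwargs):
    """Load the model with the Auto class that matches `task`."""
    import transformers  # imported lazily so the mapping is usable without it

    auto_class = getattr(transformers, TASK_TO_AUTO_CLASS[task])
    # trust_remote_code=True is needed because the model ships a custom wrapper.
    return auto_class.from_pretrained(model_name, trust_remote_code=True, **kwargs)
```

A call such as `load_norbert3("token-classification")` would download the checkpoint and attach a token-classification head. Because the released checkpoint is a pretrained masked language model, task-specific heads other than the masked-LM head would still need fine-tuning before they give useful predictions.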

📚 Documentation

Other Sizes

Generative NorT5 Siblings

📄 License

This project is licensed under the Apache-2.0 license.

Cite us

@inproceedings{samuel-etal-2023-norbench,
    title = "{N}or{B}ench {--} A Benchmark for {N}orwegian Language Models",
    author = "Samuel, David  and
      Kutuzov, Andrey  and
      Touileb, Samia  and
      Velldal, Erik  and
      {\O}vrelid, Lilja  and
      R{\o}nningstad, Egil  and
      Sigdel, Elina  and
      Palatkina, Anna",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.61",
    pages = "618--633",
    abstract = "We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based). Finally, we compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.",
}

Property	Details
Supported Languages	'no', 'nb', 'nn'
Inference	false
Tags	BERT, NorBERT, Norwegian, encoder
License	Apache-2.0
