🚀 GPT-SW3: A Multilingual Large Language Model
GPT-SW3 is a collection of large decoder-only pretrained transformer language models capable of generating coherent text in multiple languages and programming languages. It offers various model versions and can be instructed to perform diverse text tasks.
🚀 Quick Start
Since this is a private repository, you need to log in with your access token using `huggingface-cli login` before accessing the model from Python. Refer to the HuggingFace Quick Start Guide for more details.
✨ Features
- Multilingual Support: Capable of generating text in Danish, Swedish, English, Norwegian, and Icelandic, as well as 4 programming languages.
- Instruction Following: Can be instructed to perform text tasks not explicitly trained for by casting them as text generation tasks.
- Multiple Model Versions: Offers base models, instruct models, and quantized models with different scales.
📦 Installation
As this is a private repository, you need to log in with your access token to access the model from Python. Use the following command:
```bash
huggingface-cli login
```
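If you prefer to authenticate from Python instead of the CLI, the `huggingface_hub` library provides a `login` helper. A minimal sketch, assuming you have a read-access token (the token string below is a placeholder, not a real credential):

```python
# Programmatic alternative to `huggingface-cli login`.
# Requires: pip install huggingface_hub
from huggingface_hub import login

# Replace with your own access token from https://huggingface.co/settings/tokens
login(token="hf_xxxxxxxxxxxxxxxxxxxx")
```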
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer, and move the model to GPU if one is available.
model_name = "AI-Sweden-Models/gpt-sw3-20b-instruct"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)

# Tokenize the prompt and generate up to 100 new tokens with sampling.
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

generated_token_ids = model.generate(
    inputs=input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.6,
    top_p=1,
)[0]

# Decode the generated token ids back into text.
generated_text = tokenizer.decode(generated_token_ids)
print(generated_text)
```
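The instruct checkpoints were fine-tuned on a turn-based chat format; the authoritative template is documented on the model card. As an illustration only, a hypothetical helper that wraps a user message in a simple User/Bot turn structure before generation (the template string is an assumption, not the official format):

```python
# Hypothetical chat-style prompt wrapper for the instruct models (illustrative only).
# Check the model card for the exact chat template used during fine-tuning.
def build_chat_prompt(user_message: str) -> str:
    return (
        "<|endoftext|><s>\n"
        f"User:\n{user_message}\n"
        "<s>\nBot:\n"
    )

chat_prompt = build_chat_prompt("Varför är träd fina?")
input_ids = tokenizer(chat_prompt, return_tensors="pt")["input_ids"].to(device)
output = model.generate(
    inputs=input_ids, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1
)[0]
print(tokenizer.decode(output))
```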
Using HuggingFace Pipeline
```python
# Reuse the tokenizer and model loaded above via the text-generation pipeline.
generator = pipeline("text-generation", tokenizer=tokenizer, model=model, device=device)

generated = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)[0]["generated_text"]
print(generated)
```
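The 20B instruct checkpoint is large. If GPU memory is tight, a smaller model version or reduced precision may help. A minimal sketch, assuming a CUDA device and that a smaller checkpoint with the name below is available in the AI-Sweden-Models organization (check the hub for the sizes actually published):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed smaller checkpoint name; verify available sizes on the AI-Sweden-Models hub page.
small_model_name = "AI-Sweden-Models/gpt-sw3-1.3b-instruct"

tokenizer = AutoTokenizer.from_pretrained(small_model_name)
# Load weights in float16 to roughly halve GPU memory use (requires a CUDA device).
model = AutoModelForCausalLM.from_pretrained(small_model_name, torch_dtype=torch.float16).to("cuda:0")
model.eval()
```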
📚 Documentation
Model Description
GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code, using a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation. The instruct models were fine-tuned on instruction data in both chat and raw text formats.
Intended Use
GPT-SW3 is an autoregressive large language model capable of generating coherent text in 5 different languages and 4 programming languages. It can also be instructed to perform text tasks it has not been explicitly trained for by casting them as text generation tasks.
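As an illustration of casting a task as text generation, here is a hypothetical few-shot prompt that frames Swedish-to-English translation as plain continuation, reusing the tokenizer, model, and device from the usage example above (the prompt wording and examples are assumptions, not part of the training setup):

```python
# Hypothetical few-shot prompt that casts translation as text generation.
few_shot_prompt = (
    "Svenska: Jag gillar att läsa böcker.\n"
    "English: I like to read books.\n"
    "Svenska: Träd är fina.\n"
    "English:"
)

input_ids = tokenizer(few_shot_prompt, return_tensors="pt")["input_ids"].to(device)
output = model.generate(inputs=input_ids, max_new_tokens=20, do_sample=False)[0]
print(tokenizer.decode(output))
```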
Limitations
Like other large language models, GPT-SW3 has limitations in terms of bias, safety, generation diversity, and hallucination. The model may overrepresent some viewpoints, contain stereotypes, generate inappropriate language, make errors, and produce irrelevant or repetitive outputs.
Compliance
The release of GPT-SW3 consists of model weights, a configuration file, a tokenizer file, and a vocabulary file, none of which contain any personally identifiable information (PII) or copyrighted material.
Model Card
We provide a model card for GPT-SW3 following Mitchell et al. (2018).
Model Details
| Property | Details |
|----------|---------|
| Developer | AI Sweden in collaboration with RISE and the WASP WARA for Media and Language |
| Release Date | 2022-12-20 |
| Model Version | Second generation of GPT-SW3 |
| Model Type | Large decoder-only transformer language model |
| Training Algorithm | Trained with the NeMo Megatron GPT implementation |
| Paper or Resource | N/A |
| License | LICENSE |
| Contact | nlu@ai.se |
Intended Use
- Primary Uses: Pre-release for research and evaluation of large language model capabilities for Nordic languages.
- Primary Users: Organizations and individuals in the Nordic NLP ecosystem who can contribute to model validation and testing.
- Out-of-Scope Use Cases: See the modified RAIL license.
Data, Limitations, and Recommendations
- Data Selection for Training: Training data was selected based on breadth and availability. See the Datasheet for more details.
- Limitations: Similar to other large language models, GPT-SW3 has limitations in bias, safety, generation diversity, and hallucination.
- Recommendations for Future Work: Indirect users should be made aware when the content they are working with was generated by the LLM. Users should be aware of the risks and limitations and include appropriate disclaimers. Models derived from the LLM should ship with an updated Model Card, and users should provide mechanisms for feedback.
Datasheet
We follow the recommendations of Gebru et al. (2021) and provide a datasheet for the dataset used to train GPT-SW3.
Motivation
- Purpose: To train Swedish large language models, a large-scale Swedish dataset of high quality was needed. Since no such dataset existed, data in Nordic and English languages was collected.
- Creator: The NLU research group at AI Sweden, consisting of researchers and developers from AI Sweden and RISE.
- Funding: Funded by the Swedish Innovation Agency (Vinnova) through grants such as 2019-02996 and 2022-00949.
Composition
The dataset consists of textual documents categorized by language and document type, including sources from books, articles, code, conversational data, math, miscellaneous sources, web common crawl, and web sources.
📄 License
The model is released under the modified RAIL license.