🚀 GPT-SW3 - A Collection of Large Language Models
GPT-SW3 is a collection of large decoder-only pretrained transformer language models. It can generate coherent text in multiple languages and programming languages, and can be instructed to perform various text tasks.
🚀 Quick Start
To access the model from Python, since this is a private repository, you need to log in with your access token using `huggingface-cli login`. Refer to the HuggingFace Quick Start Guide for more details.
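If you prefer to authenticate from Python rather than the CLI, a minimal sketch using the huggingface_hub library is shown below; the token string is a placeholder for your own read-access token:
from huggingface_hub import login
# Log in programmatically; replace the placeholder with your own access token.
login(token="hf_...")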
The following code snippet demonstrates how to load the tokenizer and model, and use the GPU if available:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
model_name = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)
Generating text using the `generate` method:
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
generated_token_ids = model.generate(
inputs=input_ids,
max_new_tokens=100,
do_sample=True,
temperature=0.6,
top_p=1,
)[0]
generated_text = tokenizer.decode(generated_token_ids)
Using the HuggingFace pipeline is a convenient alternative:
generator = pipeline('text-generation', tokenizer=tokenizer, model=model, device=device)
generated = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)[0]["generated_text"]
✨ Features
- Multilingual Capability: GPT-SW3 can generate coherent text in 5 different languages (Swedish, Norwegian, Danish, Icelandic, English) and 4 programming languages.
- Instruction Following: It can be instructed to perform text tasks that it has not been explicitly trained for by casting them as text generation tasks.
📚 Documentation
Model Description
GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It has been trained on a dataset containing 320B tokens in multiple languages and programming code, using the NeMo Megatron GPT implementation with a causal language modeling (CLM) objective. The instruct models were fine-tuned on instruction data in both chat and raw text formats.
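As an illustration of how an instruction task can be cast as plain text generation, the sketch below builds a chat-style prompt for an instruct checkpoint, reusing the tokenizer, model, and device from the Quick Start. The exact turn format (the <|endoftext|> and <s> markers and the User:/Bot: labels) is an assumption here, not a guarantee; verify it against the model card of the instruct checkpoint you use.
# A minimal sketch: casting an instruction as a text-generation prompt.
# NOTE: the turn format below is an assumed chat template; confirm the exact
# format for your instruct checkpoint before relying on it.
chat_prompt = (
    "<|endoftext|><s>\n"
    "User:\n"
    "Summarize why trees are good for cities.\n"
    "<s>\n"
    "Bot:\n"
)
input_ids = tokenizer(chat_prompt, return_tensors="pt")["input_ids"].to(device)
output_ids = model.generate(inputs=input_ids, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)[0]
print(tokenizer.decode(output_ids))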
Intended Use
GPT-SW3 is an autoregressive large language model capable of generating text in multiple languages and programming languages. It can be used for text generation tasks and can be instructed to perform other text-related tasks.
Limitations
Like other large language models, GPT-SW3 has limitations. It may have issues with bias, safety, generation diversity, and hallucination. The model may overrepresent some viewpoints, contain stereotypes, generate inappropriate language, make errors, produce incorrect information, generate irrelevant or repetitive outputs, and create content that may not be suitable for all settings.
Compliance
The release of GPT-SW3 includes model weights, a configuration file, a tokenizer file, and a vocabulary file. None of these files contain personally identifiable information (PII) or copyrighted material.
Model Details
| Property | Details |
|----------|---------|
| Developer | AI Sweden in collaboration with RISE and the WASP WARA for Media and Language |
| Release Date | 2022-12-20 |
| Model Version | Second generation of GPT-SW3 |
| Model Type | Large decoder-only transformer language model |
| Training Algorithm | Trained with the NeMo Megatron GPT implementation |
| License | LICENSE |
| Contact | nlu@ai.se |
Intended Use Details
- Primary Uses: Pre-release for research and evaluation of large language models for Nordic languages.
- Intended Users: Organizations and individuals in the Nordic NLP ecosystem who can contribute to model validation and testing and provide feedback.
- Out-of-scope Use Cases: See the modified RAIL license.
Data, Limitations, and Recommendations
- Training Data Selection: Based on a combination of breadth and availability. See the datasheet for more details.
- Limitations: Similar to other large language models, it has issues with bias, safety, generation diversity, and hallucination.
- Recommendations: Indirect users should be aware of LLM-generated content. Users should be aware of risks and limitations and include appropriate disclaimers or blocking interfaces. Models pretrained with the LLM should have an updated model card. Users should provide feedback mechanisms.
Datasheet
- Motivation: To train Swedish large language models, a high-quality large-scale Swedish dataset was needed. Since no such dataset existed, data in Nordic and English languages were collected.
- Creator: The NLU research group at AI Sweden, which consists of researchers and developers from AI Sweden and RISE.
- Funding: Funded by the Swedish Innovation Agency (Vinnova) through several grants, including 2019-02996 and 2022-00949.
Composition
The dataset is a filtered and deduplicated collection of textual documents categorized by language and document type, including sources from books, articles, code, conversational data, math, miscellaneous sources, and web data.
📄 License
The model is released under a modified RAIL license; see the LICENSE file for the full terms.
⚠️ Important Note
GPT-SW3 has limitations in terms of bias, safety, generation diversity, and hallucination. The model may generate inappropriate or incorrect content.
💡 Usage Tip
Indirect users should be made aware when the content they are working with is created by the LLM. Users should be aware of the risks and limitations, and include an appropriate age disclaimer or blocking interface as necessary.