🚀 GPT-SW3 Model
GPT-SW3 is a collection of large decoder-only pretrained transformer language models. It can generate coherent text in multiple languages and programming languages, and can be instructed to perform various text tasks.
🚀 Quick Start
Since this is a private repository, you need to log in with your access token to access the model from Python. You can do this with `huggingface-cli login`. See the HuggingFace Quick Start Guide for more information.
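If you prefer to authenticate from Python rather than the shell, the `huggingface_hub` package provides a `login` helper; a minimal sketch (the token value is a placeholder you need to replace with your own access token):

```python
from huggingface_hub import login

# Programmatic alternative to `huggingface-cli login`.
# Replace the placeholder with your personal access token from the Hugging Face Hub.
login(token="hf_xxx_replace_with_your_token")
```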
The following code snippet loads the tokenizer & model, and uses the GPU if available:
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"
# Use the GPU if one is available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"

# Load the tokenizer and model, then move the model to the chosen device.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)
```
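The 6.7B checkpoint is large, so it may not fit in GPU memory in full precision. A common workaround (an assumption about your hardware, not something the card prescribes) is to load the weights in half precision:

```python
# Optional: load the weights in float16 to roughly halve the GPU memory footprint.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()
model.to(device)
```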
✨ Features
- Multilingual Generation: Capable of generating coherent text in 5 different languages and 4 programming languages.
- Instruction-based Tasks: Can be instructed to perform text tasks it hasn't been explicitly trained for by casting them as text generation tasks.
📦 Installation
The usage examples require `torch` and `transformers`; installing them with pip (e.g. `pip install torch transformers`) is enough to run the snippets below.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
prompt = "Träd är fina för att"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
model.to(device)

# Tokenize the prompt and move the input ids to the same device as the model.
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

# Sample up to 100 new tokens as a continuation of the prompt.
generated_token_ids = model.generate(
    inputs=input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.6,
    top_p=1,
)[0]

generated_text = tokenizer.decode(generated_token_ids)
```
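Note that `generated_text` contains the prompt followed by the continuation. If you only want the newly generated part, one option (a small sketch, not part of the original example) is to decode just the tokens that come after the prompt:

```python
# Decode only the tokens generated after the prompt.
continuation = tokenizer.decode(generated_token_ids[input_ids.shape[1]:])
print(continuation)
```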
Advanced Usage
Generating text using the `generate` method in the chat format:
```python
# Chat-format prompt: alternating User and Bot turns separated by <s> tokens.
prompt = """
<|endoftext|><s>
User:
Varför är träd fina?
<s>
Bot:
""".strip()

input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

generated_token_ids = model.generate(
    inputs=input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.6,
    top_p=1,
)[0]

generated_text = tokenizer.decode(generated_token_ids)
```
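To reuse this chat format for other questions, a small helper can assemble the prompt from a list of turns. The function below is hypothetical (not part of the model card) and simply reproduces the `<|endoftext|>` / `<s>` structure shown above:

```python
# Hypothetical helper: build a chat-format prompt from (speaker, text) turns.
def build_chat_prompt(turns):
    parts = ["<|endoftext|><s>"]
    for speaker, text in turns:
        parts.append(f"{speaker}:\n{text}\n<s>")
    parts.append("Bot:\n")
    return "\n".join(parts).strip()

prompt = build_chat_prompt([("User", "Varför är träd fina?")])
```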
Using the HuggingFace pipeline:
```python
generator = pipeline("text-generation", tokenizer=tokenizer, model=model, device=device)
generated = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)[0]["generated_text"]
```
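The pipeline also accepts a list of prompts and returns one result list per prompt; a short sketch reusing the prompts from the examples above:

```python
# Batch generation: pass several prompts at once to the pipeline.
prompts = ["Träd är fina för att", "Varför är träd fina?"]
results = generator(prompts, max_new_tokens=100, do_sample=True, temperature=0.6, top_p=1)
for result in results:
    print(result[0]["generated_text"])
```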
📚 Documentation
Intended Use
GPT-SW3 is pre-released for research and evaluation of the capabilities of Large Language Models for the Nordic languages. It aims to contribute to knowledge building for LLMs, validate the model, and collect feedback.
Limitations
Like other large language models, GPT-SW3 has limitations in terms of bias, safety, generation diversity, and hallucination. It may overrepresent some viewpoints, contain stereotypes, generate inappropriate language, make errors, and produce irrelevant or repetitive outputs.
Model Details
| Property | Details |
|---|---|
| Person or organization developing model | GPT-SW3 was developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. |
| Model date | GPT-SW3 date of release: 2022-12-20 |
| Model version | This is the second generation of GPT-SW3. |
| Model type | GPT-SW3 is a large decoder-only transformer language model. |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | GPT-SW3 was trained with the NeMo Megatron GPT implementation. |
| Paper or other resource for more information | N/A |
| License | LICENSE |
| Where to send questions or comments about the model | nlu@ai.se |
Intended Use
- Primary intended uses: Research and evaluation of Large Language Models for the Nordic languages.
- Primary intended users: Organizations and individuals in the Nordic NLP ecosystem.
- Out-of-scope use cases: See the modified RAIL license.
Data, Limitations, and Recommendations
- Data selection for training: Training data was selected based on breadth and availability. See the Datasheet for more details.
- Data selection for evaluation: N/A
- Limitations: Similar to other large language models, GPT-SW3 has limitations in bias, safety, generation diversity, and hallucination.
- Recommendations for future work: Indirect users should be aware of LLM-generated content. Users should be aware of risks and limitations and include appropriate disclaimers. Models pretrained with the LLM should have an updated Model Card. Users should provide feedback mechanisms.
GPT-SW3 Datasheet
Motivation
- The dataset was created for pre-training Swedish Large Language Models due to the lack of large-scale high-quality Swedish datasets.
- The dataset was created by the NLU research group at AI Sweden, which consists of researchers and developers from AI Sweden and RISE.
- The Swedish Innovation Agency (Vinnova) funded the work through several grants, including 2019-02996 and 2022-00949.
Composition
The dataset consists of textual documents categorized by language and document type. It includes sources such as books, articles, code, conversational data, math data, miscellaneous data, and web common crawl data.
🔧 Technical Details
GPT-SW3 was trained with the NeMo Megatron GPT implementation on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The `instruct` models were fine-tuned on instruction data using both chat and raw text formats.
📄 License
The model is released under a modified RAIL license; see the LICENSE for the full terms.