# GPT2-svenska-wikipedia
This is a Swedish GPT2-style model. It was trained using the Flax CLM pipeline on the Swedish part of the wiki40b dataset.
## Quick Start
This Swedish GPT2-style model can be used for Swedish text generation and as a base for fine-tuning on downstream Swedish NLP tasks. It was trained on the Swedish (`sv`) split of the wiki40b dataset.
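A minimal generation sketch using the Flax classes in `transformers`. The model id below is a placeholder for the actual Hub repository name, and the prompt and sampling settings are purely illustrative:

```python
from transformers import AutoTokenizer, FlaxGPT2LMHeadModel

# Placeholder id: replace with the actual Hub repository for this model.
model_id = "<org>/gpt2-svenska-wikipedia"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxGPT2LMHeadModel.from_pretrained(model_id)

# Generate a short continuation of a Swedish prompt.
inputs = tokenizer("Stockholm är huvudstaden i", return_tensors="np")
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```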
## Features
- Model series: this model is part of a series of models trained on TPU with Flax/JAX during the Hugging Face Flax/JAX challenge.
- Related variants: the same series includes GPT-2 and RoBERTa models for other Nordic languages.
## Installation
No model-specific installation steps were provided; the model is used through the standard Hugging Face libraries.
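As a hedged sketch of the dependencies this README implicitly relies on (`transformers` and `flax` for the model, `datasets` plus Apache Beam for loading wiki40b with `beam_runner='DirectRunner'`); exact versions were not specified:

```bash
pip install transformers datasets flax jax apache-beam
```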
## Usage Examples
### Basic Usage
The following Python snippet shows how to load and clean the dataset used for training the model. The wiki40b text contains structural markers such as `_START_ARTICLE_`, `_START_SECTION_`, `_START_PARAGRAPH_`, and `_NEWLINE_`, which are stripped out here.

```python
from datasets import load_dataset


def load_and_clean_wiki():
    # Load the Swedish split of wiki40b; requires Apache Beam for the DirectRunner.
    dataset = load_dataset('wiki40b', 'sv', beam_runner='DirectRunner', split="train")
    # Keep only the raw text column.
    dataset = dataset.remove_columns(['wikidata_id', 'version_id'])
    filtered_dataset = dataset.map(filter_wikipedia)
    return filtered_dataset


def filter_wikipedia(batch):
    # Remove the wiki40b structural markers and non-breaking spaces.
    batch["text"] = " ".join(batch["text"].split("\n_START_SECTION_\n"))
    batch["text"] = " ".join(batch["text"].split("\n_START_ARTICLE_\n"))
    batch["text"] = " ".join(batch["text"].split("\n_START_PARAGRAPH_\n"))
    batch["text"] = " ".join(batch["text"].split("_NEWLINE_"))
    batch["text"] = " ".join(batch["text"].split("\xa0"))
    return batch
```
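Assuming the functions above, the cleaned dataset can then be built and inspected as follows (a usage sketch; loading wiki40b through Apache Beam's DirectRunner can take a while):

```python
# Build the cleaned training split and look at the start of the first example.
wiki = load_and_clean_wiki()
print(wiki[0]["text"][:200])
```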
### Advanced Usage
The following bash command was used to train the model:

```bash
./run_clm_flax.py \
    --output_dir="${MODEL_DIR}" \
    --model_type="gpt2" \
    --config_name="${MODEL_DIR}" \
    --tokenizer_name="${MODEL_DIR}" \
    --dataset_name="wiki40b" \
    --dataset_config_name="sv" \
    --do_train --do_eval \
    --block_size="512" \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-3" \
    --warmup_steps="1000" \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --weight_decay="0.01" \
    --overwrite_output_dir \
    --num_train_epochs="20" \
    --logging_steps="500" \
    --save_steps="1000" \
    --eval_steps="2500" \
    --push_to_hub
```
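The script reads both the model configuration and the tokenizer from `${MODEL_DIR}` (via `--config_name` and `--tokenizer_name`), so those files must exist before training starts. Below is a sketch of how they could be prepared, assuming a byte-level BPE tokenizer and a standard GPT-2 configuration; the directory name, training file, and vocabulary size are illustrative assumptions, not values taken from the original run:

```python
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import GPT2Config

model_dir = "gpt2-svenska-wikipedia"  # assumed value of ${MODEL_DIR}
os.makedirs(model_dir, exist_ok=True)

# Train a byte-level BPE tokenizer on the cleaned text dumped to a plain-text file.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["swedish_wiki.txt"],        # assumed dump of the cleaned dataset
    vocab_size=50257,                  # assumed GPT-2-sized vocabulary
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model(model_dir)

# Write a standard GPT-2 configuration next to the tokenizer files.
config = GPT2Config.from_pretrained("gpt2")
config.save_pretrained(model_dir)
```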
## Documentation
### Model Series
This model is part of a series of models trained on TPU with Flax/JAX during the Hugging Face Flax/JAX challenge.
### Related Models
Related GPT-2 and RoBERTa models for other Nordic languages were trained as part of the same series.
### Data cleaning and preprocessing
The data was cleaned and preprocessed using the script shown under Basic Usage. Make sure to install the Apache Beam dependency required by `beam_runner` so that the wiki40b dataset loads correctly.
### Training script
The bash command shown under Advanced Usage was used to train the model.
## Technical Details
The model was trained with the Flax causal language modeling (CLM) pipeline on the Swedish part of the wiki40b dataset, after the cleaning and preprocessing steps described above. Key hyperparameters from the training command: block size 512, per-device batch size 64 for both training and evaluation, learning rate 5e-3 with 1000 warmup steps, Adam with beta1 0.9 and beta2 0.98, weight decay 0.01, and 20 training epochs.
## License
No license information is provided for this model.