# GPT2-svenska-wikipedia
This is a Swedish GPT2-style model. It was trained using the Flax CLM pipeline on the Swedish part of the wiki40b dataset.
## Quick Start
This Swedish GPT2-style model can be used for Swedish text generation and as a base for fine-tuning on downstream Swedish NLP tasks. It was trained on the Swedish (`sv`) split of the wiki40b dataset.
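A minimal generation sketch using the Flax classes in `transformers`. The model id below is a placeholder for the actual Hub repository name, and the prompt and sampling settings are purely illustrative:

```python
from transformers import AutoTokenizer, FlaxGPT2LMHeadModel

# Placeholder id: replace with the actual Hub repository for this model.
model_id = "<org>/gpt2-svenska-wikipedia"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxGPT2LMHeadModel.from_pretrained(model_id)

# Generate a short continuation of a Swedish prompt.
inputs = tokenizer("Stockholm är huvudstaden i", return_tensors="np")
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```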
## Features
- Model series: this model is part of a series of models trained on TPU with Flax/JAX during the Hugging Face Flax/JAX challenge.
- Related variants: the same series includes GPT-2 and RoBERTa models for other Nordic languages.
## Installation
No model-specific installation steps were provided; the model is used through the standard Hugging Face libraries.
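As a hedged sketch of the dependencies this README implicitly relies on (`transformers` and `flax` for the model, `datasets` plus Apache Beam for loading wiki40b with `beam_runner='DirectRunner'`); exact versions were not specified:

```bash
pip install transformers datasets flax jax apache-beam
```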
## Usage Examples
### Basic Usage
The following Python snippet shows how to load and clean the dataset used for training the model. The wiki40b text contains structural markers such as `_START_ARTICLE_`, `_START_SECTION_`, `_START_PARAGRAPH_`, and `_NEWLINE_`, which are stripped out here.

```python
from datasets import load_dataset


def load_and_clean_wiki():
    # Load the Swedish split of wiki40b; requires Apache Beam for the DirectRunner.
    dataset = load_dataset('wiki40b', 'sv', beam_runner='DirectRunner', split="train")
    # Keep only the raw text column.
    dataset = dataset.remove_columns(['wikidata_id', 'version_id'])
    filtered_dataset = dataset.map(filter_wikipedia)
    return filtered_dataset


def filter_wikipedia(batch):
    # Remove the wiki40b structural markers and non-breaking spaces.
    batch["text"] = " ".join(batch["text"].split("\n_START_SECTION_\n"))
    batch["text"] = " ".join(batch["text"].split("\n_START_ARTICLE_\n"))
    batch["text"] = " ".join(batch["text"].split("\n_START_PARAGRAPH_\n"))
    batch["text"] = " ".join(batch["text"].split("_NEWLINE_"))
    batch["text"] = " ".join(batch["text"].split("\xa0"))
    return batch
```
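Assuming the functions above, the cleaned dataset can then be built and inspected as follows (a usage sketch; loading wiki40b through Apache Beam's DirectRunner can take a while):

```python
# Build the cleaned training split and look at the start of the first example.
wiki = load_and_clean_wiki()
print(wiki[0]["text"][:200])
```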
### Advanced Usage
The following bash command was used to train the model:

```bash
./run_clm_flax.py \
    --output_dir="${MODEL_DIR}" \
    --model_type="gpt2" \
    --config_name="${MODEL_DIR}" \
    --tokenizer_name="${MODEL_DIR}" \
    --dataset_name="wiki40b" \
    --dataset_config_name="sv" \
    --do_train --do_eval \
    --block_size="512" \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-3" \
    --warmup_steps="1000" \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --weight_decay="0.01" \
    --overwrite_output_dir \
    --num_train_epochs="20" \
    --logging_steps="500" \
    --save_steps="1000" \
    --eval_steps="2500" \
    --push_to_hub
```
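The script reads both the model configuration and the tokenizer from `${MODEL_DIR}` (via `--config_name` and `--tokenizer_name`), so those files must exist before training starts. Below is a sketch of how they could be prepared, assuming a byte-level BPE tokenizer and a standard GPT-2 configuration; the directory name, training file, and vocabulary size are illustrative assumptions, not values taken from the original run:

```python
import os

from tokenizers import ByteLevelBPETokenizer
from transformers import GPT2Config

model_dir = "gpt2-svenska-wikipedia"  # assumed value of ${MODEL_DIR}
os.makedirs(model_dir, exist_ok=True)

# Train a byte-level BPE tokenizer on the cleaned text dumped to a plain-text file.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["swedish_wiki.txt"],        # assumed dump of the cleaned dataset
    vocab_size=50257,                  # assumed GPT-2-sized vocabulary
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model(model_dir)

# Write a standard GPT-2 configuration next to the tokenizer files.
config = GPT2Config.from_pretrained("gpt2")
config.save_pretrained(model_dir)
```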
## Documentation
### Model Series
This model is part of a series of models trained on TPU with Flax/JAX during the Hugging Face Flax/JAX challenge.
### Related Models
Related GPT-2 and RoBERTa models for other Nordic languages were trained as part of the same series.
### Data cleaning and preprocessing
The data was cleaned and preprocessed using the script shown under Basic Usage. Make sure to install the Apache Beam dependency required by `beam_runner` so that the wiki40b dataset loads correctly.
### Training script
The bash command shown under Advanced Usage was used to train the model.
## Technical Details
The model was trained with the Flax causal language modeling (CLM) pipeline on the Swedish part of the wiki40b dataset, after the cleaning and preprocessing steps described above. Key hyperparameters from the training command: block size 512, per-device batch size 64 for both training and evaluation, learning rate 5e-3 with 1000 warmup steps, Adam with beta1 0.9 and beta2 0.98, weight decay 0.01, and 20 training epochs.
## License
No license information is provided for this model.