Hebrew GPT Neo XL Poetry
A Hebrew poetry text generation model fine-tuned on [hebrew-gpt_neo-xl](https://huggingface.co/Norod78/hebrew-gpt_neo-xl), designed to generate beautiful Hebrew poetry.
Features
- Fine-tuned on a diverse set of Hebrew literary resources for high-quality poetry generation.
- Easy to use, with sample code and a Google Colab notebook provided.
Installation
The installation steps are included in the sample code. You can install the necessary libraries using the following command:
!pip install tokenizers==0.10.3 transformers==4.8.0
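Because the command pins exact versions, it can help to confirm that the running environment actually has them. The following is a minimal sketch using only the standard library; `check_pins` and `PINNED` are hypothetical names introduced for illustration, and the version numbers are the ones from the command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install command in this README.
PINNED = {"tokenizers": "0.10.3", "transformers": "4.8.0"}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means both packages match the pinned versions.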
Usage Examples
Basic Usage
!pip install tokenizers==0.10.3 transformers==4.8.0
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Norod78/hebrew-gpt_neo-xl-poetry")
model = AutoModelForCausalLM.from_pretrained("Norod78/hebrew-gpt_neo-xl-poetry", pad_token_id=tokenizer.eos_token_id)
prompt_text = "אני אוהב שוקולד ועוגות"  # "I love chocolate and cakes"
max_len = 512
sample_output_num = 3
seed = 1000
import numpy as np
import torch
# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count() if torch.cuda.is_available() else 0
print(f"device: {device}, n_gpu: {n_gpu}")
# Seed every random number generator so that sampling is repeatable.
np.random.seed(seed)
torch.manual_seed(seed)
if n_gpu > 0:
    torch.cuda.manual_seed_all(seed)
model.to(device)
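# Tokenize the prompt and move it to the same device as the model.
encoded_prompt = tokenizer.encode(
    prompt_text, add_special_tokens=False, return_tensors="pt")
encoded_prompt = encoded_prompt.to(device)
# An empty prompt means unconditional generation, so pass no input IDs.
if encoded_prompt.size()[-1] == 0:
    input_ids = None
else:
    input_ids = encoded_prompt
print("input_ids = " + str(input_ids))
# max_length counts the prompt tokens too, so extend it by the prompt length,
# capped at 2048 tokens.
if input_ids is not None:
    max_len += len(encoded_prompt[0])
    if max_len > 2048:
        max_len = 2048
print("Updated max_len = " + str(max_len))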
stop_token = "<|endoftext|>"
new_lines = "\n\n\n"
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=max_len,
    top_k=50,
    top_p=0.95,
    num_return_sequences=sample_output_num
)
print(100 * '-' + "\n\t\tOutput\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    text = tokenizer.decode(sample_output, skip_special_tokens=True)
    # Cut at the stop token and at the first run of blank lines, but only when
    # the marker is present: str.find returns -1 otherwise, and slicing with -1
    # would drop the last character.
    for marker in (stop_token, new_lines):
        if marker and marker in text:
            text = text[: text.find(marker)]
    print("\n{}: {}".format(i, text))
    print("\n" + 100 * '-')
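The truncation step at the end of the sample can also be factored into a small helper. The sketch below is an illustration rather than part of the original sample, and `truncate_at` is a hypothetical name. It returns the text unchanged when the marker is missing, since `str.find` returns -1 in that case and slicing with -1 would remove the final character.

```python
def truncate_at(text, marker):
    """Return `text` up to the first occurrence of `marker`.

    If `marker` is empty or does not occur, `text` is returned unchanged.
    """
    if not marker:
        return text
    index = text.find(marker)
    return text if index == -1 else text[:index]


# Apply the same two markers as the sample: the stop token, then three newlines.
generated = "first stanza\nsecond line\n\n\nunrelated continuation"
for marker in ("<|endoftext|>", "\n\n\n"):
    generated = truncate_at(generated, marker)
print(generated)
```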
Advanced Usage
You can adjust the parameters passed to `model.generate`, such as `max_length`, `top_k`, and `top_p`, to get different generation results.
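One way to keep such adjustments in one place is to collect the sampling settings in a dictionary and unpack it into `model.generate`. The sketch below assumes the `model` and `input_ids` from the basic example; `build_generation_kwargs` is a hypothetical helper, its defaults mirror the basic example, and `temperature` is an additional `generate` argument that the basic example does not set.

```python
def build_generation_kwargs(prompt_len, new_tokens=512, top_k=50, top_p=0.95,
                            temperature=1.0, num_return_sequences=3,
                            length_cap=2048):
    """Collect sampling settings for model.generate (hypothetical helper)."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {
        "do_sample": True,
        # max_length includes the prompt, so add its length and apply the cap.
        "max_length": min(prompt_len + new_tokens, length_cap),
        "top_k": top_k,
        "top_p": top_p,
        "temperature": temperature,
        "num_return_sequences": num_return_sequences,
    }


# A more conservative setting: fewer candidate tokens and a lower temperature.
conservative = build_generation_kwargs(
    prompt_len=6, new_tokens=128, top_k=20, top_p=0.9, temperature=0.8)
print(conservative)

# With the model and input_ids from the basic example:
# sample_outputs = model.generate(input_ids, **conservative)
```

Lower `top_k`, `top_p`, and `temperature` values generally make the output less varied, while higher values allow more surprising word choices.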
Documentation
Datasets
The model was trained on an assortment of Hebrew books, magazines, and poetry corpora.
Training Config
The training configuration is similar to this one.
Google Colab Notebook
You can use the model conveniently through the Google Colab Notebook available here.
License
This project is licensed under the MIT license.