Hebrew-GPT Neo XL Poetry Open-Source Text Model - Generate High-Quality Hebrew Poetry for Free

Developed by Norod78

This is a text generation model fine-tuned specifically for generating Hebrew poetry, based on hebrew-gpt_neo-xl.

Text Generation | Other | Open Source | License: MIT
Tags: #Hebrew poetry generation, #Literary creation assistance, #Multi-line text continuation

Downloads 15

Release Time: 3/2/2022

Model Overview

The model is fine-tuned specifically for creating Hebrew poetry and was trained on a collection of Hebrew books, magazines, and poetry anthologies.

Model Features

Poetry-specific fine-tuning

Optimized and fine-tuned specifically for Hebrew poetry creation

Multi-source training data

Incorporates training data from various sources including Hebrew books, magazines, and poetry anthologies

Large context support

Supports a context window of up to 2048 tokens

Model Capabilities

Hebrew poetry generation

Creative text writing

Context-aware text completion

Use Cases

Literary creation

Poetry creation

Generate Hebrew poetry and verses

Can produce literary works of Hebrew poetry

Creative writing

Assist in Hebrew creative writing

Provides creative text inspiration and content expansion

Education

Language learning

Assist in learning Hebrew poetry and literature

Provides rich Hebrew language learning materials

🚀 Hebrew GPT Neo XL Poetry

A Hebrew poetry text generation model fine-tuned on [hebrew-gpt_neo-xl](https://huggingface.co/Norod78/hebrew-gpt_neo-xl), designed to generate beautiful Hebrew poetry.

✨ Features

Fine-tuned on a diverse set of Hebrew literary resources for high-quality poetry generation.
Easy to use, with provided sample code and a Google Colab notebook.

📦 Installation

The sample code below installs its own dependencies. To install them separately, run the following command (the leading ! is Jupyter/Colab notebook syntax; omit it in a regular shell):

!pip install tokenizers==0.10.3 transformers==4.8.0

💻 Usage Examples

Basic Usage

# Simple usage sample code
!pip install tokenizers==0.10.3 transformers==4.8.0

from transformers import AutoTokenizer, AutoModelForCausalLM
  
tokenizer = AutoTokenizer.from_pretrained("Norod78/hebrew-gpt_neo-xl-poetry")
model = AutoModelForCausalLM.from_pretrained("Norod78/hebrew-gpt_neo-xl-poetry", pad_token_id=tokenizer.eos_token_id)

prompt_text = "אני אוהב שוקולד ועוגות"
max_len = 512
sample_output_num = 3
seed = 1000

import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count() if torch.cuda.is_available() else 0

print(f"device: {device}, n_gpu: {n_gpu}")

np.random.seed(seed)
torch.manual_seed(seed)
if n_gpu > 0:
    torch.cuda.manual_seed_all(seed)

model.to(device)

encoded_prompt = tokenizer.encode(
    prompt_text, add_special_tokens=False, return_tensors="pt")

encoded_prompt = encoded_prompt.to(device)

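# An empty prompt encodes to an empty tensor; pass None instead so that
# generate() starts from the model's beginning-of-sequence token
if encoded_prompt.size()[-1] == 0:
    input_ids = None
else:
    input_ids = encoded_prompt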

print("input_ids = " + str(input_ids))

# Leave room for the prompt, but never exceed the 2048-token context window
if input_ids is not None:
    max_len = min(max_len + encoded_prompt.size(-1), 2048)

print("Updated max_len = " + str(max_len))

stop_token = "<|endoftext|>"
new_lines = "\n\n\n"

sample_outputs = model.generate(
    input_ids,
    do_sample=True, 
    max_length=max_len, 
    top_k=50, 
    top_p=0.95, 
    num_return_sequences=sample_output_num
)

print(100 * '-' + "\n\t\tOutput\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    text = tokenizer.decode(sample_output, skip_special_tokens=True)

    # Keep only the text before the stop token, if it appears.
    # (str.find returns -1 when the token is absent, which would chop
    # the last character off when used as a slice bound.)
    text = text.split(stop_token)[0]

    # Keep only the text before the first run of 3 newlines, if any
    text = text.split(new_lines)[0]

    print("\n{}: {}".format(i, text))
    print("\n" + 100 * '-')
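
The post-processing at the end of the sample (cut at the stop token, then at the first run of three newlines) can be factored into a small pure-Python helper that is easy to test without loading the model. This is a minimal sketch: the function name `truncate_at_markers` is hypothetical and not part of the model's repository. Because `str.find` returns -1 when a marker is absent, the helper checks for that case explicitly so no text is lost.

```python
def truncate_at_markers(text, markers=("<|endoftext|>", "\n\n\n")):
    """Cut `text` at the first occurrence of each marker, applied in order.

    Markers that are empty or do not occur leave the text unchanged.
    (Hypothetical helper, for illustration only.)
    """
    for marker in markers:
        if not marker:
            continue
        cut = text.find(marker)
        if cut != -1:  # -1 means "not found"; slicing with it would drop a character
            text = text[:cut]
    return text


if __name__ == "__main__":
    sample = "שורה ראשונה\nשורה שנייה\n\n\nטקסט נוסף<|endoftext|>שארית"
    print(truncate_at_markers(sample))
```

In the sample above, the body of the output loop could then call `truncate_at_markers(text)` in place of the two slicing statements.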

Advanced Usage

You can adjust the parameters in the model.generate function according to your specific needs, such as max_length, top_k, top_p, etc., to get different generation results.
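
To make the effect of `top_k` and `top_p` more concrete, here is a minimal NumPy sketch of the filtering they describe: keep the `top_k` most probable tokens, then keep the smallest set of those whose cumulative probability reaches `top_p`, and renormalize. It illustrates the general technique on a toy distribution and is not the `transformers` implementation; the function name `filter_top_k_top_p` is hypothetical.

```python
import numpy as np


def filter_top_k_top_p(probs, top_k=50, top_p=0.95):
    """Return a renormalized copy of `probs` restricted by top-k, then top-p.

    Illustrative sketch only; names and structure are assumptions.
    """
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]  # token indices, most probable first

    # Top-k: keep at most the k most probable tokens.
    if top_k > 0:
        order = order[:top_k]

    # Top-p (nucleus): keep the shortest prefix whose cumulative mass >= top_p,
    # measured on the distribution renormalized after the top-k step.
    kept = probs[order] / probs[order].sum()
    cumulative = np.cumsum(kept)
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    order = order[:cutoff]

    # Zero out everything else and renormalize so the result sums to 1.
    filtered = np.zeros_like(probs)
    filtered[order] = probs[order]
    return filtered / filtered.sum()


if __name__ == "__main__":
    toy = [0.5, 0.3, 0.1, 0.06, 0.04]
    print(filter_top_k_top_p(toy, top_k=4, top_p=0.9))
```

Lower `top_k` or `top_p` values concentrate sampling on the most likely tokens, while higher values admit more unusual continuations.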

📚 Documentation

Datasets

The model was trained on an assortment of Hebrew books, magazines, and poetry corpora.

Training Config

The training configuration is similar to this one.

Google Colab Notebook

You can use the model conveniently through the Google Colab Notebook available here.

📄 License

This project is licensed under the MIT license.
