flan-t5-small-keywords: Open-Source Keyword Extraction Model for Paragraphs

Flan T5 Small Keywords

Developed by agentlans

A keyword extraction model fine-tuned from Flan-T5 small, designed specifically to extract keywords from paragraphs.

Large Language Model

Transformers

English | Open Source License: MIT | #English Keyword Extraction #Text Summarization Optimization #SEO Assistant Tool

Downloads 1,101

Release Time: 9/10/2024

Model Overview

This model uses the T5 sequence-to-sequence architecture to identify and output key phrases that summarize the core content of a text. It is suited to summarizing long texts, generating article tags, and identifying a document's core themes.

Model Features

Paragraph Keyword Extraction

Extracts keywords or key phrases that summarize the core content of long paragraphs.

Multi-Purpose Application

Suitable for scenarios such as text summarization, tag generation, and SEO keyword identification.

Based on Flan-T5 Architecture

Builds on the T5 encoder-decoder architecture, which treats keyword extraction as a sequence-to-sequence (text-to-text) task.
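Assuming the standard setup for fine-tuning T5-style models (the card does not state the training objective), the encoder reads the paragraph $x$ and the decoder is trained to emit the target keyword string $y = (y_1, \dots, y_T)$ one token at a time, minimizing the token-level cross-entropy:

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y_t \mid y_{<t},\, x\right)
```

At inference time, `generate` decodes the keyword string autoregressively from the same conditional distribution $p_\theta(y_t \mid y_{<t}, x)$.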

Model Capabilities

Text Keyword Extraction

Long-Text Summarization

Key Phrase Generation

Use Cases

Content Management

Article Tag Generation

Automatically generate tags for blogs or articles.

Can improve content categorization and retrieval efficiency.

Metadata Generation

Generate metadata for content management systems.

Simplifies content management workflows.

Search Engine Optimization (SEO)

SEO Keyword Identification

Identify core keywords in documents for search engine optimization.

Can help improve webpage search rankings.
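For the tag-generation use case, raw keyword phrases usually need normalizing before they can serve as blog or CMS tags. The sketch below assumes tags should be lowercase ASCII slugs joined by hyphens; `keyword_to_tag`, `keywords_to_tags` and the `max_tags` cap are hypothetical helpers, not part of the model or of `transformers`.

```python
import re
import unicodedata


def keyword_to_tag(keyword: str) -> str:
    """Turn a keyword phrase into a lowercase, hyphen-separated ASCII tag slug."""
    # Decompose accented characters and drop the accents (e.g. "café" -> "cafe")
    ascii_text = unicodedata.normalize("NFKD", keyword).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


def keywords_to_tags(keywords: list[str], max_tags: int = 5) -> list[str]:
    """Convert keywords to unique, non-empty tag slugs, keeping at most max_tags."""
    tags: list[str] = []
    for keyword in keywords:
        if len(tags) >= max_tags:
            break
        tag = keyword_to_tag(keyword)
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

For example, `keywords_to_tags(['quaint bookstore', 'curated collections', 'community in the fast-paced world'])` returns `['quaint-bookstore', 'curated-collections', 'community-in-the-fast-paced-world']`. Deduplicating after slugging means phrases that differ only in case or punctuation collapse into one tag.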

🚀 Keyword Extraction Model

This is a keyword extraction model fine-tuned from Flan-T5. It extracts key phrases from paragraphs, helping users quickly summarize text, generate tags, and identify main themes.

🚀 Quick Start

The model is a fine-tuned version of the [Flan-T5 small](https://huggingface.co/google/flan-t5-small) model, designed specifically for extracting keywords from paragraphs. It uses the T5 architecture to identify and output key phrases that capture the essence of the input text.

✨ Features

Text Summarization: Summarize long texts by extracting key phrases.
Tag Generation: Generate tags for articles or blog posts.
Theme Identification: Identify main themes in documents.
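Theme identification across a collection of documents can build on per-document keywords: a keyword that recurs in many documents is a candidate theme. The sketch below is one simple way to do this, assuming you have already run the model on each document and parsed its output into a list of strings. `recurring_themes` and `min_docs` are hypothetical names, and exact case-insensitive matching is a deliberate simplification (it will not merge near-duplicates such as "cozy" and "cosy").

```python
from collections import Counter


def recurring_themes(keyword_lists: list[list[str]], min_docs: int = 2) -> list[tuple[str, int]]:
    """Return (keyword, document count) pairs for keywords seen in >= min_docs documents.

    Keywords are compared case-insensitively and counted at most once per
    document. Results are sorted by document count, most frequent first.
    """
    counts: Counter = Counter()
    for keywords in keyword_lists:
        # Deduplicate within one document while keeping first-seen order
        unique = dict.fromkeys(k.strip().lower() for k in keywords if k.strip())
        counts.update(list(unique))
    return [(k, n) for k, n in counts.most_common() if n >= min_docs]
```

Counting each keyword once per document, rather than once per occurrence, keeps a single keyword-heavy document from dominating the result.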

📦 Installation

Installation requires the Hugging Face transformers library and PyTorch (the usage example below returns PyTorch tensors). You can install both with:

pip install transformers torch

💻 Usage Examples

Basic Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "agentlans/flan-t5-small-keywords"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_text = "Your paragraph here..."
# Truncate the input to the 512-token limit
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=512)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

# The output separates keywords with '||': split, strip whitespace,
# drop empty entries, and remove duplicates while keeping the original order
keywords = list(dict.fromkeys(k.strip() for k in decoded_output.split('||') if k.strip()))
print(keywords)

Example Input and Output

Example input paragraph:

In the heart of the bustling city, a hidden gem awaits discovery: a quaint little bookstore that seems to have escaped the relentless march of time. As you step inside, the scent of aged paper and rich coffee envelops you, creating an inviting atmosphere that beckons you to explore its shelves. Each corner is adorned with carefully curated collections, from classic literature to contemporary bestsellers, inviting readers of all tastes to lose themselves in the pages of a good book. The soft glow of warm lighting casts a cozy ambiance, while the gentle hum of conversation among fellow book lovers adds to the charm. This bookstore is not just a place to buy books; it's a sanctuary for those seeking solace, inspiration, and a sense of community in the fast-paced world outside.

Example output keywords: ['old paper coffee scent', 'cosy hum of conversation', 'quaint bookstore', 'community in the fast-paced world', 'solace inspiration', 'curated collections']
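Comparing this output with the input shows that the model partly paraphrases: "old paper coffee scent" and "cosy hum of conversation" use words ("old", "cosy") that the paragraph spells as "aged" and "cozy". If you need keywords that stay close to the source wording, one option is a lexical-overlap filter like the sketch below. This heuristic is an assumption of this example, not something the model card prescribes; the helper names are hypothetical, and the default 0.5 threshold is arbitrary and will also penalize legitimate paraphrases.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words, e.g. 'fast-paced' -> {'fast', 'paced'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(keyword: str, source: str) -> float:
    """Fraction of the keyword's distinct words that also occur in the source text."""
    kw_words = _words(keyword)
    if not kw_words:
        return 0.0
    return len(kw_words & _words(source)) / len(kw_words)


def grounded_keywords(keywords: list[str], source: str, threshold: float = 0.5) -> list[str]:
    """Keep only keywords whose word overlap with the source reaches the threshold."""
    return [k for k in keywords if overlap_score(k, source) >= threshold]
```

Because matching is on exact word forms, spelling variants and synonyms count as misses; a stemmer or embedding similarity would be more forgiving at the cost of extra dependencies.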

📚 Documentation

Intended Uses & Limitations

Intended Uses:

Quick summarization of long paragraphs.
Generating metadata for content management systems.
Assisting in SEO keyword identification.

Limitations:

The model may sometimes generate irrelevant keywords.
Performance may vary depending on the length and complexity of the input text.
- For best results, use long, clean texts.
- Input length is limited to 512 tokens (the Flan-T5 maximum input length); longer texts should be truncated or split into chunks.
The model is trained on English text and may not perform well on other languages.
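Because of the 512-token input limit, a long document can be split into chunks, each chunk run through the model separately, and the resulting keyword lists merged. The sketch below assumes sentence boundaries are marked by ".", "!" or "?" followed by whitespace and, by default, approximates tokens with whitespace-separated words; `chunk_text` and `merge_keywords` are hypothetical helpers, not part of the model's API.

```python
import re
from typing import Callable


def chunk_text(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    count_tokens defaults to a whitespace word count, which only approximates
    T5 subword tokens; pass a real token counter for an accurate limit.
    A single sentence longer than max_tokens is kept whole as its own chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace; drop empty pieces
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def merge_keywords(keyword_lists: list[list[str]]) -> list[str]:
    """Merge per-chunk keyword lists, dropping duplicates but keeping first-seen order."""
    merged = dict.fromkeys(k.strip() for kws in keyword_lists for k in kws if k.strip())
    return list(merged)
```

Each chunk would then go through the tokenize-and-generate steps of the usage example, and the per-chunk keyword lists through `merge_keywords`. With that example's `tokenizer`, passing `count_tokens=lambda s: len(tokenizer(s).input_ids)` counts real T5 tokens, including the end-of-sequence token the tokenizer appends.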

Training and Evaluation

The model was fine-tuned on a dataset of English Wikipedia paragraphs and their corresponding keywords, covering a diverse range of topics intended to support broad applicability.

Limitations and Bias

This model has been trained on English Wikipedia paragraphs, which may introduce biases. Users should be aware that the keywords generated might reflect these biases and should use the output judiciously.

Ethical Considerations

When using this model, consider the potential impact of automated keyword extraction on content creation and SEO practices. Ensure that the use of this model complies with relevant guidelines and does not contribute to the creation of misleading or spammy content.

🔧 Technical Details

Training Details

Training Data: a dataset of English Wikipedia paragraphs and their keywords (agentlans/wikipedia-paragraph-keywords)
Training Procedure: fine-tuning of google/flan-t5-small

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10.0
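With `lr_scheduler_type: linear` and no warmup listed, the learning rate presumably starts at 5e-05 and decays linearly to zero over the total number of training steps (10 epochs times the number of steps per epoch). The function below reproduces that shape for illustration, following the formula used by `get_linear_schedule_with_warmup` in `transformers`; `linear_lr` is a hypothetical name, and `warmup_steps=0` is an assumption since the card lists no warmup. The total step count depends on the dataset size, which the card does not give.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under a linear schedule.

    Ramps linearly from 0 to base_lr over warmup_steps, then decays
    linearly to 0 at total_steps (and stays at 0 afterwards).
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, `linear_lr(50, 100)` is approximately 2.5e-05: halfway through training, the rate is half its initial value.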

Framework versions

Transformers 4.45.1
Pytorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.20.0

📄 License

This project is licensed under the MIT license.

Information Table

Property	Details
Model Type	Fine-tuned Flan-T5 small for keyword extraction
Training Data	Dataset of English Wikipedia paragraphs and their corresponding keywords
Base Model	google/flan-t5-small
Library Name	transformers
Tags	keyword-extraction, text-summarization, flan-t5
License	MIT
Datasets	agentlans/wikipedia-paragraph-keywords
