🚀 PassGPT
PassGPT is a causal language model trained on password leaks, intended to support password-related research in cybersecurity.
🚀 Quick Start
PassGPT was first introduced in this paper. This version of the model was trained on passwords from the RockYou leak, filtered to those at most 16 characters long. You can also access PassGPT trained on passwords of up to 10 characters, without restrictions, here.
✨ Features
- Curated Model: This is a curated version of the model reported in the paper: the vocabulary was reduced to the most meaningful characters and training was slightly optimized, yielding slightly better results.
- Inherited Architecture: The model inherits the GPT2LMHeadModel architecture and implements a custom BertTokenizer that encodes each character in a password as a single token, avoiding merges.
📦 Installation
No model-specific installation is required; the usage examples below only need the `transformers` and `torch` packages (`pip install torch transformers`).
💻 Usage Examples
Basic Usage
Passwords can be sampled from the model using the built-in generation methods provided by HuggingFace, with the start-of-password token `<s>` as the seed. The code below generates one password with PassGPT. Note that you may need to generate an [access token](https://huggingface.co/docs/hub/security-tokens) to authenticate your download.
```python
import torch
from transformers import GPT2LMHeadModel, RobertaTokenizerFast

# Load the character-level tokenizer (one token per password character)
tokenizer = RobertaTokenizerFast.from_pretrained("javirandor/passgpt-16characters",
                                                 use_auth_token="YOUR_ACCESS_TOKEN",
                                                 max_len=18,
                                                 padding="max_length",
                                                 truncation=True,
                                                 do_lower_case=False,
                                                 strip_accents=False,
                                                 mask_token="<mask>",
                                                 unk_token="<unk>",
                                                 pad_token="<pad>",
                                                 truncation_side="right")

model = GPT2LMHeadModel.from_pretrained("javirandor/passgpt-16characters",
                                        use_auth_token="YOUR_ACCESS_TOKEN").eval()

NUM_GENERATIONS = 1

# Sample passwords, seeding generation with the start-of-password token <s>
g = model.generate(torch.tensor([[tokenizer.bos_token_id]]),
                   do_sample=True,
                   num_return_sequences=NUM_GENERATIONS,
                   max_length=18,
                   pad_token_id=tokenizer.pad_token_id,
                   bad_words_ids=[[tokenizer.bos_token_id]])

# Remove the start-of-password token
g = g[:, 1:]

decoded = tokenizer.batch_decode(g.tolist())

# Keep only the content before the end-of-password token </s>
decoded_clean = [i.split("</s>")[0] for i in decoded]

print(decoded_clean)
```
Advanced Usage
You can find a more flexible script for sampling here.
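That script is not reproduced here, but a minimal sketch of batched sampling might look like the following. It reuses the `model` and `tokenizer` loaded in the basic usage example; the helper name `sample_passwords` and the default batch sizes are illustrative assumptions, not part of the official script.

```python
import torch

def sample_passwords(model, tokenizer, total=1000, batch_size=250):
    """Hypothetical helper: sample `total` passwords in batches."""
    passwords = []
    for _ in range(total // batch_size):
        # Seed every sequence in the batch with the start-of-password token <s>
        seeds = torch.full((batch_size, 1), tokenizer.bos_token_id, dtype=torch.long)
        out = model.generate(seeds,
                             do_sample=True,
                             max_length=18,
                             pad_token_id=tokenizer.pad_token_id,
                             bad_words_ids=[[tokenizer.bos_token_id]])
        # Drop the seed token and cut each string at the end-of-password token
        decoded = tokenizer.batch_decode(out[:, 1:].tolist())
        passwords += [p.split("</s>")[0] for p in decoded]
    return passwords

# Example: passwords = sample_passwords(model, tokenizer)
```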
📚 Documentation
Model description
The model inherits the GPT2LMHeadModel architecture and implements a custom BertTokenizer that encodes each character in a password as a single token, avoiding merges. It was trained from a random initialization, and the code for training can be found in the official repository.
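To illustrate the character-level encoding, the snippet below reuses the tokenizer loaded in the usage example above; the exact output shown, including the special tokens, is an assumption based on this description rather than verified model output.

```python
# Illustrative sketch: the tokenizer should emit one token per character,
# wrapped by the start/end-of-password special tokens.
encoded = tokenizer("password1")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Expected (assumed) output:
# ['<s>', 'p', 'a', 's', 's', 'w', 'o', 'r', 'd', '1', '</s>']
```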
Password Generation
Passwords can be sampled from the model using the built-in generation methods provided by HuggingFace, with the start-of-password token `<s>` as the seed (see the usage examples above).
Usage and License Notices
PassGPT is intended and licensed for research use only. The model and code are CC BY-NC 4.0 (allowing only non-commercial use) and should not be used outside of research purposes. This model should never be used to attack real systems. Access will be granted upon request; please make sure to indicate the details and scope of your project.
Cite our work
```bibtex
@article{rando2023passgpt,
  title={PassGPT: Password Modeling and (Guided) Generation with Large Language Models},
  author={Rando, Javier and Perez-Cruz, Fernando and Hitaj, Briland},
  journal={arXiv preprint arXiv:2306.01545},
  year={2023}
}
```
Additional Information
| Property | Details |
|----------|---------|
| Model Type | Causal language model |
| Training Data | Passwords from the RockYou leak, filtered to at most 16 characters long (a version trained on passwords of up to 10 characters is also available) |