🚀 Storytime 13B - GPTQ
This repository contains GPTQ model files for Charles Goddard's Storytime 13B, offering multiple quantisation parameter options to suit different hardware and requirements.

TheBloke's LLM work is generously supported by a grant from Andreessen Horowitz (a16z).
✨ Features
- Multiple Quantisation Options: Different GPTQ parameter permutations are provided to meet various hardware and performance needs.
- Diverse Repositories: Available in AWQ, GPTQ, GGUF formats, as well as the original unquantised fp16 model.
- Alpaca Prompt Template: Uses the Alpaca prompt template for easy interaction.
📦 Installation
In text-generation-webui
To download from the main branch, enter TheBloke/storytime-13B-GPTQ in the "Download model" box.
To download from another branch, add :branchname to the end of the download name, e.g., TheBloke/storytime-13B-GPTQ:gptq-4-32g-actorder_True
From the command line
I recommend using the huggingface-hub Python library:
pip3 install huggingface-hub
To download the main branch to a folder called storytime-13B-GPTQ:
mkdir storytime-13B-GPTQ
huggingface-cli download TheBloke/storytime-13B-GPTQ --local-dir storytime-13B-GPTQ --local-dir-use-symlinks False
To download from a different branch, add the --revision parameter:
mkdir storytime-13B-GPTQ
huggingface-cli download TheBloke/storytime-13B-GPTQ --revision gptq-4-32g-actorder_True --local-dir storytime-13B-GPTQ --local-dir-use-symlinks False
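If you prefer to script the download rather than use the CLI, the same huggingface-hub library exposes snapshot_download, which takes the same revision and local-directory options. A minimal sketch (the folder name is just an illustration):
from huggingface_hub import snapshot_download

# Download the gptq-4-32g-actorder_True branch into a local folder
snapshot_download(
    repo_id="TheBloke/storytime-13B-GPTQ",
    revision="gptq-4-32g-actorder_True",
    local_dir="storytime-13B-GPTQ",
    local_dir_use_symlinks=False,
)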
With git (not recommended)
To clone a specific branch with git, use a command like this:
git clone --single-branch --branch gptq-4-32g-actorder_True https://huggingface.co/TheBloke/storytime-13B-GPTQ
💻 Usage Examples
How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
- Click the Model tab.
- Under Download custom model or LoRA, enter TheBloke/storytime-13B-GPTQ.
- To download from a specific branch, enter for example TheBloke/storytime-13B-GPTQ:gptq-4-32g-actorder_True - see Provided Files below for the list of branches for each option.
- Click Download.
- The model will start downloading. Once it's finished it will say "Done".
- In the top left, click the refresh icon next to Model.
- In the Model dropdown, choose the model you just downloaded: storytime-13B-GPTQ
- The model will automatically load, and is now ready for use!
- If you want any custom settings, set them and then click Save settings for this model followed by Reload the Model in the top right.
- Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file quantize_config.json.
- Once you're ready, click the Text Generation tab and enter a prompt to get started!
How to use this GPTQ model from Python code
Install the necessary packages
Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
pip3 install transformers optimum
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
If you have problems installing AutoGPTQ using the pre-built wheels, install it from source instead:
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.4.2
pip3 install .
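Because the wheel you get depends on your CUDA version, it can be worth confirming which AutoGPTQ version pip actually resolved. A quick check:
pip3 show auto-gptq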
You can then use the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/storytime-13B-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Create a text generation pipeline
generate_text = pipeline('text-generation',
                         model=model,
                         tokenizer=tokenizer,
                         device_map="auto")

# Generate text
prompt = "Once upon a time"
output = generate_text(prompt, max_length=200, num_return_sequences=1)
print(output[0]['generated_text'])
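The example above sends a bare prompt. Since this model expects the Alpaca prompt template (see the Prompt template section under Documentation), wrapping your request in that template generally gives better results. A minimal sketch reusing the generate_text pipeline from above (the instruction text is just an example):
# Wrap the request in the model's Alpaca prompt template
instruction = "Write the opening paragraph of a story about a lighthouse keeper."
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
'''
output = generate_text(prompt_template, max_new_tokens=200, num_return_sequences=1)
print(output[0]['generated_text'])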
📚 Documentation
Model Information
- Model creator: Charles Goddard
- Original model: Storytime 13B
Repositories available
- AWQ model(s) for GPU inference.
- GPTQ models for GPU inference, with multiple quantisation parameter options.
- 2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference
- Charles Goddard's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions
Prompt template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
Provided files, and GPTQ parameters
Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements.
Each separate quant is in a different branch. See below for instructions on fetching from different branches.
All recent GPTQ files are made with AutoGPTQ, and all files in non-main branches are made with AutoGPTQ. Files in the main branch which were uploaded before August 2023 were made with GPTQ-for-LLaMa.
Explanation of GPTQ parameters
- Bits: The bit size of the quantised model.
- GS: GPTQ group size. Higher numbers use less VRAM, but have lower quantisation accuracy. "None" is the lowest possible value.
- Act Order: True or False. Also known as desc_act. True results in better quantisation accuracy. Some GPTQ clients have had issues with models that use Act Order plus Group Size, but this is generally resolved now.
- Damp %: A GPTQ parameter that affects how samples are processed for quantisation. 0.01 is default, but 0.1 results in slightly better accuracy.
- GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
- Sequence Length: The length of the dataset sequences used for quantisation. Ideally this is the same as the model sequence length. For some very long sequence models (16+K), a lower sequence length may have to be used. Note that a lower sequence length does not limit the sequence length of the quantised model. It only impacts the quantisation accuracy on longer inference sequences.
- ExLlama Compatibility: Whether this file can be loaded with ExLlama, which currently only supports Llama models in 4-bit.
Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
---|---|---|---|---|---|---|---|---|---|
main | 4 | 128 | Yes | 0.1 | wikitext | 4096 | 7.26 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
gptq-4-32g-actorder_True | 4 | 32 | Yes | 0.1 | wikitext | 4096 | 8.00 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
gptq-8--1g-actorder_True | 8 | None | Yes | 0.1 | wikitext | 4096 | 13.36 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements. |
gptq-8-128g-actorder_True | 8 | 128 | Yes | 0.1 | wikitext | 4096 | 13.65 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
gptq-8-32g-actorder_True | 8 | 32 | Yes | 0.1 | wikitext | 4096 | 14.54 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
gptq-4-64g-actorder_True | 4 | 64 | Yes | 0.1 | wikitext | 4096 | 7.51 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. |
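If you want to confirm which of these parameters a downloaded branch actually uses, the quantize_config.json shipped with each branch records them. A quick check, assuming the local folder name used in the download examples above:
import json

# Inspect the GPTQ parameters recorded in the downloaded branch
with open("storytime-13B-GPTQ/quantize_config.json") as f:
    config = json.load(f)

# Typical keys include bits, group_size, desc_act and damp_percent
print(config)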
📄 License
The model is under the Llama 2 license.

