# flan-t5-large-grammar-synthesis - GGUF
This repository provides GGUF files for flan-t5-large-grammar-synthesis, which can be used with Ollama, llama.cpp, or any other framework that supports T5 models in GGUF format.

This repo mainly contains higher-precision (larger) quantizations. Because the model is designed for grammar and spelling correction, low-precision versions may produce incorrect fixes and are therefore less useful. For more details, refer to the original repository.
## 🚀 Quick Start
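As a minimal quick start, one way to fetch a GGUF file and run it is sketched below. The repository id is a placeholder (substitute this repo's actual id), and the filename assumes the Q6_K quantization used in the examples further down.

```sh
# Download one GGUF file from the Hub (repo id is a placeholder; use this repo's actual id)
huggingface-cli download <user>/flan-t5-large-grammar-synthesis-GGUF grammar-synthesis-Q6_K.gguf --local-dir .

# Run a one-off correction; temperature 0 keeps the output deterministic
llamafile -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```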
## ✨ Features
- Offers GGUF files for the flan-t5-large-grammar-synthesis model.
- Supports multiple frameworks, including Ollama, llama.cpp, and anything else compatible with T5 models in GGUF format (see the Ollama Modelfile sketch after this list).
- Focuses on higher-precision quantizations for reliable grammar and spelling correction.
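For Ollama specifically, a local GGUF can be wrapped in a Modelfile. This is a minimal sketch, assuming your Ollama build supports the T5 architecture and that `grammar-synthesis-Q6_K.gguf` sits in the current directory; the model name `grammar-synthesis` is arbitrary.

```
# Modelfile — point Ollama at the local GGUF file
FROM ./grammar-synthesis-Q6_K.gguf

# Deterministic decoding suits correction tasks
PARAMETER temperature 0
```

Create and run it with:

```sh
ollama create grammar-synthesis -f Modelfile
ollama run grammar-synthesis "There car broke down so their hitching a ride to they're class."
```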
## 📦 Installation
No specific installation steps are provided in the original document.
## 💻 Usage Examples
### Basic Usage

You can use the GGUFs with llamafile (or llama-cli) as follows:

```sh
llamafile.exe -m grammar-synthesis-Q6_K.gguf --temp 0 -p "There car broke down so their hitching a ride to they're class."
```
The output will be the corrected text:

```text
system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
	repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.000
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 8192, n_batch = 2048, n_predict = -1, n_keep = 0

The car broke down so they had to take a ride to school. [end of text]

llama_print_timings:        load time =     782.21 ms
llama_print_timings:      sample time =       0.23 ms /    16 runs   (    0.01 ms per token, 68376.07 tokens per second)
llama_print_timings: prompt eval time =      85.08 ms /    19 tokens (    4.48 ms per token,   223.33 tokens per second)
llama_print_timings:        eval time =     341.74 ms /    15 runs   (   22.78 ms per token,    43.89 tokens per second)
llama_print_timings:       total time =     456.56 ms /    34 tokens
Log end
```
### Advanced Usage

If you have a GPU, add `-ngl 9999` to your command to offload as many layers as the GPU can handle for faster inference.
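For example, the Basic Usage command with GPU offload enabled looks like this (`-ngl` is the standard llama.cpp/llamafile flag for the number of GPU layers; 9999 simply means "as many as fit"):

```sh
# Same invocation as above, with all layers that fit offloaded to the GPU
llamafile.exe -m grammar-synthesis-Q6_K.gguf -ngl 9999 --temp 0 -p "There car broke down so their hitching a ride to they're class."
```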
## 📚 Documentation
Refer to the original repo for more details.
## 📄 License

This project is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | flan-t5-large-grammar-synthesis |
| Pipeline Tag | text2text-generation |
| Tags | grammar, spelling |
| Base Model | pszemraj/flan-t5-large-grammar-synthesis |
| License | Apache-2.0 |