finance-chat-GGUF
This repo contains GGUF format model files for finance-chat, offering high-performance text generation in the finance domain.
🚀 Quick Start
This repository provides GGUF format model files for [finance-chat](https://huggingface.co/AdaptLLM/finance-chat). To get started, choose a suitable way to download and run the model according to your needs.
✨ Features
- Multiple Client Support: compatible with a wide range of clients and libraries, such as llama.cpp, text-generation-webui, and Ollama.
- Quantization Options: offers various quantization methods to balance model size against performance.
- Easy Download: files can be downloaded automatically by some clients or manually with the huggingface-hub library.
📦 Installation
Downloading GGUF Files
⚠️ Important Note
You almost never want to clone the entire repo! Multiple quantisation formats are provided, and most users only need to pick and download a single file.
Some clients/libraries, such as LM Studio, LoLLMS Web UI, and Faraday.dev, will automatically download models for you.
In text-generation-webui
Under Download Model, enter the model repo andrijdavid/finance-chat-GGUF and, below it, a specific filename to download, such as finance-chat-f16.gguf, then click Download.
On the command line
First, install the huggingface-hub Python library:
pip3 install huggingface-hub
Then download an individual model file to the current directory:
huggingface-cli download andrijdavid/finance-chat-GGUF finance-chat-f16.gguf --local-dir . --local-dir-use-symlinks False
You can also download multiple files at once with a pattern:
huggingface-cli download andrijdavid/finance-chat-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
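If you prefer to stay in Python, the same single-file download can be done with the huggingface_hub API. A minimal sketch (the repo and filename match the CLI examples above):

```python
from huggingface_hub import hf_hub_download

# Download one GGUF file into the current directory and get its local path.
model_path = hf_hub_download(
    repo_id="andrijdavid/finance-chat-GGUF",
    filename="finance-chat-f16.gguf",
    local_dir=".",
)
print(model_path)
```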
To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer:
pip3 install hf_transfer
And set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1:
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download andrijdavid/finance-chat-GGUF finance-chat-f16.gguf --local-dir . --local-dir-use-symlinks False
💻 Usage Examples
Basic Usage with llama.cpp
Make sure you are using llama.cpp from commit d0cee0d or later.
./main -ngl 35 -m finance-chat-f16.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<PROMPT>"
- Change -ngl 35 to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
- Change -c 4096 to the desired sequence length.
If you want to have a chat-style conversation, replace the -p "<PROMPT>" argument with -i -ins.
Using in text-generation-webui
Further instructions can be found in the text-generation-webui documentation here: [text-generation-webui/docs/04 ‐ Model Tab.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/04%20%E2%80%90%20Model%20Tab.md#llamacpp).
Using from Python code
You can use GGUF models from Python with the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library.
Install the package
# Base llama-cpp-python with no GPU acceleration
pip install llama-cpp-python
# With NVIDIA CUDA acceleration
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# Or with OpenBLAS acceleration
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
# Or with CLBlast acceleration
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
# Or with AMD ROCm GPU acceleration (Linux only)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
# Or with Metal GPU acceleration (macOS only)
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
# On Windows, set CMAKE_ARGS in PowerShell before installing; e.g. for NVIDIA CUDA:
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python
Simple llama-cpp-python example code
from llama_cpp import Llama

# Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
    model_path="./finance-chat-f16.gguf",  # Download the model file first
    n_ctx=32768,  # The max sequence length to use; longer sequence lengths require much more resources
    n_threads=8,  # The number of CPU threads to use; tailor to your system and the resulting performance
    n_gpu_layers=35  # The number of layers to offload to GPU, if you have GPU acceleration available
)
# Simple inference example
output = llm(
    "<PROMPT>",  # Prompt
    max_tokens=512,  # Generate up to 512 tokens
    stop=["</s>"],  # Example stop token - not necessarily correct for this specific model! Please check before using.
    echo=True  # Whether to echo the prompt
)
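# The call returns an OpenAI-style completion dict; the generated text is under choices[0]["text"]:
print(output["choices"][0]["text"])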
# Chat Completion API
llm = Llama(model_path="./finance-chat-f16.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a story writing assistant."},
        {"role": "user", "content": "Write a story about llamas."}
    ]
)
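# The assistant's reply follows the same OpenAI-style schema:
print(response["choices"][0]["message"]["content"])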
Using with LangChain
Guides for using llama-cpp-python and ctransformers with LangChain are available in the LangChain documentation.
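As an illustration, here is a minimal sketch using LangChain's LlamaCpp wrapper from the langchain-community package (the parameters mirror the llama-cpp-python example above; the prompt is just a placeholder):

```python
from langchain_community.llms import LlamaCpp

# Wrap the local GGUF model as a LangChain LLM.
llm = LlamaCpp(
    model_path="./finance-chat-f16.gguf",  # The file downloaded earlier
    n_ctx=4096,        # Context window
    n_gpu_layers=35,   # Layers to offload to GPU; use 0 for CPU-only
    temperature=0.7,
)

# Invoke the model with a prompt string.
print(llm.invoke("What factors drive a company's price-to-earnings ratio?"))
```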
📚 Documentation
About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
Here is an incomplete list of clients and libraries that are known to support GGUF:
- llama.cpp: the source project for GGUF, providing both a command-line interface (CLI) and a server option.
- [text-generation-webui](https://github.com/oobabooga/text-generation-webui): the most widely used web UI, with numerous features, powerful extensions, and GPU acceleration support.
- Ollama: a lightweight and extensible framework for building and running language models locally, featuring a simple API for creating, managing, and executing models, along with a library of pre-built models for various applications.
- KoboldCpp: a comprehensive web UI offering GPU acceleration across all platforms and architectures, particularly renowned for storytelling.
- GPT4All: a free and open-source GUI that runs locally, supporting Windows, Linux, and macOS with full GPU acceleration.
- LM Studio: an intuitive and powerful local GUI for Windows and macOS (Apple Silicon), featuring GPU acceleration.
- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui): a notable web UI with a variety of unique features, including a comprehensive model library for easy model selection.
- Faraday.dev: an attractive, user-friendly character-based chat GUI for Windows and macOS (both Silicon and Intel), also offering GPU acceleration.
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python): a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- candle: a Rust-based ML framework focused on performance, including GPU support, and designed for ease of use.
- ctransformers: a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- localGPT: an open-source initiative enabling private conversations with documents.
Explanation of Quantisation Methods
The new methods available are:
- GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
- GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
- GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw (see the worked example after this list).
- GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
- GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
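As a sanity check on these figures, the Q4_K number can be reproduced by counting the bits in one 256-weight super-block, assuming (this detail is not stated above) that each super-block additionally stores one fp16 scale and one fp16 min:

```latex
% Q4_K super-block: 8 blocks x 32 weights = 256 weights
% 4-bit quants:                 256 x 4     = 1024 bits
% per-block 6-bit scales/mins:  (8 + 8) x 6 = 96 bits
% fp16 super-block scale and min (assumed): 2 x 16 = 32 bits
\text{bpw} = \frac{1024 + 96 + 32}{256} = \frac{1152}{256} = 4.5
```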
🔧 Technical Details
This domain-specific chat model is developed from LLaMA-2-Chat-7B, using the method described in the paper Adapting Large Language Models via Reading Comprehension. The team explores continued pre-training on domain-specific corpora for large language models. By transforming large-scale pre-training corpora into reading comprehension texts, the method improves prompting performance across tasks in the biomedicine, finance, and law domains. Their 7B model competes with much larger domain-specific models such as BloombergGPT-50B.
Updates
- 2024/1/16: 🎉 Their research paper has been accepted by ICLR 2024! 🎉
- 2023/12/19: Released their [13B base models](https://huggingface.co/AdaptLLM/law-LLM-13B) developed from LLaMA-1-13B.
- 2023/12/8: Released their [chat models](https://huggingface.co/AdaptLLM/law-chat) developed from LLaMA-2-Chat-7B.
- 2023/9/18: Released their paper, code, [data](https://huggingface.co/datasets/AdaptLLM/law-tasks), and [base models](https://huggingface.co/AdaptLLM/law-LLM) developed from LLaMA-1-7B.
Domain-Specific LLaMA-1
LLaMA-1-7B
In the paper, three domain-specific models are developed from LLaMA-1-7B, all of which are available on Hugging Face: [Biomedicine-LLM](https://huggingface.co/AdaptLLM/medicine-LLM), [Finance-LLM](https://huggingface.co/AdaptLLM/finance-LLM), and [Law-LLM](https://huggingface.co/AdaptLLM/law-LLM). Comparisons of AdaptLLM against other domain-specific LLMs are reported in the paper.
LLaMA-1-13B
Moreover, the base model is scaled up to LLaMA-1-13B to test whether the method is similarly effective for larger models.
📄 License
The model uses the llama2 license.

| Property | Details |
|---|---|
| Model Type | GGUF format model for finance-chat |
| Training Data | Open-Orca/OpenOrca, GAIR/lima, WizardLM/WizardLM_evol_instruct_V2_196k |
| Metrics | accuracy |
| Pipeline Tag | text-generation |
| Quantized By | andrijdavid |
| Tags | finance, GGUF |
| License | llama2 |

