Fin-R1 GGUF Models
These models are designed for financial inference, offering various formats to suit different hardware and memory requirements.
✨ Features
Choosing the Right Model Format
Selecting the appropriate model format depends on your hardware capabilities and memory constraints.
BF16 (Brain Float 16) – Use if BF16 acceleration is available
- A 16-bit floating-point format designed for faster computation while retaining good precision.
- Provides similar dynamic range as FP32 but with lower memory usage.
- Recommended if your hardware supports BF16 acceleration (check your device's specs).
- Ideal for high-performance inference with reduced memory footprint compared to FP32.
📌 Use BF16 if:
✔ Your hardware has native BF16 support (e.g., newer GPUs, TPUs).
✔ You want higher precision while saving memory.
✔ You plan to requantize the model into another format.

📌 Avoid BF16 if:
❌ Your hardware does not support BF16 (it may fall back to FP32 and run slower).
❌ You need compatibility with older devices that lack BF16 optimization.
F16 (Float 16) – More widely supported than BF16
- A 16-bit floating-point format with high precision but a smaller range of values than BF16.
- Works on most devices with FP16 acceleration support (including many GPUs and some CPUs).
- Slightly narrower dynamic range than BF16, but its precision is generally sufficient for inference.
📌 Use F16 if:
✔ Your hardware supports FP16 but not BF16.
✔ You need a balance between speed, memory usage, and accuracy.
✔ You are running on a GPU or another device optimized for FP16 computations.

📌 Avoid F16 if:
❌ Your device lacks native FP16 support (it may run slower than expected).
❌ You have memory limitations.
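If you are unsure whether your GPU accelerates BF16 or FP16, a quick runtime query can settle it. Below is a minimal sketch assuming a CUDA-enabled PyTorch install; other backends expose similar checks.

```python
# Minimal capability check for BF16/FP16 support (assumes PyTorch with CUDA).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Device: {torch.cuda.get_device_name(0)} (compute {major}.{minor})")
    print("Native BF16 support:", torch.cuda.is_bf16_supported())
    # FP16 arithmetic is available on virtually all recent CUDA GPUs;
    # tensor-core acceleration requires compute capability >= 7.0.
    print("FP16 tensor cores:", (major, minor) >= (7, 0))
else:
    print("No CUDA device; consider a quantized GGUF for CPU inference.")
```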
Quantized Models (Q4_K, Q6_K, Q8_0, etc.) – For CPU & Low-VRAM Inference
Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
- Lower-bit models (Q4_K) → Best for minimal memory usage; may have lower precision.
- Higher-bit models (Q6_K, Q8_0) → Better accuracy; require more memory.
📌 Use Quantized Models if:
✔ You are running inference on a CPU and need an optimized model.
✔ Your device has low VRAM and cannot load full-precision models.
✔ You want to reduce memory footprint while keeping reasonable accuracy.

📌 Avoid Quantized Models if:
❌ You need maximum accuracy (full-precision models are better for this).
❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
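As an illustration, here is a hedged sketch of CPU inference with one of the quantized files listed under Included Files & Details, using the llama-cpp-python bindings (`pip install llama-cpp-python`). The filename matches the Q4_K file in this repository; the context size and thread count are assumptions to tune for your machine.

```python
# Sketch: CPU inference with a quantized GGUF via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Fin-R1-q4_k.gguf",  # Q4_K: a good default for CPU / low VRAM
    n_ctx=4096,                     # context window (adjust to your memory)
    n_threads=8,                    # roughly match your physical core count
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "What is the present value of $1,000 received in 5 years "
                   "at a 4% annual discount rate?",
    }],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```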
Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)
These models are optimized for extreme memory efficiency, making them ideal for low-power devices or large-scale deployments where memory is a critical constraint.
- IQ3_XS: Ultra-low-bit quantization (3-bit) with extreme memory efficiency.
  - Use case: Best for ultra-low-memory devices where even Q4_K is too large.
  - Trade-off: Lower accuracy compared to higher-bit quantizations.
- IQ3_S: Small block size for maximum memory efficiency.
  - Use case: Best for low-memory devices where IQ3_XS is too aggressive.
- IQ3_M: Medium block size for better accuracy than IQ3_S.
  - Use case: Suitable for low-memory devices where IQ3_S is too limiting.
- Q4_K: 4-bit quantization with block-wise optimization for better accuracy.
  - Use case: Best for low-memory devices where Q6_K is too large.
- Q4_0: Pure 4-bit quantization, optimized for ARM devices.
  - Use case: Best for ARM-based devices or low-memory environments.
Summary Table: Model Format Selection
| Property | Details |
|---|---|
| Model Type | BF16, F16, Q4_K, Q6_K, Q8_0, IQ3_XS, Q4_0 |
| Precision | Varies from very low (IQ3_XS) to high (BF16, F16, Q8_0) |
| Memory Usage | Ranges from very low (IQ3_XS) to high (BF16, F16) |
| Device Requirements | BF16-supported GPUs/CPUs; FP16-supported devices; CPUs or low-VRAM devices; etc. |
| Best Use Case | High-speed inference with reduced memory; GPU inference when BF16 isn't available; memory-constrained environments; etc. |
📦 Included Files & Details
Fin-R1-bf16.gguf
- Model weights preserved in BF16.
- Use this if you want to requantize the model into a different format.
- Best if your device supports BF16 acceleration.
Fin-R1-f16.gguf
- Model weights stored in F16.
- Use if your device supports FP16, especially if BF16 is not available.
Fin-R1-bf16-q8_0.gguf
- Output & embeddings remain in BF16.
- All other layers quantized to Q8_0.
- Use if your device supports BF16 and you want a quantized version.
Fin-R1-f16-q8_0.gguf
- Output & embeddings remain in F16.
- All other layers quantized to Q8_0.
Fin-R1-q4_k.gguf
- Output & embeddings quantized to Q8_0.
- All other layers quantized to Q4_K.
- Good for CPU inference with limited memory.
Fin-R1-q4_k_s.gguf
- Smallest Q4_K variant, using less memory at the cost of accuracy.
- Best for very low-memory setups.
Fin-R1-q6_k.gguf
- Output & embeddings quantized to Q8_0.
- All other layers quantized to Q6_K.
Fin-R1-q8_0.gguf
- Fully Q8 quantized model for better accuracy.
- Requires more memory but offers higher precision.
Fin-R1-iq3_xs.gguf
- IQ3_XS quantization, optimized for extreme memory efficiency.
- Best for ultra-low-memory devices.
Fin-R1-iq3_m.gguf
- IQ3_M quantization, offering a medium block size for better accuracy.
- Suitable for low-memory devices.
Fin-R1-q4_0.gguf
- Pure Q4_0 quantization, optimized for ARM devices.
- Best for low-memory environments.
- Prefer IQ4_NL for better accuracy.
Quick Start
If you find these models useful, please click like ❤. Also, I'd really appreciate it if you could test my Network Monitor Assistant at 👉 Network Monitor Assistant.
💬 Click the chat icon (bottom right of the main and dashboard pages). Choose an LLM; toggle between the LLM types TurboLLM -> HugLLM -> TestLLM.
What I'm Testing
I'm experimenting with function calling against my network monitoring service, using small open-source models to explore the question: how small can a model be and still function?
🟡 TestLLM – Runs the current testing model using llama.cpp on 6 threads of a CPU VM (should take about 15s to load; inference is quite slow, and it only processes one user prompt at a time; still working on scaling!). If you're curious, I'd be happy to share how it works!
Other Available AI Assistants
🟢 TurboLLM – Uses gpt-4o-mini. Fast! Note: tokens are limited since OpenAI models are pricey, but you can log in or download the Free Network Monitor agent to get more tokens; alternatively, use the TestLLM.
🔵 HugLLM – Runs open-source Hugging Face models. Fast, but it runs small models (≈8B), hence lower quality. Get 2x more tokens (subject to Hugging Face API availability).
Documentation
Overview
Fin-R1 is a large language model designed for complex financial reasoning, jointly developed and open-sourced by the Financial Large Language Model Research Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics and Caiyue Xingchen. Based on Qwen2.5-7B-Instruct, the model is fine-tuned on high-quality verifiable financial questions, achieving SOTA performance on multiple financial benchmarks.
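For the full-precision checkpoint, inference follows the standard Qwen2.5 chat workflow. The snippet below is a minimal sketch using Hugging Face transformers; the repo id `SUFE-AIFLM-Lab/Fin-R1` is an assumption here, so substitute the id of the checkpoint you actually use.

```python
# Minimal inference sketch with transformers (repo id assumed; see lead-in).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SUFE-AIFLM-Lab/Fin-R1"  # assumption: replace with your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content":
             "A 3-year bond pays a 5% annual coupon on a $1,000 face value. "
             "If the yield to maturity is 4%, what is its price?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```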
Scenario Applications
Fin-R1 is a large language model specifically designed for financial reasoning, featuring a lightweight 7B-parameter architecture that significantly reduces deployment costs. Through two-stage training (SFT and RL) on high-quality financial reasoning data, the model provides strong support for the theory, business rules, decision-making logic, and technical implementation of financial applications, effectively enhancing complex financial reasoning ability and supporting core financial business scenarios such as banking, securities, insurance, and trust.
Financial Code
Financial code refers to computer programming code used in the financial field to implement various financial models, algorithms, and analysis tasks. It covers a wide range of aspects, from simple financial calculations to complex financial derivative pricing, risk assessment, and portfolio optimization, facilitating data processing, statistical analysis, numerical computation, and visualization for financial professionals.
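As a concrete illustration of what such code looks like, here is a short, self-contained example of a classic financial-code task: pricing a European call option with the Black-Scholes formula. It is an illustrative sample of the task category, not output from the model.

```python
# Black-Scholes price of a European call option.
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Call price: spot S, strike K, maturity T (years),
    risk-free rate r, volatility sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

print(f"{black_scholes_call(100, 105, 1.0, 0.03, 0.2):.2f}")  # ≈ 7.13
```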
Financial Calculation
Financial calculation is the process of quantitatively analyzing and computing various financial problems. It involves building mathematical models and using numerical methods to solve real-world financial problems, providing a scientific basis for financial decision-making and helping financial institutions and investors better manage risks, optimize resource allocation, and improve investment returns.
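For instance, one of the most basic financial calculations is the net present value (NPV) of a cash-flow stream; the sketch below shows the standard discounting formula in a few lines.

```python
# Net present value: discount each cash flow by (1 + rate)^t and sum.
def npv(rate: float, cashflows: list[float]) -> float:
    """NPV with cashflows[0] at t=0 (typically the negative outlay)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# -1000 invested today, 400 returned at the end of each of 3 years, at 8%:
print(f"{npv(0.08, [-1000, 400, 400, 400]):.2f}")  # ≈ 30.84
```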
English Financial Calculation
English financial calculation emphasizes the construction and computation of financial models in an English-speaking, cross-language environment. It enables users to write financial analysis reports in English and communicate with international peers.
Financial Security and Compliance
Financial security and compliance focus on preventing financial crimes and meeting regulatory requirements. It helps enterprises establish a sound compliance management system, conduct regular compliance checks and audits, and ensure that business operations comply with relevant regulations.
Intelligent Risk Control
Intelligent risk control uses AI and big data technologies to identify and manage financial risks. Compared with traditional risk control methods, it offers higher efficiency, accuracy, and real-time performance. By deeply mining and analyzing large amounts of financial data, it can detect potential risk patterns and abnormal trading behaviors, enabling timely warnings and risk control measures.
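As a toy illustration of one building block of such systems, the sketch below flags a transaction whose amount deviates strongly from an account's history using a simple z-score rule; real intelligent risk control combines many such signals with learned models.

```python
# Toy anomaly check: flag a transaction amount that deviates from the
# account's historical mean by more than `threshold` standard deviations.
from statistics import mean, stdev

def is_suspicious(history: list[float], amount: float,
                  threshold: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(amount - mu) / sigma > threshold

history = [120.0, 95.0, 110.0, 130.0, 105.0, 99.0]
print(is_suspicious(history, 5800.0))  # True: far outside normal spending
print(is_suspicious(history, 115.0))   # False: within the usual range
```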
ESG Analysis
ESG analysis evaluates a company's environmental, social, and governance (ESG) performance, measuring its capacity for sustainable development. It ensures that investment activities not only generate financial returns but also promote sustainable development and social responsibility. Financial institutions and enterprises can improve their ESG performance to meet the rising expectations and requirements of investors and society.
Overall Workflow
We built a data distillation framework based on DeepSeek-R1 and processed the data according to the official parameters. Using a two-stage data screening method, we improved the quality of the financial data and generated the SFT and RL datasets. During training, we applied supervised fine-tuning (SFT) and reinforcement learning (RL) to Qwen2.5-7B-Instruct to train the financial reasoning large model Fin-R1, enhancing the accuracy and generalization ability of financial reasoning tasks.
🛠️ Data Construction
To transfer the reasoning ability of DeepSeek-R1 to the financial domain and address the scarcity of high-quality financial reasoning data, we used DeepSeek-R1 (the full-strength version) to perform domain knowledge distillation and screening on multiple datasets, including industry corpora (FinCorpus, Ant_Finance), professional cognition (FinPEE), business knowledge (FinCUGE, FinanceIQ, Finance-Instruct-500K), table parsing (FinQA), market insights (TFNS), multi-round interaction (ConvFinQA), and quantitative investment (FinanceQT). The result is Fin-R1-Data, a high-quality CoT dataset of approximately 60k entries for professional financial reasoning scenarios. The dataset covers multi-dimensional professional knowledge in the Chinese and English financial vertical domains and is divided into four modules: financial code, financial professional knowledge, financial non-reasoning business knowledge, and financial reasoning business knowledge, effectively supporting core financial scenarios such as banking, funds, and securities. On top of the DeepSeek-R1-based distillation framework, we innovatively proposed a two-round quality scoring and screening method for the "answer + reasoning" chain of thought: the first round scores answer accuracy using rule matching and Qwen2.5-72B-Instruct, and the second round deeply verifies the reasoning logic, including logical consistency and term compliance, to ensure data quality.
Data Distillation
During the distillation process, we strictly followed the official DeepSeek-R1 settings for the data distillation operations.
Data Screening
We adopted an innovative two-round quality scoring method for the "answer + reasoning logic" chain of thought to screen the financial data. In the first round, we scored answer accuracy using rule matching and Qwen2.5-72B-Instruct. In the second round, we deeply verified the reasoning logic, including logical consistency and term compliance. Each round marked the data as "good" or "bad".
We used the data marked "good" in both rounds as high-quality CoT data for SFT, while the data marked "bad" was used as reasoning QA data for reinforcement learning (RL), as sketched below.
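The routing logic can be summarized in a few lines. The sketch below is a schematic reconstruction under stated assumptions: the two scoring functions are trivial stand-ins for the actual rule-matching and Qwen2.5-72B-Instruct judges described above.

```python
# Schematic two-round screening: round 1 checks the answer, round 2 checks
# the reasoning; "good" samples go to SFT, "bad" ones to RL.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str
    reasoning: str
    answer: str
    gold: str

def answer_is_correct(s: Sample) -> bool:
    # Stand-in for round 1 (rule matching + LLM judge in the real pipeline).
    return s.answer.strip() == s.gold.strip()

def reasoning_is_sound(s: Sample) -> bool:
    # Stand-in for round 2 (logical consistency and term compliance checks).
    return len(s.reasoning.strip()) > 0

def screen(s: Sample) -> str:
    return "good" if answer_is_correct(s) and reasoning_is_sound(s) else "bad"

dataset = [
    Sample("What is 2% of 500?", "2% of 500 is 0.02 * 500.", "10", "10"),
    Sample("Is the NPV positive?", "", "yes", "no"),
]
sft_data = [s for s in dataset if screen(s) == "good"]  # high-quality CoT -> SFT
rl_data  = [s for s in dataset if screen(s) == "bad"]   # reasoning QA -> RL
print(len(sft_data), len(rl_data))  # 1 1
```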
Fin-R1-Data Distribution
Fin-R1-Data covers multi-dimensional professional knowledge in the Chinese and English financial vertical domains and is divided into four modules: financial code, financial professional knowledge, financial non-reasoning business knowledge, and financial reasoning business knowledge, effectively supporting core financial scenarios such as banking, securities, and trust.
| Dataset | Data Volume |
|---|---|
| ConvFinQA-R1-Distill | 7629 |
| Finance-Instruct-500K-R1-Distill | 11300 |
| FinCUGE-R1-Distill | 2000 |
| FinQA-R1-Distill | 2948 |
| TFNS-R1-Distill | 2451 |
| FinanceIQ-R1-Distill | 2596 |
| FinanceQT-R1-Distill | 152 |
| Ant_Finance-R1-Distill | 1548 |
| FinCorpus-R1-Distill | 29288 |
| FinPEE-R1-Distill | 179 |
| Total | 60091 |
Fine-tuning Training
Two-Stage Process
For complex financial reasoning tasks, we performed two-stage fine-tuning on Qwen2.5-7B-Instruct to obtain the financial reasoning large language model Fin-R1. First, SFT (Supervised Fine-Tuning) on high-quality financial reasoning data helped the model initially improve its financial reasoning ability. Then, reinforcement learning based on the GRPO (Group Relative Policy Optimization) algorithm, combining format rewards and accuracy rewards, further enhanced the accuracy and generalization ability on financial reasoning tasks.
First Stage - Injecting Reasoning Ability
To handle complex financial reasoning tasks, we performed first-stage supervised fine-tuning on Qwen2.5-7B-Instruct using the ConvFinQA and FinQA financial datasets. This round of fine-tuning ensured that the model could deeply understand and process complex financial reasoning problems.
Second Stage - Reinforcement Learning Optimization
After the model mastered complex reasoning skills, we used the GRPO (Group Relative Policy Optimization) algorithm as the core framework and optimized the model's output format and accuracy with a dual-reward mechanism. We also introduced a model-based verifier, using Qwen2.5-Max for answer evaluation to mitigate the potential bias of regex-based rewards, generating more accurate and reliable reward signals and enhancing the effectiveness and stability of reinforcement learning. A sketch of the dual-reward idea follows.
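This is a hedged sketch, not the training code: the `<think>/<answer>` tag format, the 0/1 reward values, and the exact-match check are illustrative assumptions standing in for Fin-R1's format reward, accuracy reward, and model-based verifier.

```python
# Schematic dual reward: format reward (tagged structure) + accuracy reward
# (answer check). Fin-R1 used a model-based verifier (Qwen2.5-Max) instead
# of pure string matching for the accuracy component.
import re

FORMAT = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def reward(completion: str, gold: str) -> float:
    m = FORMAT.fullmatch(completion.strip())
    format_reward = 1.0 if m else 0.0                 # well-formed output?
    answer = m.group(1).strip() if m else ""
    accuracy_reward = 1.0 if answer == gold else 0.0  # verifier stand-in
    return format_reward + accuracy_reward

print(reward("<think>5% of 200 is 10.</think><answer>10</answer>", "10"))  # 2.0
print(reward("The answer is 10.", "10"))                                   # 0.0
```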
Model Evaluation Results
We evaluated the model on benchmarks covering multiple financial business scenarios. In these evaluations, Fin-R1-SFT, which underwent only supervised fine-tuning (SFT), already improved on the base model in financial scenarios but still trailed DeepSeek-R1, so we applied reinforcement learning on top of Fin-R1-SFT. The results show that Fin-R1, trained with both SFT and RL at a lightweight 7B parameter scale, delivers significant performance advantages: it achieved an average score of 75.2, ranking second overall, comprehensively outperforming models of the same scale, finishing only 3.0 points below the industry benchmark DeepSeek-R1, and exceeding DeepSeek-R1-Distill-Llama-70B (69.2) by 6.0 points. In addition, Fin-R1 topped the participating models on two key tasks, FinQA (numerical reasoning over real financial tables) and ConvFinQA (a multi-round reasoning interaction scenario), scoring 76.0 and 85.0 respectively, demonstrating strong capability in both financial reasoning and non-reasoning scenarios.
| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | 671B | 71.0 | 82.0 | 90.0 | 78.0 | 70.0 | 78.2 |
| Fin-R1 | 7B | 76.0 | 85.0 | 81.0 | 71.0 | 62.9 | 75.2 |
| Qwen-2.5-32B-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 70.0 | 72.0 | 87.0 | 79.0 | 54.0 | 72.4 |
| Fin-R1-SFT | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-14B-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Llama-70B | 70B | 68.0 | 74.0 | 84.0 | 62.0 | 56.0 | 69.2 |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-7B-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |
Declaration and Future Outlook
This project was completed by the Financial Large Language Model Research Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics in collaboration with Caiyue Xingchen. As a financial reasoning large language model, Fin-R1 performs many financial tasks well and provides professional services, but it still has technical bottlenecks and application limitations. The suggestions and analysis results it provides are for reference only and are not a substitute for the precise judgments of professional financial analysts or experts. We sincerely hope that users critically examine the model's output and make decisions based on their own professional knowledge and experience. In the future, we will continue to optimize Fin-R1, deeply explore its application potential in cutting-edge financial scenarios, and contribute to the intelligent and compliant development of the financial industry.
📫 Contact Us
We sincerely invite industry colleagues to explore with us the innovative paradigm of deep integration between AI and finance and to build a new intelligent financial ecosystem. Contact us by email at zhang.liwen@shufe.edu.cn.

