Fin-R1 GGUF Models
These models are designed for financial inference, offering various formats to suit different hardware and memory requirements.
✨ Features
Choosing the Right Model Format
Selecting the appropriate model format depends on your hardware capabilities and memory constraints.
BF16 (Brain Float 16) – Use if BF16 acceleration is available
- A 16-bit floating-point format designed for faster computation while retaining good precision.
- Provides similar dynamic range as FP32 but with lower memory usage.
- Recommended if your hardware supports BF16 acceleration (check your device's specs).
- Ideal for high-performance inference with reduced memory footprint compared to FP32.
📌 Use BF16 if:
✔ Your hardware has native BF16 support (e.g., newer GPUs, TPUs).
✔ You want higher precision while saving memory.
✔ You plan to requantize the model into another format.

📌 Avoid BF16 if:
❌ Your hardware does not support BF16 (it may fall back to FP32 and run slower).
❌ You need compatibility with older devices that lack BF16 optimization.
F16 (Float 16) – More widely supported than BF16
- A 16-bit floating-point format with high precision but a smaller range of values than BF16.
- Works on most devices with FP16 acceleration support (including many GPUs and some CPUs).
- Slightly narrower dynamic range than BF16, but its precision is generally sufficient for inference.
📌 Use F16 if:
✔ Your hardware supports FP16 but not BF16.
✔ You need a balance between speed, memory usage, and accuracy.
✔ You are running on a GPU or another device optimized for FP16 computations.

📌 Avoid F16 if:
❌ Your device lacks native FP16 support (it may run slower than expected).
❌ You have memory limitations.
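If you are unsure whether your GPU accelerates BF16 or FP16, a quick runtime query can settle it. Below is a minimal sketch assuming a CUDA-enabled PyTorch install; other backends expose similar checks.

```python
# Minimal capability check for BF16/FP16 support (assumes PyTorch with CUDA).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Device: {torch.cuda.get_device_name(0)} (compute {major}.{minor})")
    print("Native BF16 support:", torch.cuda.is_bf16_supported())
    # FP16 arithmetic is available on virtually all recent CUDA GPUs;
    # tensor-core acceleration requires compute capability >= 7.0.
    print("FP16 tensor cores:", (major, minor) >= (7, 0))
else:
    print("No CUDA device; consider a quantized GGUF for CPU inference.")
```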
Quantized Models (Q4_K, Q6_K, Q8_0, etc.) – For CPU & Low-VRAM Inference
Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
- Lower-bit models (Q4_K) → Best for minimal memory usage; may have lower precision.
- Higher-bit models (Q6_K, Q8_0) → Better accuracy; require more memory.
📌 Use Quantized Models if:
✔ You are running inference on a CPU and need an optimized model.
✔ Your device has low VRAM and cannot load full-precision models.
✔ You want to reduce memory footprint while keeping reasonable accuracy.

📌 Avoid Quantized Models if:
❌ You need maximum accuracy (full-precision models are better for this).
❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
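As an illustration, here is a hedged sketch of CPU inference with one of the quantized files listed under Included Files & Details, using the llama-cpp-python bindings (`pip install llama-cpp-python`). The filename matches the Q4_K file in this repository; the context size and thread count are assumptions to tune for your machine.

```python
# Sketch: CPU inference with a quantized GGUF via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Fin-R1-q4_k.gguf",  # Q4_K: a good default for CPU / low VRAM
    n_ctx=4096,                     # context window (adjust to your memory)
    n_threads=8,                    # roughly match your physical core count
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "What is the present value of $1,000 received in 5 years "
                   "at a 4% annual discount rate?",
    }],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```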
Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)
These models are optimized for extreme memory efficiency, making them ideal for low-power devices or large-scale deployments where memory is a critical constraint.
- IQ3_XS: Ultra-low-bit quantization (3-bit) with extreme memory efficiency.
  - Use case: Best for ultra-low-memory devices where even Q4_K is too large.
  - Trade-off: Lower accuracy compared to higher-bit quantizations.
- IQ3_S: Small block size for maximum memory efficiency.
  - Use case: Best for low-memory devices where IQ3_XS is too aggressive.
- IQ3_M: Medium block size for better accuracy than IQ3_S.
  - Use case: Suitable for low-memory devices where IQ3_S is too limiting.
- Q4_K: 4-bit quantization with block-wise optimization for better accuracy.
  - Use case: Best for low-memory devices where Q6_K is too large.
- Q4_0: Pure 4-bit quantization, optimized for ARM devices.
  - Use case: Best for ARM-based devices or low-memory environments.
Summary Table: Model Format Selection
| Property | Details |
|---|---|
| Model Type | BF16, F16, Q4_K, Q6_K, Q8_0, IQ3_XS, Q4_0 |
| Precision | Varies from very low (IQ3_XS) to high (BF16, F16, Q8_0) |
| Memory Usage | Ranges from very low (IQ3_XS) to high (BF16, F16) |
| Device Requirements | BF16-supported GPUs/CPUs; FP16-supported devices; CPUs or low-VRAM devices; etc. |
| Best Use Case | High-speed inference with reduced memory; GPU inference when BF16 isn't available; memory-constrained environments; etc. |
📦 Included Files & Details
Fin-R1-bf16.gguf
- Model weights preserved in BF16.
- Use this if you want to requantize the model into a different format.
- Best if your device supports BF16 acceleration.
Fin-R1-f16.gguf
- Model weights stored in F16.
- Use if your device supports FP16, especially if BF16 is not available.
Fin-R1-bf16-q8_0.gguf
- Output & embeddings remain in BF16.
- All other layers quantized to Q8_0.
- Use if your device supports BF16 and you want a quantized version.
Fin-R1-f16-q8_0.gguf
- Output & embeddings remain in F16.
- All other layers quantized to Q8_0.
Fin-R1-q4_k.gguf
- Output & embeddings quantized to Q8_0.
- All other layers quantized to Q4_K.
- Good for CPU inference with limited memory.
Fin-R1-q4_k_s.gguf
- Smallest Q4_K variant, using less memory at the cost of accuracy.
- Best for very low-memory setups.
Fin-R1-q6_k.gguf
- Output & embeddings quantized to Q8_0.
- All other layers quantized to Q6_K.
Fin-R1-q8_0.gguf
- Fully Q8 quantized model for better accuracy.
- Requires more memory but offers higher precision.
Fin-R1-iq3_xs.gguf
- IQ3_XS quantization, optimized for extreme memory efficiency.
- Best for ultra-low-memory devices.
Fin-R1-iq3_m.gguf
- IQ3_M quantization, offering a medium block size for better accuracy.
- Suitable for low-memory devices.
Fin-R1-q4_0.gguf
- Pure Q4_0 quantization, optimized for ARM devices.
- Best for low-memory environments.
- Prefer IQ4_NL for better accuracy.
Quick Start
If you find these models useful, please click like ❤. Also, I'd really appreciate it if you could test my Network Monitor Assistant at 👉 Network Monitor Assistant.
💬 Click the chat icon (bottom right of the main and dashboard pages). Choose an LLM; toggle between the LLM types TurboLLM -> HugLLM -> TestLLM.
What I'm Testing
I'm experimenting with function calling against my network monitoring service, using small open-source models to explore the question: how small can a model be and still function?
🟡 TestLLM – Runs the current testing model using llama.cpp on 6 threads of a CPU VM (should take about 15s to load; inference is quite slow, and it only processes one user prompt at a time; still working on scaling!). If you're curious, I'd be happy to share how it works!
Other Available AI Assistants
🟢 TurboLLM – Uses gpt-4o-mini. Fast! Note: tokens are limited since OpenAI models are pricey, but you can log in or download the Free Network Monitor agent to get more tokens; alternatively, use the TestLLM.
🔵 HugLLM – Runs open-source Hugging Face models. Fast, but it runs small models (≈8B), hence lower quality. Get 2x more tokens (subject to Hugging Face API availability).
Documentation
Overview
Fin-R1 is a large language model designed for complex financial reasoning, jointly developed and open-sourced by the Financial Large Language Model Research Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics and Caiyue Xingchen. Based on Qwen2.5-7B-Instruct, the model is fine-tuned on high-quality verifiable financial questions, achieving SOTA performance on multiple financial benchmarks.
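For the full-precision checkpoint, inference follows the standard Qwen2.5 chat workflow. The snippet below is a minimal sketch using Hugging Face transformers; the repo id `SUFE-AIFLM-Lab/Fin-R1` is an assumption here, so substitute the id of the checkpoint you actually use.

```python
# Minimal inference sketch with transformers (repo id assumed; see lead-in).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SUFE-AIFLM-Lab/Fin-R1"  # assumption: replace with your checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content":
             "A 3-year bond pays a 5% annual coupon on a $1,000 face value. "
             "If the yield to maturity is 4%, what is its price?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```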
Scenario Applications
Fin-R1 is a large language model specifically designed for financial reasoning, featuring a lightweight 7B-parameter architecture that significantly reduces deployment costs. Through two-stage training (SFT and RL) on high-quality financial reasoning data, the model provides strong support for the theory, business rules, decision-making logic, and technical implementation of financial applications, effectively enhancing complex financial reasoning ability and supporting core financial business scenarios such as banking, securities, insurance, and trust.
Financial Code
Financial code refers to computer programming code used in the financial field to implement various financial models, algorithms, and analysis tasks. It covers a wide range of aspects, from simple financial calculations to complex financial derivative pricing, risk assessment, and portfolio optimization, facilitating data processing, statistical analysis, numerical computation, and visualization for financial professionals.
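As a concrete illustration of what such code looks like, here is a short, self-contained example of a classic financial-code task: pricing a European call option with the Black-Scholes formula. It is an illustrative sample of the task category, not output from the model.

```python
# Black-Scholes price of a European call option.
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Call price: spot S, strike K, maturity T (years),
    risk-free rate r, volatility sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

print(f"{black_scholes_call(100, 105, 1.0, 0.03, 0.2):.2f}")  # ≈ 7.13
```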
Financial Calculation
Financial calculation is the process of quantitatively analyzing and computing various financial problems. It involves building mathematical models and using numerical methods to solve real-world financial problems, providing a scientific basis for financial decision-making and helping financial institutions and investors better manage risks, optimize resource allocation, and improve investment returns.
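For instance, one of the most basic financial calculations is the net present value (NPV) of a cash-flow stream; the sketch below shows the standard discounting formula in a few lines.

```python
# Net present value: discount each cash flow by (1 + rate)^t and sum.
def npv(rate: float, cashflows: list[float]) -> float:
    """NPV with cashflows[0] at t=0 (typically the negative outlay)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

# -1000 invested today, 400 returned at the end of each of 3 years, at 8%:
print(f"{npv(0.08, [-1000, 400, 400, 400]):.2f}")  # ≈ 30.84
```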
English Financial Calculation
English financial calculation emphasizes the construction and computation of financial models in an English-speaking, cross-language environment. It enables users to write financial analysis reports in English and communicate with international peers.
Financial Security and Compliance
Financial security and compliance focus on preventing financial crimes and meeting regulatory requirements. It helps enterprises establish a sound compliance management system, conduct regular compliance checks and audits, and ensure that business operations comply with relevant regulations.
Intelligent Risk Control
Intelligent risk control uses AI and big data technologies to identify and manage financial risks. Compared with traditional risk control methods, it offers higher efficiency, accuracy, and real-time performance. By deeply mining and analyzing large amounts of financial data, it can detect potential risk patterns and abnormal trading behaviors, enabling timely warnings and risk control measures.
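As a toy illustration of one building block of such systems, the sketch below flags a transaction whose amount deviates strongly from an account's history using a simple z-score rule; real intelligent risk control combines many such signals with learned models.

```python
# Toy anomaly check: flag a transaction amount that deviates from the
# account's historical mean by more than `threshold` standard deviations.
from statistics import mean, stdev

def is_suspicious(history: list[float], amount: float,
                  threshold: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(amount - mu) / sigma > threshold

history = [120.0, 95.0, 110.0, 130.0, 105.0, 99.0]
print(is_suspicious(history, 5800.0))  # True: far outside normal spending
print(is_suspicious(history, 115.0))   # False: within the usual range
```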
ESG Analysis
ESG analysis evaluates a company's environmental, social, and governance (ESG) performance, measuring its capacity for sustainable development. It ensures that investment activities not only generate financial returns but also promote sustainable development and social responsibility. Financial institutions and enterprises can improve their ESG performance to meet the rising expectations and requirements of investors and society.
Overall Workflow
We built a data distillation framework based on DeepSeek-R1 and processed the data according to the official parameters. Using a two-stage data screening method, we improved the quality of the financial data and generated the SFT and RL datasets. During training, we applied supervised fine-tuning (SFT) and reinforcement learning (RL) to Qwen2.5-7B-Instruct to train the financial reasoning large model Fin-R1, enhancing the accuracy and generalization ability of financial reasoning tasks.
🛠️ Data Construction
To transfer the reasoning ability of DeepSeek-R1 to the financial domain and address the scarcity of high-quality financial reasoning data, we used DeepSeek-R1 (the full-strength version) to perform domain knowledge distillation and screening on multiple datasets, including industry corpora (FinCorpus, Ant_Finance), professional cognition (FinPEE), business knowledge (FinCUGE, FinanceIQ, Finance-Instruct-500K), table parsing (FinQA), market insights (TFNS), multi-round interaction (ConvFinQA), and quantitative investment (FinanceQT). The result is Fin-R1-Data, a high-quality CoT dataset of approximately 60k entries for professional financial reasoning scenarios. The dataset covers multi-dimensional professional knowledge in the Chinese and English financial vertical domains and is divided into four modules: financial code, financial professional knowledge, financial non-reasoning business knowledge, and financial reasoning business knowledge, effectively supporting core financial scenarios such as banking, funds, and securities. On top of the DeepSeek-R1-based distillation framework, we innovatively proposed a two-round quality scoring and screening method for the "answer + reasoning" chain of thought: the first round scores answer accuracy using rule matching and Qwen2.5-72B-Instruct, and the second round deeply verifies the reasoning logic, including logical consistency and term compliance, to ensure data quality.
Data Distillation
During the distillation process, we strictly followed the official DeepSeek-R1 settings for the data distillation operations.
Data Screening
We adopted an innovative two-round quality scoring method for the "answer + reasoning logic" chain of thought to screen the financial data. In the first round, we scored answer accuracy using rule matching and Qwen2.5-72B-Instruct. In the second round, we deeply verified the reasoning logic, including logical consistency and term compliance. Each round marked the data as "good" or "bad".
We used the data marked "good" in both rounds as high-quality CoT data for SFT, while the data marked "bad" was used as reasoning QA data for reinforcement learning (RL), as sketched below.
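The routing logic can be summarized in a few lines. The sketch below is a schematic reconstruction under stated assumptions: the two scoring functions are trivial stand-ins for the actual rule-matching and Qwen2.5-72B-Instruct judges described above.

```python
# Schematic two-round screening: round 1 checks the answer, round 2 checks
# the reasoning; "good" samples go to SFT, "bad" ones to RL.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str
    reasoning: str
    answer: str
    gold: str

def answer_is_correct(s: Sample) -> bool:
    # Stand-in for round 1 (rule matching + LLM judge in the real pipeline).
    return s.answer.strip() == s.gold.strip()

def reasoning_is_sound(s: Sample) -> bool:
    # Stand-in for round 2 (logical consistency and term compliance checks).
    return len(s.reasoning.strip()) > 0

def screen(s: Sample) -> str:
    return "good" if answer_is_correct(s) and reasoning_is_sound(s) else "bad"

dataset = [
    Sample("What is 2% of 500?", "2% of 500 is 0.02 * 500.", "10", "10"),
    Sample("Is the NPV positive?", "", "yes", "no"),
]
sft_data = [s for s in dataset if screen(s) == "good"]  # high-quality CoT -> SFT
rl_data  = [s for s in dataset if screen(s) == "bad"]   # reasoning QA -> RL
print(len(sft_data), len(rl_data))  # 1 1
```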
Fin-R1-Data Distribution
Fin-R1-Data covers multi-dimensional professional knowledge in the Chinese and English financial vertical domains and is divided into four modules: financial code, financial professional knowledge, financial non-reasoning business knowledge, and financial reasoning business knowledge, effectively supporting core financial scenarios such as banking, securities, and trust.
| Dataset | Data Volume |
|---|---|
| ConvFinQA-R1-Distill | 7629 |
| Finance-Instruct-500K-R1-Distill | 11300 |
| FinCUGE-R1-Distill | 2000 |
| FinQA-R1-Distill | 2948 |
| TFNS-R1-Distill | 2451 |
| FinanceIQ-R1-Distill | 2596 |
| FinanceQT-R1-Distill | 152 |
| Ant_Finance-R1-Distill | 1548 |
| FinCorpus-R1-Distill | 29288 |
| FinPEE-R1-Distill | 179 |
| Total | 60091 |
Fine-tuning Training
Two-Stage Process
For complex financial reasoning tasks, we performed two-stage fine-tuning on Qwen2.5-7B-Instruct to obtain the financial reasoning large language model Fin-R1. First, SFT (Supervised Fine-Tuning) on high-quality financial reasoning data helped the model initially improve its financial reasoning ability. Then, reinforcement learning based on the GRPO (Group Relative Policy Optimization) algorithm, combining format rewards and accuracy rewards, further enhanced the accuracy and generalization ability on financial reasoning tasks.
First Stage - Injecting Reasoning Ability
To handle complex financial reasoning tasks, we performed first-stage supervised fine-tuning on Qwen2.5-7B-Instruct using the ConvFinQA and FinQA financial datasets. This round of fine-tuning ensured that the model could deeply understand and process complex financial reasoning problems.
Second Stage - Reinforcement Learning Optimization
After the model mastered complex reasoning skills, we used the GRPO (Group Relative Policy Optimization) algorithm as the core framework and optimized the model's output format and accuracy with a dual-reward mechanism. We also introduced a model-based verifier, using Qwen2.5-Max for answer evaluation to mitigate the potential bias of regex-based rewards, generating more accurate and reliable reward signals and enhancing the effectiveness and stability of reinforcement learning. A sketch of the dual-reward idea follows.
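This is a hedged sketch, not the training code: the `<think>/<answer>` tag format, the 0/1 reward values, and the exact-match check are illustrative assumptions standing in for Fin-R1's format reward, accuracy reward, and model-based verifier.

```python
# Schematic dual reward: format reward (tagged structure) + accuracy reward
# (answer check). Fin-R1 used a model-based verifier (Qwen2.5-Max) instead
# of pure string matching for the accuracy component.
import re

FORMAT = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)

def reward(completion: str, gold: str) -> float:
    m = FORMAT.fullmatch(completion.strip())
    format_reward = 1.0 if m else 0.0                 # well-formed output?
    answer = m.group(1).strip() if m else ""
    accuracy_reward = 1.0 if answer == gold else 0.0  # verifier stand-in
    return format_reward + accuracy_reward

print(reward("<think>5% of 200 is 10.</think><answer>10</answer>", "10"))  # 2.0
print(reward("The answer is 10.", "10"))                                   # 0.0
```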
Model Evaluation Results
We evaluated the model on benchmarks covering multiple financial business scenarios. In these evaluations, Fin-R1-SFT, which underwent only supervised fine-tuning (SFT), already improved on the base model in financial scenarios but still trailed DeepSeek-R1, so we applied reinforcement learning on top of Fin-R1-SFT. The results show that Fin-R1, trained with both SFT and RL at a lightweight 7B parameter scale, delivers significant performance advantages: it achieved an average score of 75.2, ranking second overall, comprehensively outperforming models of the same scale, finishing only 3.0 points below the industry benchmark DeepSeek-R1, and exceeding DeepSeek-R1-Distill-Llama-70B (69.2) by 6.0 points. In addition, Fin-R1 topped the participating models on two key tasks, FinQA (numerical reasoning over real financial tables) and ConvFinQA (a multi-round reasoning interaction scenario), scoring 76.0 and 85.0 respectively, demonstrating strong capability in both financial reasoning and non-reasoning scenarios.
| Model | Parameters | FinQA | ConvFinQA | Ant_Finance | TFNS | Finance-Instruct-500k | Average |
|---|---|---|---|---|---|---|---|
| DeepSeek-R1 | 671B | 71.0 | 82.0 | 90.0 | 78.0 | 70.0 | 78.2 |
| Fin-R1 | 7B | 76.0 | 85.0 | 81.0 | 71.0 | 62.9 | 75.2 |
| Qwen-2.5-32B-Instruct | 32B | 72.0 | 78.0 | 84.0 | 77.0 | 58.0 | 73.8 |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 70.0 | 72.0 | 87.0 | 79.0 | 54.0 | 72.4 |
| Fin-R1-SFT | 7B | 73.0 | 81.0 | 76.0 | 68.0 | 61.0 | 71.9 |
| Qwen-2.5-14B-Instruct | 14B | 68.0 | 77.0 | 84.0 | 72.0 | 56.0 | 71.4 |
| DeepSeek-R1-Distill-Llama-70B | 70B | 68.0 | 74.0 | 84.0 | 62.0 | 56.0 | 69.2 |
| DeepSeek-R1-Distill-Qwen-14B | 14B | 62.0 | 73.0 | 82.0 | 65.0 | 49.0 | 66.2 |
| Qwen-2.5-7B-Instruct | 7B | 60.0 | 66.0 | 85.0 | 68.0 | 49.0 | 65.6 |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 55.0 | 62.0 | 71.0 | 60.0 | 42.0 | 58.0 |
Declaration and Future Outlook
This project was completed by the Financial Large Language Model Research Group (SUFE-AIFLM-Lab) of the School of Statistics and Data Science at Shanghai University of Finance and Economics in collaboration with Caiyue Xingchen. As a financial reasoning large language model, Fin-R1 performs many financial tasks well and provides professional services, but it still has technical bottlenecks and application limitations. The suggestions and analysis results it provides are for reference only and are not a substitute for the precise judgments of professional financial analysts or experts. We sincerely hope that users critically examine the model's output and make decisions based on their own professional knowledge and experience. In the future, we will continue to optimize Fin-R1, deeply explore its application potential in cutting-edge financial scenarios, and contribute to the intelligent and compliant development of the financial industry.
📫 Contact Us
We sincerely invite industry colleagues to explore with us the innovative paradigm of deep integration between AI and finance and to build a new intelligent financial ecosystem. Contact us by email at zhang.liwen@shufe.edu.cn.

