FANformer-1B Open-Source Language Model - Periodicity-Enhanced Language Modeling with 1.1B Parameters and 1T Training Tokens

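FANformer-1B

Developed by dongyh

FANformer-1B is an autoregressive language model that strengthens language modeling through a periodicity mechanism. It has 1.1 billion non-embedding parameters and was trained on 1 trillion tokens.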

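Large language model

Transformers

Language: English | License: MIT | Tags: #periodicity-modeling #efficient-text-generation #long-sequence-processing

Downloads: 114

Released: 3/20/2025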

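Model Overview

A decoder-only large language model with enhanced periodicity modeling, intended for general-purpose text generation and understanding.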

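Model Features

Enhanced periodicity modeling

The FAN layer captures periodic patterns in the training data, which improves learning efficiency and performance.

Efficient training

Trained on 1 trillion tokens, the model reports a higher benchmark average than the other 1B-scale models in the evaluation table, several of which were trained on more tokens.

Lightweight design

The 1.1-billion-parameter scale keeps compute requirements modest while maintaining performance.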

Model Capabilities

Text generation

Language understanding

Knowledge question answering

Logical reasoning

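Use Cases

Text generation

Academic writing assistance

Generates scientific prose that involves periodic concepts.

Related benchmark: 72.456 on arc_easy, which measures science question-answering accuracy rather than text coherence.

Education

Science question-answering systems

Answers basic questions in STEM fields.

Reported result: 94.80 accuracy on the sciq test set.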

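🚀 FANformer-1B Model

FANformer-1B is an autoregressive language model with 1.1 billion non-embedding parameters that strengthens language modeling through an effective periodicity mechanism. It can be used for general-purpose text generation and understanding, and it can be fine-tuned for specific tasks.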

🚀 Quick Start

The following example loads the model and tokenizer, then samples a continuation of a prompt:

from transformers import AutoModelForCausalLM, AutoTokenizer

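# trust_remote_code=True allows transformers to execute the custom modeling code shipped in the model repository.
model = AutoModelForCausalLM.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("dongyh/FANformer-1B", trust_remote_code=True)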

input_text = "The concept of periodicity serves as a fundamental organizing principle across the natural world, human societies, and even abstract systems. From the rhythmic cycles of celestial bodies governing seasons and tides to the biological clocks regulating sleep and metabolism in living organisms, recurring patterns create stability amid chaos. In ecosystems, predator-prey population oscillations maintain balance, while the carbon cycle ensures Earth's climate resilience. Culturally, humanity has structured civilizations around agricultural cycles, religious calendars, and economic fluctuations—harvest festivals marking seasonal abundance, financial markets swaying between boom and bust. Even at the quantum level, wave functions reveal inherent periodicity that underpins material reality. This universal recurrence enables prediction, adaptation, and innovation: by recognizing cycles, we"
inputs = tokenizer(input_text, return_tensors="pt")
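# max_length counts the prompt tokens plus the generated tokens; do_sample=True enables temperature and top-p sampling.
outputs = model.generate(inputs.input_ids, max_length=512, do_sample=True, temperature=0.6, top_p=0.8)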

print(tokenizer.decode(outputs[0]))
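
The generation call above samples with temperature=0.6 and top_p=0.8. As a rough illustration of what those two settings do to a next-token distribution, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. The function name and the toy logits are hypothetical, and this is not the transformers implementation, which works on tensors and may differ in edge cases.

```python
import math

def top_p_distribution(logits, temperature=0.6, top_p=0.8):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # A temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens so their probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}

# Toy logits for a 4-token vocabulary (hypothetical values).
dist = top_p_distribution([2.0, 1.0, 0.5, -1.0])
print(dist)
```

With these toy logits only the two most likely tokens survive the filter; sampling then draws from that reduced set.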

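✨ Key Features

Effective periodicity modeling: strengthens language modeling through an effective periodicity mechanism.
New architecture: introduces the FAN layer, which captures periodic patterns in the training data to improve learning efficiency and performance.
Versatile use: suitable for general-purpose text generation and understanding, and can be fine-tuned for specific tasks.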

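📦 Installation

The original documentation does not provide installation steps. Follow the official installation guide for the transformers library.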
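
Since no install steps are given, a quick environment check may help before running the example. The sketch below assumes the example needs the transformers and torch packages (the quick-start code requests PyTorch tensors via return_tensors="pt"), which are commonly installed with `pip install transformers torch`. The helper name is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found here."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed requirements for the quick-start example.
missing = missing_packages(["transformers", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("transformers and torch are importable.")
```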

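📚 Documentation

Model Description

Property	Details
Model name	FANformer-1B
Non-embedding parameters	1.1 billion
Training tokens	1 trillion
Release date	March 2025
Model type	Decoder-only large language model with enhanced periodicity modeling
License	MIT License
Repository	GitHub
Paper	arXiv:2502.21309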

Training Details

Property	Details
Hardware	80 A100 40G GPUs
Training data	A subset of the Dolma dataset (the training corpus of OLMo-1B)
Maximum context length	2048 tokens
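
The 2048-token context limit interacts with the max_length argument in the quick-start example, because max_length in transformers' generate counts the prompt tokens as well as the newly generated ones. The following sketch shows the arithmetic; the helper name is hypothetical and the prompt lengths are made-up examples.

```python
def max_new_tokens(prompt_tokens, max_length=512, context_limit=2048):
    """Tokens that can still be generated under a max_length cap and a context limit."""
    budget = min(max_length, context_limit) - prompt_tokens
    return max(budget, 0)

print(max_new_tokens(150))        # 362: a 150-token prompt under max_length=512
print(max_new_tokens(150, 4096))  # 1898: the 2048-token context limit applies
print(max_new_tokens(600))        # 0: the prompt already exceeds max_length
```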

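Intended Use

Primary use: general-purpose text generation and understanding.
Downstream use: can be fine-tuned for tasks such as summarization, question answering, and dialogue.
Limitations: may inherit biases from the training data; performance on low-resource languages is not guaranteed.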

Evaluation

The figure in parentheses after each model name is its training-token count.

Benchmark	Llama-3.2-1B	TinyLLaMA-v1.1 (3T)	MobiLLaMA-1B (1.3T)	OLMo-1B (2T)	OpenELM-1_1B (1.8T)	OLMo-1B-0724 (3T)	AMD-OLMo-1B (1.3T)	FANformer-1B (1T)
arc_easy	56.84	55.47	56.65	57.28	55.43	56.65	63.64	72.456
arc_challenge	38.13	32.68	32.00	31.06	32.34	32.34	33.70	43.813
hellaswag	64.00	61.47	61.80	62.92	64.81	66.12	63.61	64.758
piqa	73.80	73.56	75.30	75.14	75.57	75.08	75.57	75.547
boolq	64.30	55.99	60.83	61.74	63.58	66.18	60.58	64.924
sciq	92.30	89.30	88.20	87.00	90.60	92.70	93.20	94.80
winogrande	61.20	59.43	59.27	59.98	61.72	61.72	61.64	61.80
openbookqa	46.00	36.80	35.40	36.20	36.20	35.60	35.80	48.20
gsm8k	6.83	1.82	0.00	2.50	2.81	8.95	2.88	15.74
Average	55.93	51.84	52.16	52.65	53.67	55.04	54.51	60.23
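
The Average row is the unweighted mean of the nine benchmark scores in each column. As a quick check, the sketch below recomputes it for two columns using the values from the table; the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the table above, in row order (arc_easy through gsm8k).
# Only two columns are included for brevity.
scores = {
    "Llama-3.2-1B": [56.84, 38.13, 64.00, 73.80, 64.30, 92.30, 61.20, 46.00, 6.83],
    "FANformer-1B": [72.456, 43.813, 64.758, 75.547, 64.924, 94.80, 61.80, 48.20, 15.74],
}

averages = {name: round(mean(vals), 2) for name, vals in scores.items()}
print(averages)  # {'Llama-3.2-1B': 55.93, 'FANformer-1B': 60.23}
```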

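🔧 Technical Details

FANformer-1B's revised architecture (olmo/model.py) introduces the FAN layer, a new component designed to capture periodic patterns in the training data and thereby improve learning efficiency and performance.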
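
To give a feel for how a FAN-style layer can expose periodic structure, here is a minimal pure-Python sketch. It assumes the general form described in the cited FAN paper, as we understand it: the output concatenates cos(W_p x), sin(W_p x), and an ordinary activated linear branch. It is not the code in olmo/model.py. The weights, the tanh activation, and all names are placeholders chosen for illustration, and how FANformer wires such a layer into its attention blocks is not covered here.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fan_layer(x, W_p, W_g, b_g, activation=math.tanh):
    """Concatenate cos/sin features of W_p x with an activated linear branch."""
    periodic = matvec(W_p, x)
    general = [activation(z + b) for z, b in zip(matvec(W_g, x), b_g)]
    return [math.cos(p) for p in periodic] + [math.sin(p) for p in periodic] + general

# Toy weights (hypothetical): 2 inputs -> 1 periodic unit + 2 general units = 4 outputs.
W_p = [[1.0, 0.0]]
W_g = [[0.5, 0.5], [0.0, 1.0]]
b_g = [0.0, 0.0]

# Shifting the first input by one full period (2*pi) leaves the cos/sin outputs
# unchanged, while the first general unit, which also reads that input, changes.
y0 = fan_layer([0.0, 0.3], W_p, W_g, b_g)
y1 = fan_layer([2 * math.pi, 0.3], W_p, W_g, b_g)
print(y0)
print(y1)
```

The cos and sin outputs repeat exactly when the projected input advances by 2π, which is the property that lets such a branch represent periodic patterns directly.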

📄 License

This project is released under the MIT License.

📖 Citation

@article{dong2025fanformer,
  title={FANformer: Improving Large Language Models Through Effective Periodicity Modeling},
  author={Dong, Yihong and Li, Ge and Jiang, Xue and Tao, Yongding and Zhang, Kechi and Zhu, Hao and Liu, Huanyu and Ding, Jiazheng and Li, Jia and Deng, Jinliang and Mei, Hong},
  journal={arXiv preprint arXiv:2502.21309},
  year={2025}
}

@article{dong2024fan,
  title={FAN: Fourier Analysis Networks},
  author={Yihong Dong and Ge Li and Yongding Tao and Xue Jiang and Kechi Zhang and Jia Li and Jing Su and Jun Zhang and Jingjing Xu},
  journal={arXiv preprint arXiv:2410.02675},
  year={2024}
}