Llama3-8B-1.58-100B-tokens: an open-source large language model

Llama3 8B 1.58 100B Tokens

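Developed by HF1BitLLM

A large language model fine-tuned to the BitNet 1.58b architecture, starting from the base model Llama-3-8B-Instruct and using extreme quantization

Large language model

Transformers

#1.58-bit quantization #Efficient fine-tuning #Education-domain optimization

Downloads: 2,427

Released: 9/10/2024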

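Model Overview

Llama3-8B-1.58 is a large language model quantized to 1.58 bits per weight. It was trained on 100 billion tokens and substantially reduces compute requirements while keeping performance close to that of the half-precision model.

Model Features

Extreme quantization

Uses a 1.58-bit quantized architecture, which substantially reduces model storage and compute requirements

Large-scale training

Extended training on 100 billion tokens brings performance close to that of the half-precision model

Efficient inference

Reduces resource consumption while maintaining good performance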
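
The 1.58-bit figure comes from restricting each weight to three values, -1, 0, and +1, since log2(3) ≈ 1.58. The sketch below illustrates the absmean ternary quantization described in the cited BitNet b1.58 paper. The function name and the toy weights are hypothetical, and the actual training code for this model may differ in detail.

```python
import math


def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of float weights to {-1, 0, +1} with one shared scale.

    Absmean recipe: divide by the mean absolute value, round to the
    nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


# Three possible states per weight carry log2(3) bits of information.
bits_per_weight = math.log2(3)

# Hypothetical toy weights, for illustration only.
weights = [0.42, -0.07, -0.95, 0.10, 0.60, -0.33]
ternary, scale = absmean_ternary_quantize(weights)
print(ternary, round(bits_per_weight, 2))
```

With these toy values the weights map to [1, 0, -1, 0, 1, -1], and the information content is about 1.58 bits per weight.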

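Model Capabilities

Text generation

Question answering

Logical reasoning

Use Cases

Education

Reasoning question answering

Solves multi-step reasoning problems, such as tracking where people move

Correctly answers reasoning questions that involve several changes of location

Research

Quantization research

Explores the performance limits of LLMs under extreme quantization

Performance close to that of the half-precision model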

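🚀 Llama3-8B-1.58 Model

Llama3-8B-1.58 is a large language model fine-tuned to the BitNet 1.58b architecture, with Llama-3-8B-Instruct as its base model. For details on the method and results, see our blog post.

🚀 Quick Start

You can load and test the model with the Transformers library by following the steps below.

First, install a transformers version with the configuration needed to load BitNet models:

pip install git+https://github.com/huggingface/transformers.git@refs/pull/33410/head

Then load the model: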

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 1.58-bit model; the tokenizer comes from the base Llama 3 model.
model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-100B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

input_text = "Daniel went back to the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:"

# Tokenize the prompt and move it to the GPU.
input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
# max_new_tokens limits only the generated continuation, independent of prompt length.
output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
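
Because the whole output sequence is decoded, generated_text contains the prompt followed by the continuation. A minimal sketch of separating the two is shown below. The helper name extract_continuation and the sample strings are hypothetical, and the decoded string here is made up for illustration rather than recorded from the model.

```python
def extract_continuation(prompt: str, generated_text: str) -> str:
    """Return only the text that follows the prompt.

    Falls back to the full text if the decoded output does not start with
    the prompt (for example, if tokenization altered the whitespace).
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


# Hypothetical decoded output, for illustration only.
prompt = "Mary went back to the garden. Where is Mary?\nAnswer:"
decoded = prompt + " garden"
answer = extract_continuation(prompt, decoded)
print(answer)
```

In the quick-start code this would be called as extract_continuation(input_text, generated_text).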

✨ Key Features

Model Details

Model Sources

Repository: Model
Paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Training Details

Training Data

The model was trained on a subset of FineWeb-edu.

Training Procedure

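Starting point: the best-performing checkpoint from the 10-billion-token runs that used a linear lambda scheduler.
Training duration: fine-tuned for an additional 45,000 steps, reaching 100 billion tokens in total.
Dataset: FineWeb-edu.
Batch size: 2 million tokens per step, so each run covers 45,000 steps * 2 million tokens = 90 billion tokens; adding the initial 10 billion tokens gives 100 billion.
Learning rate experiments: several learning rates were tested, and the best-performing peak learning rate was 1e-5.
Performance: close to Llama3 8B on some metrics, but slightly behind it on average.
Evaluation metrics: perplexity, MMLU score, and other standard benchmarks.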
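
Perplexity, one of the metrics listed above, is the exponential of the mean negative log-likelihood per token. The sketch below shows the standard definition. The function name and the log-probabilities are hypothetical and are not measurements from this model.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean(log p(token))), using natural-log probabilities."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities of 1/2, 1/4, and 1/8.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]
ppl = perplexity(log_probs)
print(ppl)
```

For these three tokens the result is approximately 4, the geometric mean of the inverse probabilities 2, 4, and 8. Lower perplexity means the model assigns higher probability to the reference text.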

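These extended training runs on 100 billion tokens push the limits of highly quantized models, bringing performance closer to half-precision models such as Llama3.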
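
The token budget behind the model name can be checked with simple arithmetic: 45,000 additional steps at 2 million tokens per step, on top of a starting checkpoint trained on 10 billion tokens. The constant names below are illustrative only.

```python
TOKENS_PER_STEP = 2_000_000        # batch size in tokens per step
EXTRA_STEPS = 45_000               # additional fine-tuning steps
INITIAL_TOKENS = 10_000_000_000    # tokens seen by the starting checkpoint

extra_tokens = EXTRA_STEPS * TOKENS_PER_STEP
total_tokens = INITIAL_TOKENS + extra_tokens

print(f"extra: {extra_tokens / 1e9:.0f}B, total: {total_tokens / 1e9:.0f}B")
```

This prints "extra: 90B, total: 100B", which matches the 100B-tokens suffix in the model name.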

Evaluation

The model was evaluated on the Nanotron checkpoints using LightEval.

📄 License

Citation

@misc{,
      title={1.58-Bit LLM: A New Era of Extreme Quantization}, 
      author={Mohamed Mekkouri and Marc Sun and Leandro von Werra and Thomas Wolf},
      year={2024},
}