BitNet b1.58 XL
BitNet b1.58 3B is a 1.58-bit (ternary) quantized large language model trained on 100 billion tokens from the RedPajama dataset; it sharply reduces memory and compute requirements while keeping performance close to full-precision baselines.
Downloads: 10.64k
Release Time: 3/29/2024
Model Overview
This model is an open reproduction of the BitNet b1.58 paper. It constrains every weight to one of three values (-1, 0, +1), roughly 1.58 bits per weight, to provide an efficient language model.
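Concretely, the paper quantizes each weight matrix with an "absmean" rule: divide by the mean absolute value of the matrix, then round and clip to the nearest of -1, 0, +1. A minimal PyTorch sketch of that rule (the function name and epsilon are illustrative):

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Ternary {-1, 0, +1} quantization via the absmean scheme from the
    BitNet b1.58 paper: scale by the mean absolute value, then round
    and clip to the integer range [-1, 1]."""
    scale = w.abs().mean().clamp(min=eps)
    return (w / scale).round().clamp(-1, 1)

# Example: a random weight matrix collapses to three values.
w = torch.randn(4, 4)
print(absmean_quantize(w))  # entries are only -1.0, 0.0, or 1.0
```

Because every weight lands on one of three values, matrix multiplications reduce to additions and subtractions, which is where the storage and compute savings come from.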
Model Features
1.58-bit quantization
Every weight is quantized to the ternary values -1, 0, and +1 (about 1.58 bits each), significantly reducing model storage and compute requirements.
Efficient training
Trains with a two-stage learning-rate schedule and staged weight decay; a sketch follows this list.
Performance close to full-precision models
At the 3B parameter scale, perplexity and end-task performance approach those of an FP16 full-precision baseline.
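The two-stage recipe described in the paper's training tips uses a higher learning rate with weight decay first, then a lower learning rate with weight decay disabled. Below is a hypothetical sketch of such a schedule; the stage boundary, peak rates, and decay value are placeholder numbers, not the released configuration:

```python
import math

def two_stage_schedule(step: int, total_steps: int,
                       peak_lr: float = 1.5e-3,
                       stage2_peak: float = 1e-4):
    """Illustrative two-stage recipe: cosine-decay the learning rate
    within each stage, and drop weight decay to zero in stage 2."""
    boundary = total_steps // 2          # assumed halfway split
    if step < boundary:                  # stage 1: higher LR, weight decay on
        progress = step / boundary
        lr = stage2_peak + 0.5 * (peak_lr - stage2_peak) * (1 + math.cos(math.pi * progress))
        weight_decay = 0.1
    else:                                # stage 2: lower LR, weight decay off
        progress = (step - boundary) / (total_steps - boundary)
        lr = 0.5 * stage2_peak * (1 + math.cos(math.pi * progress))
        weight_decay = 0.0
    return lr, weight_decay

print(two_stage_schedule(0, 1000))    # (0.0015, 0.1): stage 1 start
print(two_stage_schedule(600, 1000))  # lower LR, weight decay off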
Model Capabilities
Text generation
Language understanding
Zero-shot learning
Use Cases
Natural Language Processing
Question answering systems
Can be used to build efficient question-answering systems
Performs well on zero-shot benchmarks such as ARC (a scoring sketch follows this section)
Text generation
Suitable for a wide range of text generation tasks
Perplexity (PPL) stays close to that of full-precision models (a measurement sketch follows this section)
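A common way to run zero-shot multiple-choice benchmarks like ARC is to score each candidate answer by the total log-probability the model assigns to its tokens. A sketch under two assumptions: the repo id (1bitLLM/bitnet_b1_58-xl is one public reproduction checkpoint) and that the checkpoint loads through the standard transformers Auto classes:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "1bitLLM/bitnet_b1_58-xl"  # assumed checkpoint; swap in the one you use
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def choice_logprob(question: str, answer: str) -> float:
    # Sum of log-probabilities of the answer tokens given the question.
    q_ids = tok(question, return_tensors="pt").input_ids
    a_ids = tok(" " + answer, return_tensors="pt", add_special_tokens=False).input_ids
    ids = torch.cat([q_ids, a_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = logits[:, :-1].log_softmax(-1)          # predictions for tokens 1..T-1
    token_lp = logprobs.gather(2, ids[:, 1:, None]).squeeze(-1)
    return token_lp[0, -a_ids.shape[1]:].sum().item()  # keep only answer positions

question = "Which gas do plants absorb during photosynthesis?"
choices = ["carbon dioxide", "oxygen", "nitrogen"]
print(max(choices, key=lambda c: choice_logprob(question, c)))
```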
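Perplexity itself is just the exponential of the mean token-level cross-entropy, so it can be checked on any text in a few lines (same assumed repo id as above):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "1bitLLM/bitnet_b1_58-xl"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "BitNet b1.58 constrains every weight to -1, 0, or +1."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
print(f"perplexity: {math.exp(loss.item()):.2f}")
```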