bitnet_b1_58-large

BitNet b1.58 Large

Developed by 1bitLLM

BitNet b1.58 is a 1.58-bit large language model. This "large" variant has roughly 700 million parameters and was trained on the RedPajama dataset for 100 billion tokens.

Large Language Model

Transformers

License: MIT (open source). Tags: 1.58-bit quantization, efficient inference, language model

Downloads 10.17k

Release time: 3/29/2024

Model Overview

This model is a 1.58-bit quantized large language model with ternary weights. It is designed for efficient inference while keeping accuracy comparable to floating-point models of the same size.

Model Features

1.58-bit quantization

Each weight takes one of three values, -1, 0 or +1 (log2(3) ≈ 1.58 bits), and activations are quantized to 8 bits. This sharply reduces memory use and compute compared with 16-bit floating-point models.
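As a rough illustration of the scheme described in the BitNet b1.58 paper, the sketch below ternarizes weights with an absmean scale, RoundClip(W / (γ + ε), -1, 1) with γ = mean |W|, and quantizes one token's activations to 8 bits with an absmax scale. It uses plain Python lists instead of tensors. The function names, the ε value and the 127 scaling constant are illustrative assumptions, not code from the repo.

```python
import math

def quantize_weights_absmean(weights, eps=1e-5):
    """Ternarize a flat list of weights to {-1, 0, +1} with one per-tensor scale.

    Computes RoundClip(w / (gamma + eps), -1, 1) where gamma = mean(|w|).
    Python's round() sends exact .5 ties to the nearest even integer.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    scale = gamma + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def quantize_activations_absmax(x, bits=8, eps=1e-5):
    """Quantize one token's activation vector to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits (assumed scaling constant)
    scale = qmax / max(max(abs(v) for v in x), eps)
    # Clip to the signed integer range [-128, 127] for 8 bits.
    q = [max(-qmax - 1, min(qmax, round(v * scale))) for v in x]
    return q, scale

# A ternary value carries log2(3) ≈ 1.58 bits of information, hence "b1.58".
bits_per_weight = math.log2(3)

w = [0.42, -0.07, 0.9, -1.3, 0.01, -0.5]
q_w, w_scale = quantize_weights_absmean(w)
print(q_w)  # [1, 0, 1, -1, 0, -1]

x = [0.5, -2.0, 1.0]
q_x, x_scale = quantize_activations_absmax(x)
print(q_x)  # [32, -127, 64]
```

Dequantizing is just multiplying back by the stored scale, so only one floating-point value per weight tensor (and one per token for activations) has to be kept alongside the integers.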

Efficient inference

Because the weights are ternary, matrix multiplications reduce largely to additions and subtractions. This is the main source of the latency and memory savings the paper reports over FP16 models.
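The efficiency argument rests on the fact that a dot product with ternary weights needs no multiplications inside the inner loop. The hypothetical `ternary_matvec` below shows the idea in pure Python. Real kernels pack the weights and vectorize, and this sketch does not attempt either.

```python
def ternary_matvec(q_rows, x, scale):
    """Compute y = scale * (Q @ x) for a ternary matrix Q.

    The inner loop uses only additions and subtractions; the single
    multiplication per output row applies the weight scale.
    """
    out = []
    for row in q_rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing and is skipped entirely
        out.append(acc * scale)
    return out

Q = [[1, 0, -1], [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(Q, x, scale=0.25)
print(y)  # [0.5, 0.0]
```

Zero weights are skipped outright, so sparsity in the ternary matrix also saves work.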

Performance retention

Perplexity and zero-shot accuracy stay close to those of FP16 models of the same size (see the Results table below).

Two-phase training

Trained using the two-phase learning rate and weight decay schedule suggested in the BitNet b1.58 paper.
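The card does not list the schedule's actual values. The sketch below only shows the general shape of a two-stage schedule under stated assumptions: the switch happens at the halfway point, the learning rate decays linearly within each stage and drops at the switch, and weight decay is a constant 0.1 in stage one and 0 in stage two. Every number here, and the function name, is a hypothetical placeholder.

```python
def two_stage_schedule(step, total_steps,
                       lr1_start=1.5e-3, lr1_end=1.0e-3,
                       lr2_start=5e-4, wd1=0.1):
    """Return (learning_rate, weight_decay) for one optimizer step.

    Hypothetical two-stage schedule: stage 1 is the first half of training,
    stage 2 the second half. All default values are placeholders.
    """
    half = total_steps // 2
    if step < half:
        frac = step / half
        lr = lr1_start + (lr1_end - lr1_start) * frac  # linear decay in stage 1
        wd = wd1                                       # weight decay on
    else:
        frac = (step - half) / max(total_steps - half, 1)
        lr = lr2_start * (1.0 - frac)  # drop at the switch, then decay toward 0
        wd = 0.0                       # weight decay switched off in stage 2
    return lr, wd

total = 1000
for s in (0, 499, 500, 999):
    print(s, two_stage_schedule(s, total))
```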

Model Capabilities

Text generation

Language understanding

Zero-shot learning

Use Cases

Natural Language Processing

Question answering systems

Can be used to build efficient question answering systems

Zero-shot ARC-Easy and ARC-Challenge accuracy is close to that of FP16 models of the same size

Text generation

Can be used for various text generation tasks

Perplexity is close to that of full-precision models
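Perplexity (PPL, lower is better) is the exponential of the mean negative log-likelihood per token. Here is a minimal sketch, assuming you already have per-token natural-log probabilities from the model. The `perplexity` helper is illustrative and is not the code in eval_ppl.py.

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum_i log p(token_i | preceding tokens))."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that gives every token probability 1/4 has perplexity 4.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```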

🚀 BitNet b1.58 Paper Reproduction

This project reproduces the BitNet b1.58 paper. The models are trained on the RedPajama dataset for 100B tokens. The hyperparameters, along with the two-stage learning rate and weight decay schedule, follow the suggestions in the paper. All models are open-source and available in the Hugging Face repo. We plan to train larger models or use more tokens when resources permit.

🚀 Quick Start

Installation

pip install lm-eval==0.3.0

Evaluation

python eval_ppl.py --hf_path 1bitLLM/bitnet_b1_58-3B --seqlen 2048

python eval_task.py --hf_path 1bitLLM/bitnet_b1_58-3B \
    --batch_size 1 \
    --tasks \
    --output_path result.json \
    --num_fewshot 0 \
    --ctx_size 2048

✨ Features

Reproduce the BitNet b1.58 paper with specific training settings.
Use the RedPajama dataset for model training.
Implement hyperparameters as suggested in the referenced paper.
Provide open-source models in the Hugging Face repo.

📚 Documentation

Results

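Perplexity (PPL, lower is better) and zero-shot accuracy in % (higher is better). Columns: ARCe = ARC-Easy, ARCc = ARC-Challenge, HS = HellaSwag, BQ = BoolQ, OQ = OpenBookQA, PQ = PIQA, WGe = WinoGrande, Avg = mean of the seven accuracy columns.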

Models	PPL	ARCe	ARCc	HS	BQ	OQ	PQ	WGe	Avg
FP16 700M (reported)	12.33	54.7	23.0	37.0	60.0	20.2	68.9	54.8	45.5
BitNet b1.58 700M (reported)	12.87	51.8	21.4	35.1	58.2	20.0	68.1	55.2	44.3
BitNet b1.58 700M (reproduced)	12.78	51.4	21.8	35.0	59.6	20.6	67.5	55.4	44.5
FP16 1.3B (reported)	11.25	56.9	23.5	38.5	59.1	21.6	70.0	53.9	46.2
BitNet b1.58 1.3B (reported)	11.29	54.9	24.2	37.7	56.7	19.6	68.8	55.8	45.4
BitNet b1.58 1.3B (reproduced)	11.19	55.8	23.7	37.6	59.0	20.2	69.2	56.0	45.9
FP16 3B (reported)	10.04	62.1	25.6	43.3	61.8	24.6	72.1	58.2	49.7
BitNet b1.58 3B (reported)	9.91	61.4	28.3	42.9	61.5	26.6	71.5	59.3	50.2
BitNet b1.58 3B (reproduced)	9.88	60.9	28.0	42.3	58.3	26.0	71.4	60.3	49.6

The differences between the reported and reproduced numbers may come from differences in training data processing, random seeds, or other sources of randomness.
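The Avg column appears to be the unweighted mean of the seven accuracy columns, with PPL excluded. The check below recomputes it from the table values. Each recomputed mean agrees with the printed Avg to within rounding (0.05).

```python
# (model, [ARCe, ARCc, HS, BQ, OQ, PQ, WGe], Avg as printed in the table)
table = [
    ("FP16 700M (reported)",           [54.7, 23.0, 37.0, 60.0, 20.2, 68.9, 54.8], 45.5),
    ("BitNet b1.58 700M (reported)",   [51.8, 21.4, 35.1, 58.2, 20.0, 68.1, 55.2], 44.3),
    ("BitNet b1.58 700M (reproduced)", [51.4, 21.8, 35.0, 59.6, 20.6, 67.5, 55.4], 44.5),
    ("FP16 1.3B (reported)",           [56.9, 23.5, 38.5, 59.1, 21.6, 70.0, 53.9], 46.2),
    ("BitNet b1.58 1.3B (reported)",   [54.9, 24.2, 37.7, 56.7, 19.6, 68.8, 55.8], 45.4),
    ("BitNet b1.58 1.3B (reproduced)", [55.8, 23.7, 37.6, 59.0, 20.2, 69.2, 56.0], 45.9),
    ("FP16 3B (reported)",             [62.1, 25.6, 43.3, 61.8, 24.6, 72.1, 58.2], 49.7),
    ("BitNet b1.58 3B (reported)",     [61.4, 28.3, 42.9, 61.5, 26.6, 71.5, 59.3], 50.2),
    ("BitNet b1.58 3B (reproduced)",   [60.9, 28.0, 42.3, 58.3, 26.0, 71.4, 60.3], 49.6),
]

computed = {}
for name, accs, avg_in_table in table:
    avg = sum(accs) / len(accs)  # unweighted mean of the accuracy columns
    computed[name] = avg
    print(f"{name:32s} recomputed {avg:.2f}  table {avg_in_table}")
```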

Evaluation

The evaluation pipelines were provided by the paper's authors. The commands to run them are listed in Quick Start above.
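For the multiple-choice tasks in the table, lm-eval-style harnesses typically score each answer choice by the log-likelihood the model assigns to it after the prompt, then pick the highest. A length-normalized variant divides that score by the choice's length. The sketch below illustrates only that selection step, using hypothetical log-likelihoods. It does not call lm-eval, and this card does not state which variant the reported numbers use.

```python
def pick_answer(choice_logliks, choices, length_normalize=False):
    """Return the index of the answer choice with the highest score.

    choice_logliks[i] is the summed log-probability of choices[i] given the
    prompt. With length_normalize=True the score is divided by the number of
    characters in the choice.
    """
    scores = []
    for ll, text in zip(choice_logliks, choices):
        scores.append(ll / len(text) if length_normalize else ll)
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(predictions, gold):
    """Fraction of questions where the predicted index matches the gold index."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

choices = ["the sun", "a lamp powered by the wind"]
logliks = [-6.0, -9.0]  # hypothetical model scores
print(pick_answer(logliks, choices))                         # 0
print(pick_answer(logliks, choices, length_normalize=True))  # 1
```

With raw scores the shorter choice wins, and after length normalization the longer one does. This is one reason lm-eval reports both acc and acc_norm for several of these tasks.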

📄 License

This project is licensed under the MIT license.
