DeciLM-7B: Open-Source Text Generation Model for Efficient Content Creation

DeciLM-7B

Developed by Deci

DeciLM-7B is a decoder-only text generation model with 7.04 billion parameters, released under the Apache 2.0 license. At the time of its release, it was the best-performing 7B base language model on the Open LLM Leaderboard.

Large Language Model

Transformers

Language: English
Open Source License: Apache-2.0
Tags: #Efficient Text Generation #GQA Attention Mechanism #8K Long Sequence Support

Downloads 2,833

Release Time : 12/10/2023

Model Overview

DeciLM-7B is an efficient pre-trained text generation model that supports sequences up to 8K tokens in length. It employs a variable grouped-query attention (GQA) mechanism, achieving an excellent balance between accuracy and computational efficiency.

Model Features

High Efficiency

Throughput is up to 4.4 times that of Mistral-7B, the highest among models in its class at the time of release.

Long Sequence Support

Supports sequences up to 8K tokens, ideal for processing long texts.

Variable Grouped-Query Attention

Utilizes a variable grouped-query attention (GQA) mechanism to optimize computational efficiency.

AutoNAC Technology

Model architecture is generated by Deci's proprietary neural architecture search technology, AutoNAC.

Model Capabilities

Text Generation

Language Understanding

Long Text Processing

Use Cases

Commercial Applications

Content Generation

Generate marketing copy, product descriptions, and other business content.

Customer Support

Used in automated customer support systems to generate responses and recommendations.

Research Applications

Language Model Research

Used for performance research and optimization of language models.

Text Analysis

Used for text analysis and comprehension tasks.

🚀 DeciLM-7B

DeciLM-7B is a text generation model with 7.04 billion parameters, designed as a decoder-only architecture. It is released under the Apache 2.0 license. At the time of its release, DeciLM-7B stands as the top-performing 7B base language model on the Open LLM Leaderboard. Supporting an 8K-token sequence length, this highly efficient model utilizes variable Grouped-Query Attention (GQA) to strike an excellent balance between accuracy and computational efficiency. The model's architecture is generated by Deci's proprietary Neural Architecture Search technology, AutoNAC.

✨ Features

High Performance: At the time of release, it's the top-performing 7B base language model on the Open LLM Leaderboard.
Efficient Architecture: Uses variable Grouped-Query Attention (GQA) and supports an 8K-token sequence length.
Optimized Design: The architecture is generated by Deci's AutoNAC technology.

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Deci/DeciLM-7B"
device = "cuda" # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", trust_remote_code=True).to(device)

inputs = tokenizer.encode("In a shocking finding, scientists discovered a herd of unicorns living in", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0]))

# The model can also be used via the text-generation pipeline interface
from transformers import pipeline
generator = pipeline("text-generation", "Deci/DeciLM-7B", torch_dtype="auto", trust_remote_code=True, device=device)
outputs = generator("In a shocking finding, scientists discovered a herd of unicorns living in", max_new_tokens=100, do_sample=True, top_p=0.95)
print(outputs[0]["generated_text"])
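Because the model supports sequences of up to 8192 tokens, the prompt and the generated continuation have to share that budget. The helper below is a minimal sketch of such a check; the function name `max_new_tokens_allowed` is hypothetical and not part of the `transformers` API.

```python
CONTEXT_LENGTH = 8192  # sequence length stated in the model card


def max_new_tokens_allowed(prompt_tokens, requested_new_tokens,
                           context_length=CONTEXT_LENGTH):
    """Clamp the requested generation length so prompt + output fit the context.

    Hypothetical helper for illustration; it only does arithmetic on token counts.
    """
    if prompt_tokens >= context_length:
        raise ValueError(
            f"Prompt uses {prompt_tokens} tokens, which leaves no room "
            f"in a {context_length}-token context."
        )
    remaining = context_length - prompt_tokens
    return min(requested_new_tokens, remaining)


# A short prompt leaves the request untouched; a long one is clamped.
print(max_new_tokens_allowed(100, 100))
print(max_new_tokens_allowed(8000, 500))
```

With the basic usage example, the prompt length would be `inputs.shape[1]`, and the returned value could be passed as `max_new_tokens` to `model.generate`.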

📚 Documentation

Model Details

Model Description

Deci developed and released the DeciLM-7B language model, a pre-trained, high-efficiency text generation model with 7 billion parameters. At the time of its release, DeciLM-7B was not only the most accurate 7B base model, but it also outpaced all models in its class, with throughput up to 4.4x that of Mistral-7B. An instruct version, DeciLM-7B-instruct, has also been released.

Property	Details
Developed by	Deci
Model Type	DeciLM is an auto-regressive language model using an optimized transformer decoder architecture that includes variable Grouped-Query Attention.
Language(s) (NLP)	English
License	Apache 2.0

Model Architecture

Parameters	Layers	Heads	Sequence Length	GQA num_key_value_heads*
7.04 billion	32	32	8192	Variable

*AutoNAC was employed to optimize the selection of the GQA num_key_value_heads for each layer.
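To illustrate why the number of key/value heads per layer matters, the sketch below estimates KV-cache memory for a model with 32 layers at a sequence length of 8192. The per-layer `num_key_value_heads` values, the head dimension of 128, and the 2-byte (fp16) element size are illustrative assumptions, not DeciLM-7B's published configuration.

```python
def kv_cache_bytes(kv_heads_per_layer, seq_len, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    """Bytes needed to cache keys and values across all layers."""
    total = 0
    for kv_heads in kv_heads_per_layer:
        # Factor 2: one cached tensor for keys and one for values.
        total += 2 * batch_size * seq_len * kv_heads * head_dim * bytes_per_elem
    return total


NUM_LAYERS = 32
SEQ_LEN = 8192

# Full multi-head attention: one KV head per query head (32 per layer).
mha = [32] * NUM_LAYERS
# Uniform GQA: the same reduced KV head count in every layer (assumed value).
uniform_gqa = [8] * NUM_LAYERS
# Variable GQA: a different KV head count per layer (hypothetical pattern).
variable_gqa = [4] * 16 + [2] * 8 + [1] * 8

mha_bytes = kv_cache_bytes(mha, SEQ_LEN)
uniform_bytes = kv_cache_bytes(uniform_gqa, SEQ_LEN)
variable_bytes = kv_cache_bytes(variable_gqa, SEQ_LEN)

MIB = 1024 ** 2
for name, size in [("MHA", mha_bytes),
                   ("uniform GQA", uniform_bytes),
                   ("variable GQA", variable_bytes)]:
    print(f"{name}: {size / MIB:.0f} MiB per sequence")
```

Under these assumptions, fewer KV heads shrink the cache proportionally, which is the efficiency lever a per-layer selection can tune against accuracy.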

Model Sources

Blog: DeciLM-7B Technical Blog
Demo: DeciLM-7B-instruct Demo
Finetuning Notebook: DeciLM-7B Finetuning Notebook
Text Generation Notebook: DeciLM-7B-instruct Text Generation Notebook

Uses

The model is intended for commercial and research use in English and can be fine-tuned for various tasks and languages.

Evaluation

Evaluation Results

Below are DeciLM-7B and DeciLM-7B-instruct's Open LLM Leaderboard results.

Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
DeciLM-7B	61.55	59.39	82.51	59.76	40.33	79.95	47.38
DeciLM-7B-instruct	63.19	61.01	82.37	60.24	49.75	79.72	46.02
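The Average column appears to be the unweighted mean of the six benchmark scores. The short check below recomputes it from the table values; the assumption that the average is a plain arithmetic mean is mine, not stated in the source.

```python
# Scores copied from the table, in the order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
results = {
    "DeciLM-7B": {
        "reported_average": 61.55,
        "scores": [59.39, 82.51, 59.76, 40.33, 79.95, 47.38],
    },
    "DeciLM-7B-instruct": {
        "reported_average": 63.19,
        "scores": [61.01, 82.37, 60.24, 49.75, 79.72, 46.02],
    },
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for model, row in results.items():
    computed = mean(row["scores"])
    # Allow for rounding to two decimals in the published table.
    matches = abs(computed - row["reported_average"]) < 0.01
    print(f"{model}: computed {computed:.3f}, "
          f"reported {row['reported_average']}, consistent: {matches}")
```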

Runtime Benchmarks

Inference Tool	Hardware	Prompt length	Generation length	Generated tokens/sec	Batch Size	Number of Prompts
HuggingFace (PyTorch)	A100 (SXM4-80GB-400W)	512	512	1174	352	352
HuggingFace (PyTorch)	A100 (SXM4-80GB-400W)	2048	2048	328	72	72
Infery-LLM	A100 (SXM4-80GB-400W)	512	512	4559	1024	4096
Infery-LLM	A100 (SXM4-80GB-400W)	2048	2048	3997	512	2048
Infery-LLM	A10	512	512	1345	128	512
Infery-LLM	A10	2048	2048	599	32	128

To replicate the results of the Hugging Face benchmarks, you can use this code example.
Infery-LLM, Deci's inference engine, features a suite of optimization algorithms, including selective quantization, optimized beam search, continuous batching, and custom CUDA kernels. To explore the capabilities of Infery-LLM, schedule a live demo.
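As a rough reading of the A100 rows in the table, the snippet below computes the ratio of Infery-LLM throughput to Hugging Face (PyTorch) throughput at each prompt/generation length. Note that the two tools were run at different batch sizes, so these ratios are indicative rather than a like-for-like comparison.

```python
# Generated tokens/sec on A100 (SXM4-80GB-400W), copied from the table.
# Keys are (inference tool, prompt length = generation length).
a100_tokens_per_sec = {
    ("HuggingFace (PyTorch)", 512): 1174,
    ("HuggingFace (PyTorch)", 2048): 328,
    ("Infery-LLM", 512): 4559,
    ("Infery-LLM", 2048): 3997,
}


def infery_speedup(length):
    """Ratio of Infery-LLM to Hugging Face throughput at a given length."""
    infery = a100_tokens_per_sec[("Infery-LLM", length)]
    baseline = a100_tokens_per_sec[("HuggingFace (PyTorch)", length)]
    return infery / baseline


speedup_512 = infery_speedup(512)
speedup_2048 = infery_speedup(2048)

print(f"512/512:   {speedup_512:.2f}x")
print(f"2048/2048: {speedup_2048:.2f}x")
```

The gap widens at the longer sequence length, where the Hugging Face run drops to 328 tokens/sec while Infery-LLM stays near 4000.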

Ethical Considerations and Limitations

DeciLM-7B is a new technology that comes with inherent risks associated with its use. The testing conducted so far has been primarily in English and does not cover all possible scenarios. Similar to other large language models, DeciLM-7B's outputs are unpredictable, and the model may generate inaccurate, biased, or otherwise objectionable responses. Therefore, developers planning to use DeciLM-7B should conduct thorough safety testing and tuning specifically designed for their intended applications of the model before deployment.

How to Cite

Please cite this model using the following format.

@misc{DeciFoundationModels,
title = {DeciLM-7B},
author = {DeciAI Research Team},
year = {2023},
url = {https://huggingface.co/Deci/DeciLM-7B},
}

📄 License

This model is released under the Apache 2.0 license.
