🚀 aiXcoder-7B Code Large Language Model
aiXcoder-7B is a code large language model capable of understanding and generating code in multiple programming languages, offering high-performance solutions for code-related tasks.
🚀 Quick Start
✨ Features
As the capabilities of large code models are gradually being uncovered, aiXcoder has consistently considered how to make these models more useful in real development scenarios. The open-sourced aiXcoder 7B Base has undergone extensive training on 1.2T unique tokens, with its pre-training tasks and contextual information specifically designed for real-world code generation scenarios.
- Code Completion Excellence: Among all models of similar parameter sizes, aiXcoder 7B Base stands out as the most effective model in code completion scenarios.
- Multilingual nl2code Benchmark Performance: It surpasses mainstream models such as CodeLlama 34B and StarCoder2 15B in average performance on the multilingual nl2code benchmark.
- Foundational Focus: The current version is a foundational model that focuses on improving the efficiency and accuracy of code completion and code generation tasks.
📦 Installation
Environment Requirements
Option 1: Build Env
To run the model inference code, you'll need the following environment setup:
- Python 3.8 or higher
- PyTorch 2.1.0 or higher
- sentencepiece 0.2.0 or higher
- transformers 4.34.1 or higher (if running inference with the transformers library)
Please ensure all dependencies are installed using the following commands:
conda create -n aixcoder-7b python=3.11
conda activate aixcoder-7b
git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b
pip install -r requirements.txt
requirements.txt lists all the necessary libraries and their versions.
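If you would like to confirm that the installed packages meet these minimum versions, a quick check like the following can help (a minimal sketch; it only assumes the packages listed above are importable):
# quick sanity check of the environment (sketch)
import sys
import torch
import sentencepiece
import transformers

print("Python:", sys.version.split()[0])            # expect 3.8 or higher
print("PyTorch:", torch.__version__)                # expect 2.1.0 or higher
print("sentencepiece:", sentencepiece.__version__)  # expect 0.2.0 or higher
print("transformers:", transformers.__version__)    # expect 4.34.1 or higher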
To achieve faster inference speeds, especially for large models, we recommend installing flash attention. Flash attention is an optimized attention mechanism that significantly reduces computation time for transformer-based models without sacrificing accuracy. Before proceeding, ensure your environment meets the CUDA requirements, as flash attention leverages GPU acceleration. Follow these steps to install flash attention:
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install
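Once the build finishes, you can check that the package imports cleanly (a minimal sketch, assuming the default flash_attn package name installed by the steps above):
# verify the flash attention build (sketch)
import flash_attn

print("flash-attn version:", flash_attn.__version__)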
Option 2: Docker
For a consistent and isolated environment, we recommend running the model inference code using Docker. Here's how to set up and use Docker for our model:
- Install Docker: If you haven't already, install Docker on your machine.
- Pull the Docker Image: Pull the Docker image from Docker Hub.
docker pull pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
- Run the Container: Once the image is pulled, you can run the model inside a Docker container.
docker run --gpus all -it -v /dev/shm:/dev/shm --name aix_instance pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel /bin/bash
pip install sentencepiece
git clone git@github.com:aixcoder-plugin/aiXcoder-7b.git
cd aiXcoder-7b
This command starts a container named aix_instance from the pytorch/pytorch image. You can interact with the model inside this container.
To achieve faster inference speeds, especially for large models, we recommend installing flash attention inside the container as well:
git clone git@github.com:Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install
- Model Inference: Within the Docker container, you can run the model inference code as described in the Inference Example section.
Using Docker provides a clean, controlled environment that minimizes issues related to software versions and dependencies.
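Before running inference inside the container, it is worth confirming that PyTorch can actually see the GPU passed through by --gpus all (a quick sanity check):
# run inside the container to confirm GPU access (sketch)
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))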
Model Weights
You can download the model weights from the following links:
- aiXcoder Base Download
- aiXcoder Instruct Download (Coming soon...)
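If you prefer to fetch the base weights programmatically, the snippet below is a hedged sketch using the huggingface_hub library and the aiXcoder/aixcoder-7b-base repository id that appears in the transformers example further down (install huggingface_hub first if it is not already present):
# download the base model weights from the Hugging Face Hub (sketch)
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="aiXcoder/aixcoder-7b-base",      # repo id taken from the transformers example below
    local_dir="./aixcoder-7b-base-weights",   # hypothetical local target directory
)
print("Weights downloaded to:", local_dir)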
💻 Usage Examples
Basic Usage
Command Line Execution
For a quick start, you can run the model inference directly from the command line:
torchrun --nproc_per_node 1 sess_megatron.py --model_dir "path/to/model_weights_dir"
Replace "path/to/model_weights_dir" with the actual path to your downloaded model weights.
Or run inference with Hugging Face's transformers library:
python sess_huggingface.py
Python Script Execution
Alternatively, you can invoke the model programmatically within your Python scripts. This method provides more flexibility for integrating the model into your applications or workflows. Here's a simple example of how to do it:
from sess_megatron import TestInference

infer = TestInference()
res = infer.run_infer(
    # for FIM style input, code_string stands for prefix context
    code_string="""# 快速排序算法""",  # prompt comment meaning "# quick sort algorithm"
    # for FIM style input, later_code stands for suffix context
    later_code="\n",
    # file_path should be a path from project to file
    file_path="test.py",
    # max num for generated tokens
    max_new_tokens=256,
)
print(res)

"""output:

def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    less = [i for i in arr[1:] if i <= pivot]
    greater = [i for i in arr[1:] if i > pivot]
    return quick_sort(less) + [pivot] + quick_sort(greater)


# 测试
arr = [3, 2, 1, 4, 5]
print(quick_sort(arr))  # [1, 2, 3, 4, 5]
"""
import torch
import sys
from hf_mini.utils import input_wrapper
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

tokenizer = AutoTokenizer.from_pretrained("aiXcoder/aixcoder-7b-base")
model = AutoModelForCausalLM.from_pretrained("aiXcoder/aixcoder-7b-base", torch_dtype=torch.bfloat16)

text = input_wrapper(
    # for FIM style input, code_string stands for prefix context
    code_string="# 快速排序算法",  # prompt comment meaning "# quick sort algorithm"
    # for FIM style input, later_code stands for suffix context
    later_code="\n# 测试\narr = [3, 2, 1, 4, 5]\nprint(quick_sort(arr)) # [1, 2, 3, 4, 5]",  # suffix begins with a "# test" block
    # file_path should be a path from project to file
    path="test.py"
)

if len(text) == 0:
    sys.exit()

inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False)
inputs = inputs.to(device)
model.to(device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
"""output:
def quick_sort(arr):
# 如果数组长度小于等于1,直接返回
if len(arr) <= 1:
return arr
# 选择数组的第一个元素作为基准
pivot = arr[0]
# 初始化左右指针
left, right = 1, len(arr) - 1
# 循环直到左指针小于右指针
while left < right:
# 从右到左找到第一个小于基准的元素,与左指针元素交换
if arr[right] < pivot:
arr[left], arr[right] = arr[right], arr[left]
left += 1
# 从左到右找到第一个大于等于基准的元素,与右指针元素交换
if arr[left] >= pivot:
right -= 1
# 将基准元素与左指针元素交换
arr[left], arr[0] = arr[0], arr[left]
# 对左半部分进行递归排序
quick_sort(arr[:left])
# 对右半部分进行递归排序
quick_sort(arr[left + 1:])
return arr</s>
"""
📄 License
The model weights are licensed under the Model License for academic research use; for commercial use, please apply by sending an email to support@aiXcoder.com.
🔗 Acknowledgments
We would like to thank all contributors to the open-source projects and datasets that made this work possible.
Thank you for your interest in our Code Large Language Model. We look forward to your contributions and feedback!

