🚀 granite-8b-qiskit
granite-8b-qiskit is an 8B parameter model. It is extended, pre-trained, and fine-tuned on top of granite-8b-code-base using Qiskit code and instruction data. This enhances its ability to write high-quality and non-deprecated Qiskit code. The model only uses data with permissive licenses such as Apache 2.0, MIT, the Unlicense, Mulan PSL Version 2, BSD-2-Clause, BSD-3-Clause, and Creative Commons Attribution 4.0.

🚀 Quick Start
This model is designed for generating quantum computing code using Qiskit. It can serve as an assistant for both quantum computing practitioners and new Qiskit users, helping them build Qiskit code or respond to Qiskit coding-related instructions and questions.
💻 Usage Examples
Basic Usage
This is a simple example of how to use the granite-8b-qiskit model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "qiskit/granite-8b-qiskit"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# Build the prompt from a chat conversation
chat = [
    {"role": "user", "content": "Build a random circuit with 5 qubits"},
]
input_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize and move the input tensors to the model's device
input_tokens = tokenizer(input_text, return_tensors="pt")
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)

# Generate and decode
output = model.generate(**input_tokens, max_new_tokens=128)
output = tokenizer.batch_decode(output)
for i in output:
    print(i)
```
📚 Documentation
Model Information
| Property | Details |
|---|---|
| pipeline_tag | text-generation |
| inference | false |
| license | apache-2.0 |
| datasets | public-qiskit, synthetic-qiskit |
| metrics | code_eval |
| library_name | transformers |
| tags | code, granite, qiskit |
Model Index
- Name: granite-8b-qiskit
- Results:
  - Task 1:
    - Task Type: text-generation
    - Dataset:
      - Type: qiskit-humaneval
      - Name: Qiskit HumanEval
    - Metrics:
      - Name: pass@1
      - Type: pass@1
      - Value: 45.69
      - Verified: false
  - Task 2:
    - Task Type: text-generation
    - Dataset:
      - Type: bigcode/humanevalpack
      - Name: HumanEvalSynthesis(Python)
    - Metrics:
      - Name: pass@1
      - Type: pass@1
      - Value: 58.53
      - Verified: false
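Both results above report pass@1, the standard functional-correctness metric for code generation. The document does not describe the evaluation harness; as background only, pass@k is commonly computed with the unbiased estimator of Chen et al. (2021), sketched here:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generated solutions (c of which pass the tests) is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per task, pass@1 is just the fraction of correct samples
print(pass_at_k(100, 46, 1))  # 0.46
```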
Training Data
- Data Collection and Filtering: Our code data comes from a combination of publicly available datasets (e.g., public code on GitHub) and additional synthetic data generated at IBM Quantum. We exclude code older than 2023.
- Exact and Fuzzy Deduplication: We use both exact and fuzzy deduplication to remove documents with (near) identical code content.
- HAP, PII, Malware Filtering: We rely on the base model ibm-granite/granite-8b-code-base for HAP and malware filtering of the initial datasets. We also redact Personally Identifiable Information (PII) in our datasets by replacing PII content (e.g., names, email addresses, keys, passwords) with corresponding tokens (e.g., ⟨NAME⟩, ⟨EMAIL⟩, ⟨KEY⟩, ⟨PASSWORD⟩).
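The document does not specify the deduplication algorithms used. As an illustration only, exact deduplication can be done by hashing document content and fuzzy deduplication by comparing similarity ratios; the threshold below is a hypothetical value, not taken from the training pipeline:

```python
import hashlib
from difflib import SequenceMatcher

def dedup(docs, fuzzy_threshold=0.9):
    """Keep one copy of exact duplicates, then drop near-duplicates
    whose similarity to an already-kept document exceeds the threshold."""
    seen_hashes = set()
    kept = []
    for doc in docs:
        h = hashlib.sha256(doc.encode()).hexdigest()
        if h in seen_hashes:  # exact duplicate
            continue
        seen_hashes.add(h)
        if any(SequenceMatcher(None, doc, k).ratio() > fuzzy_threshold for k in kept):
            continue          # near-duplicate of a kept document
        kept.append(doc)
    return kept

docs = ["qc = QuantumCircuit(5)", "qc = QuantumCircuit(5)", "qc = QuantumCircuit(5) "]
print(dedup(docs))  # only the first copy survives
```

Production pipelines typically use MinHash or similar locality-sensitive hashing for fuzzy deduplication at scale; pairwise `SequenceMatcher` is shown here only because it is simple and self-contained.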
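As a rough sketch of the token-replacement idea described above (the regex patterns are simplified stand-ins, not the actual redaction pipeline, which would use dedicated PII detectors):

```python
import re

# Simplified illustrative patterns; real PII detection is far more involved
PATTERNS = {
    "⟨EMAIL⟩": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "⟨KEY⟩": re.compile(r"(?:api|secret)_key\s*=\s*\S+"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with its placeholder token."""
    for token, pattern in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

print(redact("Contact alice@example.com, api_key=abc123"))
# Contact ⟨EMAIL⟩, ⟨KEY⟩
```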
Infrastructure
We trained granite-8b-qiskit using IBM's supercomputing cluster (Vela) with NVIDIA A100 GPUs.
Ethical Considerations and Limitations
The use of Large Language Models involves risks and ethical considerations. For code generation, one should not fully rely on a code model for crucial decisions or important information, as the generated code may not work as expected. The granite-8b-qiskit model is no exception. Although it is suitable for multiple code-related tasks, it has not undergone safety alignment, so it may produce problematic outputs. Additionally, it is unclear whether smaller models are more prone to hallucination, for example by copying source code verbatim from the training dataset, given their smaller size and memorization capacity. This is an active research area, and we expect more in-depth exploration, understanding, and mitigation. Regarding ethics, there is a latent risk of malicious use with all Large Language Models. We encourage the community to use the granite-8b-qiskit model ethically and responsibly.
📄 License
The model is licensed under Apache 2.0.