🚀 CyberAgentLM2-7B (CALM2-7B)
CyberAgentLM2 is a decoder-only language model pre-trained on 1.3T tokens of publicly available Japanese and English datasets, giving it strong language understanding and generation in both languages.
🚀 Quick Start
Prerequisites
Ensure you have installed the following libraries:
- transformers >= 4.34.1
- accelerate
Installation
You can install the required libraries using the following command:
```bash
pip install "transformers>=4.34.1" accelerate
```
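To confirm the environment afterwards, a minimal version check can be run (nothing model-specific is loaded here):

```python
import accelerate
import transformers

# Both imports should succeed; transformers should report >= 4.34.1.
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
```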
✨ Features
- Pre-trained on large-scale Japanese and English datasets, providing strong language understanding and generation abilities.
- Offers a chat-tuned variant, CyberAgentLM2-7B-Chat, for dialogue use cases; a loading sketch follows this list.
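As a sketch of how the chat variant could be used, the snippet below loads cyberagent/calm2-7b-chat and formats the input with a USER:/ASSISTANT: template. The template and the example question are assumptions here; the calm2-7b-chat model card is the authoritative reference for the prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Chat-tuned variant. The USER:/ASSISTANT: template below is an assumption;
# verify it against the calm2-7b-chat model card before relying on it.
model = AutoModelForCausalLM.from_pretrained("cyberagent/calm2-7b-chat", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("cyberagent/calm2-7b-chat")

prompt = """USER: AIによって私達の暮らしはどのように変わりますか?
ASSISTANT: """  # "USER: How will AI change our daily lives?"

token_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(
    input_ids=token_ids.to(model.device),
    max_new_tokens=300,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```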
💻 Usage Examples
Basic Usage
```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

assert transformers.__version__ >= "4.34.1"

# Load the model with automatic device placement and dtype selection.
model = AutoModelForCausalLM.from_pretrained("cyberagent/calm2-7b", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("cyberagent/calm2-7b")

# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "AIによって私達の暮らしは、"  # "Through AI, our daily lives will..."
token_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(
    input_ids=token_ids.to(model.device),
    max_new_tokens=100,
    do_sample=True,
    temperature=0.9,
    streamer=streamer,
)
```
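The streamer already prints the continuation as it is generated; to keep the completion as a string as well, slice off the prompt tokens before decoding (a small follow-up using the same variables as above):

```python
# output_ids contains the prompt followed by the new tokens,
# so drop the first token_ids.shape[1] positions before decoding.
completion = tokenizer.decode(
    output_ids[0][token_ids.shape[1]:],
    skip_special_tokens=True,
)
print(completion)
```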
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model size | 7B |
| Trained tokens | 1.3T tokens |
| Context length | 4096 |
| Model type | Transformer-based language model |
| Language(s) | Japanese, English |
| Developed by | CyberAgent, Inc. |
| License | Apache-2.0 |
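Because the context window is 4096 tokens, longer inputs should be truncated at encode time. One way to do this with standard transformers tokenizer options (long_text is a hypothetical placeholder for any long input string):

```python
# Truncate the encoded input to the model's 4096-token context window.
token_ids = tokenizer(
    long_text,  # hypothetical: any long input string
    return_tensors="pt",
    truncation=True,
    max_length=4096,
).input_ids
```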
📄 License
This model is released under the Apache-2.0 license.
Author
Ryosuke Ishigami
Citations
```bibtex
@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
```