🚀 asadfgglie/banban-beta-v2
An AI VTuber model named BanBan. The goal is to create a dedicated AI VTuber for NTNU VLSI. Currently, it is only available to NTNU VLSI members.
🚀 Quick Start
First, you need to select a model according to your computer's hardware. The files with the .gguf extension under Files and versions are the model's weight files.
Among them, mmproj-model-f16.gguf is special: it is the CLIP weight file that lets BanBan "open its eyes". If you want BanBan to "open its eyes", remember to download this file.
This model is quantized in the GGUF format, so it can be used in any deployment environment that supports llama.cpp.
Regarding how to choose a model size, the simplest advice is to multiply the weight file's size by 2. If your graphics card's dedicated memory is larger than that number, you can choose that file with confidence.
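As a quick sanity check of that rule of thumb, here is a minimal Python sketch (the 2× factor comes from the advice above; the file sizes in the example are made-up illustrations, not actual sizes from this repo):

```python
def fits_in_vram(gguf_file_gb: float, vram_gb: float, overhead_factor: float = 2.0) -> bool:
    """Rule of thumb from above: weight file size x 2 should fit in dedicated GPU memory."""
    return gguf_file_gb * overhead_factor < vram_gb

# Example: a hypothetical 4.9 GB Q4 file needs roughly 9.8 GB of VRAM,
# so it fits a 10 GB card but not a 6 GB one.
print(fits_in_vram(4.9, 10.0))  # True
print(fits_in_vram(4.9, 6.0))   # False
```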
As for how the various quantization settings affect the model's intelligence and speed, a simple rule applies: the larger the weight file, the higher the model's accuracy and the stronger its capabilities, but the slower its inference. For a graphics card with at least 6GB of dedicated memory (such as an RTX 3050 6GB or RTX 4060), I recommend the Q3 quantization level: it gives the best speed and avoids running out of GPU memory, as long as you don't play games while running the model.
For a graphics card with 10GB or more of memory, choose the Q4 or Q5 quantization level; these versions strike a good balance between speed and quality.
For Q6 and Q8, I recommend an RTX 4080. That said, if you have a 16GB graphics card, you can try Q6 and it should work.
F16 is the original precision, just stored in the GGUF format. If you have an RTX 4090, please tell me how it performs after testing; in theory it should be better than my own results, since I can only test the quantized models.
For beginners who don't know how to code and just want to try it out, I recommend LM Studio. This free project handles all the troublesome settings for you, but it doesn't support custom author names, so you may not get the best dialogue experience. Also, don't forget mmproj-model-f16.gguf, which is BanBan's "eyes"!
(The main reason is that LM Studio hasn't updated its internal chat-record storage format to match the latest OpenAI API, which now supports setting an author name for each message. Llama3's own prompt format also supports custom author names to a limited extent.)
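For reference, this is a hedged sketch of what the per-message author name looks like in the newer OpenAI-style chat format (the names and message contents below are made-up examples):

```python
# Newer OpenAI-style chat messages: each message may carry an optional "name"
# field identifying its author, which Llama3's prompt format can also encode.
messages = [
    {"role": "system", "content": "You are BanBan, the AI VTuber of NTNU VLSI."},
    {"role": "user", "name": "asadfgglie", "content": "What do you think of pineapple pizza?"},
]
```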
If you choose oobabooga/text-generation-webui as your inference platform, note that it doesn't support llama.cpp's multimodal features, so BanBan can only exist behind the text.
✨ Features
- AI VTuber Function: Mainly intended for chatting and VTuber live streaming; currently it is only capable of chatting.
- Pineapple Pizza Preference: BanBan is a passionate supporter of pineapple pizza, which is a setting in the training set and prompt.
📦 Installation
The installation mainly involves selecting the appropriate model weight files for your hardware. You can find the weight files with the .gguf extension under Files and versions.
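If you prefer to fetch the weights from code instead of the web UI, here is a minimal sketch using huggingface_hub (the quantized .gguf filename below is a placeholder; pick the actual file you want from Files and versions):

```python
from huggingface_hub import hf_hub_download

# Download one quantized weight file plus the CLIP projector ("eyes").
# "banban-beta-v2-Q4_K_M.gguf" is a placeholder; use a real filename from the repo.
model_path = hf_hub_download(repo_id="asadfgglie/banban-beta-v2",
                             filename="banban-beta-v2-Q4_K_M.gguf")
mmproj_path = hf_hub_download(repo_id="asadfgglie/banban-beta-v2",
                              filename="mmproj-model-f16.gguf")
print(model_path, mmproj_path)
```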
💻 Usage Examples
Basic Usage
You can directly use BanBan as an ordinary AI assistant, but it has a preference for pineapple pizza.
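One way to chat with BanBan in a llama.cpp-compatible environment is the llama-cpp-python bindings. A minimal text-only sketch (the model path is a placeholder for whichever quantization you downloaded):

```python
from llama_cpp import Llama

# Load the quantized weights; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path="./banban-beta-v2-Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=4096)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Hi BanBan! Pineapple pizza: yes or no?"},
])
print(response["choices"][0]["message"]["content"])
```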
Advanced Usage
If you want to let BanBan "open its eyes", you need to download the mmproj-model-f16.gguf file.
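A hedged sketch of the multimodal path with llama-cpp-python, which wires the mmproj (CLIP) file in through a LLaVA chat handler; the paths and image URL are placeholders, and the exact handler class for a Llama3-based LLaVA may differ:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj file is the CLIP projector that lets BanBan "open its eyes".
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./banban-beta-v2-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # leave room for image tokens in the context
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/pizza.png"}},
        {"type": "text", "text": "What do you see?"},
    ]},
])
print(response["choices"][0]["message"]["content"])
```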
📚 Documentation
Model Description
Uses
Direct Use
You can use BanBan as an ordinary AI assistant with a food preference for pineapple pizza.
Out-of-Scope Use
It is not recommended to ask BanBan about political issues, because the base Llama3 model is trained to avoid answering such questions.
Bias, Risks, and Limitations
BanBan is a fanatical supporter of pineapple pizza, which is a setting in the training set and prompt.
Recommendations
You must accept pineapple pizza to understand BanBan.
🔧 Technical Details
Model Architecture and Objective
LlavaForConditionalGeneration, with the language-model backbone swapped to Llama3.
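If the repository also ships the full-precision transformers checkpoint (an assumption on my part; the GGUF files may be the only artifacts), loading it would look roughly like this sketch:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration

# LLaVA architecture with a Llama3 language backbone, per the description above.
model = LlavaForConditionalGeneration.from_pretrained("asadfgglie/banban-beta-v2")
processor = AutoProcessor.from_pretrained("asadfgglie/banban-beta-v2")
```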
Compute Infrastructure
Hardware
- CPU: Intel(R) Core(TM) i5-14400
- GPU: NVIDIA GeForce RTX 4060 Ti 16GB
- RAM: 32GB
Software
Thanks to the great hiyouga/LLaMA-Factory, which saved me a lot of time on training infrastructure.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 3.0
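Expressed as Hugging Face TrainingArguments, these settings look roughly like the sketch below; LLaMA-Factory drives transformers under the hood, and the total_train_batch_size of 16 is simply train_batch_size 1 × gradient_accumulation_steps 16 on a single GPU. The output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Equivalent hyperparameters to the list above.
args = TrainingArguments(
    output_dir="./banban-beta-v2",
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # x16 accumulation -> effective batch of 16
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    num_train_epochs=3.0,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the transformers defaults.
)
```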
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 1.2315 | 0.4168 | 100 | 1.1917 | 1551832 |
| 1.0165 | 0.8336 | 200 | 1.0467 | 3072864 |
| 0.8943 | 1.2503 | 300 | 0.9339 | 4609344 |
| 0.7505 | 1.6671 | 400 | 0.8318 | 6138408 |
| 0.577 | 2.0839 | 500 | 0.7647 | 7672440 |
| 0.5811 | 2.5007 | 600 | 0.7326 | 9211432 |
| 0.5544 | 2.9174 | 700 | 0.7245 | 10741104 |
| Metric | Score |
|--------|-------|
| predict_bleu-4 | 22.36944225630876 |
| predict_model_preparation_time | 0.0048 |
| predict_rouge-1 | 41.827983993072735 |
| predict_rouge-2 | 21.250519792182086 |
| predict_rouge-l | 36.58219059871351 |
| predict_runtime | 55992.1102 |
| predict_samples_per_second | 0.072 |
| predict_steps_per_second | 0.072 |
Framework versions
- PEFT 0.11.1
- Transformers 4.43.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
📄 License
Only for internal research use by NTNU VLSI members.