🚀 asadfgglie/banban-beta-v2
An AI VTuber model named BanBan. The goal is to create a dedicated AI VTuber for NTNU VLSI. Currently, it is only available to NTNU VLSI members.
🚀 Quick Start
First, you need to select a model according to your computer's hardware. The files with the .gguf extension under Files and versions are the model's weight files.
Among them, mmproj-model-f16.gguf is special: it is the CLIP weight file that lets BanBan "open its eyes". If you want BanBan to "open its eyes", remember to download this file.
This model is quantized in the GGUF format, so it can be used in any deployment environment that supports llama.cpp.
Regarding how to choose a model size, the simplest advice is to multiply the weight file's size by 2. If your graphics card's dedicated memory is larger than that number, you can choose that file with confidence.
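As a quick sanity check of that rule of thumb, here is a minimal Python sketch (the 2× factor comes from the advice above; the file sizes in the example are made-up illustrations, not actual sizes from this repo):

```python
def fits_in_vram(gguf_file_gb: float, vram_gb: float, overhead_factor: float = 2.0) -> bool:
    """Rule of thumb from above: weight file size x 2 should fit in dedicated GPU memory."""
    return gguf_file_gb * overhead_factor < vram_gb

# Example: a hypothetical 4.9 GB Q4 file needs roughly 9.8 GB of VRAM,
# so it fits a 10 GB card but not a 6 GB one.
print(fits_in_vram(4.9, 10.0))  # True
print(fits_in_vram(4.9, 6.0))   # False
```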
As for how the various quantization settings affect the model's intelligence and speed, a simple rule applies: the larger the weight file, the higher the model's accuracy and the stronger its capabilities, but the slower its inference. For a graphics card with at least 6GB of dedicated memory (such as an RTX 3050 6GB or RTX 4060), I recommend the Q3 quantization level: it gives the best speed and avoids running out of GPU memory, as long as you don't play games while running the model.
For a graphics card with 10GB or more of memory, choose the Q4 or Q5 quantization level; these versions strike a good balance between speed and quality.
For Q6 and Q8, I recommend an RTX 4080. That said, if you have a 16GB graphics card, you can try Q6 and it should work.
F16 is the original precision, just stored in the GGUF format. If you have an RTX 4090, please tell me how it performs after testing; in theory it should be better than my own results, since I can only test the quantized models.
For beginners who don't know how to code and just want to try it out, I recommend LM Studio. This free project handles all the troublesome settings for you, but it doesn't support custom author names, so you may not get the best dialogue experience. Also, don't forget mmproj-model-f16.gguf, which is BanBan's "eyes"!
(The main reason is that LM Studio hasn't updated its internal chat-record storage format to match the latest OpenAI API, which now supports setting an author name for each message. Llama3's own prompt format also supports custom author names to a limited extent.)
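For reference, this is a hedged sketch of what the per-message author name looks like in the newer OpenAI-style chat format (the names and message contents below are made-up examples):

```python
# Newer OpenAI-style chat messages: each message may carry an optional "name"
# field identifying its author, which Llama3's prompt format can also encode.
messages = [
    {"role": "system", "content": "You are BanBan, the AI VTuber of NTNU VLSI."},
    {"role": "user", "name": "asadfgglie", "content": "What do you think of pineapple pizza?"},
]
```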
If you choose oobabooga/text-generation-webui as your inference platform, note that it doesn't support llama.cpp's multimodal features, so BanBan can only exist behind the text.
✨ Features
- AI VTuber Function: Mainly intended for chatting and VTuber live streaming; currently it is only capable of chatting.
- Pineapple Pizza Preference: BanBan is a passionate supporter of pineapple pizza, which is a setting in the training set and prompt.
📦 Installation
The installation mainly involves selecting the appropriate model weight files for your hardware. You can find the weight files with the .gguf extension under Files and versions.
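If you prefer to fetch the weights from code instead of the web UI, here is a minimal sketch using huggingface_hub (the quantized .gguf filename below is a placeholder; pick the actual file you want from Files and versions):

```python
from huggingface_hub import hf_hub_download

# Download one quantized weight file plus the CLIP projector ("eyes").
# "banban-beta-v2-Q4_K_M.gguf" is a placeholder; use a real filename from the repo.
model_path = hf_hub_download(repo_id="asadfgglie/banban-beta-v2",
                             filename="banban-beta-v2-Q4_K_M.gguf")
mmproj_path = hf_hub_download(repo_id="asadfgglie/banban-beta-v2",
                              filename="mmproj-model-f16.gguf")
print(model_path, mmproj_path)
```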
💻 Usage Examples
Basic Usage
You can directly use BanBan as an ordinary AI assistant, but it has a preference for pineapple pizza.
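One way to chat with BanBan in a llama.cpp-compatible environment is the llama-cpp-python bindings. A minimal text-only sketch (the model path is a placeholder for whichever quantization you downloaded):

```python
from llama_cpp import Llama

# Load the quantized weights; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path="./banban-beta-v2-Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=4096)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": "Hi BanBan! Pineapple pizza: yes or no?"},
])
print(response["choices"][0]["message"]["content"])
```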
Advanced Usage
If you want to let BanBan "open its eyes", you need to download the mmproj-model-f16.gguf file.
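A hedged sketch of the multimodal path with llama-cpp-python, which wires the mmproj (CLIP) file in through a LLaVA chat handler; the paths and image URL are placeholders, and the exact handler class for a Llama3-based LLaVA may differ:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj file is the CLIP projector that lets BanBan "open its eyes".
chat_handler = Llava15ChatHandler(clip_model_path="./mmproj-model-f16.gguf")
llm = Llama(
    model_path="./banban-beta-v2-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # leave room for image tokens in the context
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/pizza.png"}},
        {"type": "text", "text": "What do you see?"},
    ]},
])
print(response["choices"][0]["message"]["content"])
```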
📚 Documentation
Model Description
Uses
Direct Use
You can use BanBan as an ordinary AI assistant with a food preference for pineapple pizza.
Out-of-Scope Use
It is not recommended to ask BanBan about political issues, because the base Llama3 model is trained to avoid answering such questions.
Bias, Risks, and Limitations
BanBan is a fanatical supporter of pineapple pizza, which is a setting in the training set and prompt.
Recommendations
You must accept pineapple pizza to understand BanBan.
🔧 Technical Details
Model Architecture and Objective
LlavaForConditionalGeneration, with the language-model backbone swapped to Llama3.
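If the repository also ships the full-precision transformers checkpoint (an assumption on my part; the GGUF files may be the only artifacts), loading it would look roughly like this sketch:

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration

# LLaVA architecture with a Llama3 language backbone, per the description above.
model = LlavaForConditionalGeneration.from_pretrained("asadfgglie/banban-beta-v2")
processor = AutoProcessor.from_pretrained("asadfgglie/banban-beta-v2")
```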
Compute Infrastructure
Hardware
- CPU: Intel(R) Core(TM) i5-14400
- GPU: NVIDIA GeForce RTX 4060 Ti 16GB
- RAM: 32GB
Software
Thanks to the great hiyouga/LLaMA-Factory, which saved me a lot of time on training infrastructure.
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a TrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 3.0
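Expressed as Hugging Face TrainingArguments, these settings look roughly like the sketch below; LLaMA-Factory drives transformers under the hood, and the total_train_batch_size of 16 is simply train_batch_size 1 × gradient_accumulation_steps 16 on a single GPU. The output_dir is a placeholder:

```python
from transformers import TrainingArguments

# Equivalent hyperparameters to the list above.
args = TrainingArguments(
    output_dir="./banban-beta-v2",
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # x16 accumulation -> effective batch of 16
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    num_train_epochs=3.0,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the transformers defaults.
)
```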
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
| 1.2315 | 0.4168 | 100 | 1.1917 | 1551832 |
| 1.0165 | 0.8336 | 200 | 1.0467 | 3072864 |
| 0.8943 | 1.2503 | 300 | 0.9339 | 4609344 |
| 0.7505 | 1.6671 | 400 | 0.8318 | 6138408 |
| 0.577 | 2.0839 | 500 | 0.7647 | 7672440 |
| 0.5811 | 2.5007 | 600 | 0.7326 | 9211432 |
| 0.5544 | 2.9174 | 700 | 0.7245 | 10741104 |
| Metric | Score |
|--------|-------|
| predict_bleu-4 | 22.36944225630876 |
| predict_model_preparation_time | 0.0048 |
| predict_rouge-1 | 41.827983993072735 |
| predict_rouge-2 | 21.250519792182086 |
| predict_rouge-l | 36.58219059871351 |
| predict_runtime | 55992.1102 |
| predict_samples_per_second | 0.072 |
| predict_steps_per_second | 0.072 |
Framework versions
- PEFT 0.11.1
- Transformers 4.43.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
📄 License
Only for internal research use by NTNU VLSI members.