# 🐐 Open Cabrita 3B - GGUF
Open Cabrita 3B - GGUF is a quantized version of the Open Cabrita 3B model, offering different quantization methods to balance accuracy and resource usage.
## 🚀 Quick Start

### Model Information

### Included Files
| Name | Quant Method | Bits | Size | Description |
|------|--------------|------|------|-------------|
| opencabrita3b-q4_0.gguf | q4_0 | 4 | 1.94 GB | 4-bit quantization. Smallest file and lowest accuracy of the listed options. |
| opencabrita3b-q4_1.gguf | q4_1 | 4 | 2.14 GB | 4-bit quantization. Higher accuracy than q4_0 but lower than q5_0; faster inference than the q5 models. |
| opencabrita3b-q5_0.gguf | q5_0 | 5 | 2.34 GB | 5-bit quantization. Higher accuracy than q4_1; higher resource usage and slower inference. |
| opencabrita3b-q5_1.gguf | q5_1 | 5 | 2.53 GB | 5-bit quantization. Even higher accuracy; higher resource usage and slower inference. |
| opencabrita3b-q8_0.gguf | q8_0 | 8 | 3.52 GB | 8-bit quantization. Almost indistinguishable from float16, but resource-heavy and slower. |
## ⚠️ Important Note

The RAM figures implied by the sizes above assume no GPU offloading. If layers are offloaded to the GPU, RAM usage is reduced and VRAM is used instead.
## 📦 Installation

### Running with llama.cpp
I used the following command. Adjust it to your needs:
```sh
./main -m ./models/open-cabrita3b/opencabrita3b-q5_1.gguf --color --temp 0.5 -n 256 -p "### Instruction: {command} ### Response: "
```

To understand the parameters, see the llama.cpp documentation.
You can try it for free on Google Colab: Open_Cabrita_llamacpp_5_1.ipynb
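The same call can be scripted from Python with llama-cpp-python (listed under Documentation below). This is a minimal sketch, not an official recipe: the model path mirrors the command above, while `n_ctx` and `n_gpu_layers` are assumptions to tune for your hardware. Raising `n_gpu_layers` offloads layers to the GPU, as described in the note above.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path mirrors the CLI command above; n_ctx and n_gpu_layers
# are assumptions to adjust for your hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/open-cabrita3b/opencabrita3b-q5_1.gguf",
    n_ctx=2048,       # context window size
    n_gpu_layers=0,   # raise to offload layers to the GPU (uses VRAM, saves RAM)
)

output = llm(
    "### Instruction: Summarize what quantization does. ### Response: ",
    max_tokens=256,
    temperature=0.5,
)
print(output["choices"][0]["text"])
```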
## 📚 Documentation

### About the GGUF Format
GGUF is a new format introduced by the llama.cpp team on August 21, 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
The main benefit of GGUF is that it is an extensible, future-proof format that stores more information about the model as metadata. It also includes significantly improved tokenization code, with full support for special tokens for the first time. This should improve performance, especially for models that use new special tokens and custom prompt templates.
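As an illustration of that metadata, the sketch below lists the key-value fields a GGUF file carries. It assumes the `gguf` Python package maintained in the llama.cpp repository; the file path is a placeholder.

```python
# Minimal sketch, assuming the gguf package from the llama.cpp repo
# (pip install gguf); the file path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("./models/open-cabrita3b/opencabrita3b-q5_1.gguf")

# GGUF stores model information as named key-value metadata fields,
# e.g. architecture details and tokenizer/special-token settings.
for name in reader.fields:
    print(name)

print(f"{len(reader.tensors)} tensors stored in the file")
```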
Here is a list of clients and libraries known to support GGUF:
- llama.cpp.
- text-generation-webui, the most widely used web interface. Supports GGUF with GPU acceleration via the ctransformers backend - the llama-cpp-python backend should work soon too.
- KoboldCpp, now supports GGUF starting from version 1.41! A powerful GGML web interface, with full GPU acceleration. Especially good for storytelling.
- LM Studio, versions 0.2.2 and later support GGUF. A fully equipped local GUI with GPU acceleration on both Windows (NVIDIA and AMD) and macOS.
- LoLLMS Web UI, should work now, choose the c_transformers backend. A great web interface with many interesting features. Supports CUDA GPU acceleration.
- ctransformers, now supports GGUF starting from version 0.2.24! A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server (see the usage sketch after this list).
- llama-cpp-python, supports GGUF starting from version 0.1.79. A Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
- candle, added GGUF support on August 22. Candle is a Rust ML framework focused on performance, including GPU support and ease of use.
- LocalAI, added GGUF support on August 23. LocalAI provides a REST API for LLM and image generation models.
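As a concrete example from the list above, here is a minimal ctransformers sketch; the local file path and `gpu_layers` value are assumptions to adapt to your setup.

```python
# Minimal sketch using ctransformers (pip install ctransformers).
# The local model path and gpu_layers value are assumptions.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "./models/open-cabrita3b/opencabrita3b-q5_1.gguf",
    model_type="llama",  # Open Cabrita 3B derives from the LLaMA family
    gpu_layers=0,        # raise to offload layers to the GPU
)

print(llm("### Instruction: What is GGUF? ### Response: ", max_new_tokens=128))
```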
### Template

```
### Instruction:
{prompt}
### Response:
```
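A small helper, shown as a sketch, for filling this template before passing the result to any of the clients above:

```python
def build_prompt(instruction: str) -> str:
    """Fill the instruction template shown above."""
    return f"### Instruction:\n{instruction}\n### Response:\n"

# Example: llm(build_prompt("Summarize GGUF in one sentence."))
```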
## 📄 License
This project is licensed under the Apache 2.0 License.