🚀 Cabra Qwen 7b
This model is a fine-tuned version of Qwen 1.5 7b Chat with the internal Cabra 10k dataset. It is optimized for Portuguese and can respond in Portuguese natively. It shows improvements on several Brazilian benchmarks compared to the base model.
Try our demo here: CabraChat.
Check out our other models: Cabra.
📚 Documentation
Model Details
Model: Qwen 1.5 7b Chat
Qwen1.5 is a series of language models that includes decoder-type models of different sizes. For each size, we released the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, QKV attention bias, grouped query attention, a mixture of sliding window attention and full attention, etc. Additionally, we have an improved adaptive tokenizer for multiple natural languages and codes. For the beta version, we temporarily do not include GQA and the mixture of SWA and full attention.
Dataset: Cabra 10k
This is an internal dataset for fine-tuning. It will be released soon.
Quantization / GGUF
We have placed various quantized versions (GGUF) in the "quantization" branch.
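A minimal sketch of pulling one of these GGUF files and running it with llama-cpp-python is shown below. The repository id and filename are placeholders rather than values confirmed by this card; check the "quantization" branch for the actual file names.

```python
# Minimal sketch: download a GGUF file from the "quantization" branch and run it
# with llama-cpp-python. The repo id and filename below are placeholders --
# replace them with the actual GGUF files found in the branch.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="botbot-ai/CabraQwen7b",   # placeholder repo id
    filename="model-q4_k_m.gguf",      # placeholder GGUF filename
    revision="quantization",           # branch that holds the GGUF files
)

llm = Llama(model_path=gguf_path, n_ctx=4096)
output = llm("<s> [INST] What makes Brasília special? [/INST]", max_tokens=256)
print(output["choices"][0]["text"])
```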
Example
<s> [INST] What makes Brasília special? [/INST] Brasília, the capital of Brazil, is a unique and special city for several reasons. Established as the new political center of the country in 1960, it was designed by architect Oscar Niemeyer and urban planner Lúcio Costa to be a symbol of Brazilian modernity and progress.
Here are some characteristics that make Brasília special:
- Urban Design: The city is known for its innovative and futuristic design, with its imposing skyscrapers, large open squares, and extensive green spaces. The city layout follows a grid pattern, with each district representing different aspects of Brazilian culture and history.
- Three Powers Square: This huge central square is surrounded by the buildings of the National Assembly (known as the "Palácio do Planalto"), the Federal Senate, and the Supreme Federal Court. It is an iconic landmark of the city and a popular place for political events.
- Monuments and Architecture: Brasília has many impressive monuments and notable architectural buildings, including the National Museum of Natural History and Sciences, the Anhembi Planetarium, the Palácio da Alvorada, and the Basilica of Our Lady of Aparecida.
- Green Spaces: The city is surrounded by parks and nature reserves, providing residents and visitors with ample space to relax, practice sports, or simply appreciate the natural beauty of the environment.
- Culture and Events: As the national capital, Brasília hosts several cultural festivals and important events throughout the year, such as the Brasília International Film Festival, the United Nations Carnival, and the Brasília Summer Music Festival.
- Historical Significance: The construction of Brasília marked a significant moment in Brazilian history as it symbolized its political independence and the pursuit of modernization. It also served as an example for other developing countries seeking to create new capitals.
- Social Inclusion: Brasília was designed with social inclusion in mind, with well-planned neighborhoods evenly distributed throughout the city to ensure equal access to education, healthcare, and public services.
In summary, Brasília is a unique and special city due to its innovative design, iconic tourist attractions, historical connection to Brazil's development, and commitment to social inclusion.
</s>
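The example above wraps the user message in `[INST] ... [/INST]` tags. A minimal sketch of reproducing this with transformers, assuming that prompt format and using a placeholder repository id, looks like this:

```python
# Minimal sketch of running the model with transformers, following the prompt
# format shown in the example above. The repo id is a placeholder; replace it
# with the actual Hugging Face repository of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "botbot-ai/CabraQwen7b"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate
)

# Wrap the user message in the [INST] ... [/INST] template used in the example.
prompt = "<s> [INST] What makes Brasília special? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```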
Training Parameters
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
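For illustration, here is a sketch of how these hyperparameters would map onto a transformers `TrainingArguments` object. This is not the original training script; the output directory, optimizer name, and precision flag are assumptions, and dataset loading, model setup, and the 2-GPU launch are omitted.

```python
# Illustrative mapping of the listed hyperparameters onto Hugging Face
# TrainingArguments. Not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cabra-qwen-7b",       # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,    # 4 per device x 2 GPUs x 8 steps = total batch of 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    seed=42,
    optim="adamw_torch",              # betas=(0.9, 0.999) and eps=1e-8 are the defaults
    bf16=True,                        # assumption; precision is not stated above
)
```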
Framework
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.2
💻 Usage
Currently, the model is intended for research purposes only. Possible research areas and tasks include:
- Research on generative models.
- Investigation and understanding of the limitations and biases of generative models.
Commercial use is prohibited; the model is intended for research only.
Evals
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| assin2_rte | 1.1 | all | 15 | f1_macro | 0.8852 | ± 0.0046 |
| | | all | 15 | acc | 0.8852 | ± 0.0046 |
| assin2_sts | 1.1 | all | 15 | pearson | 0.7618 | ± 0.0067 |
| | | all | 15 | mse | 0.5283 | ± N/A |
| bluex | 1.1 | all | 3 | acc | 0.5605 | ± 0.0107 |
| | | exam_id__USP_2021 | 3 | acc | 0.5385 | ± 0.0400 |
| | | exam_id__UNICAMP_2021_2 | 3 | acc | 0.5490 | ± 0.0401 |
| | | exam_id__USP_2019 | 3 | acc | 0.4750 | ± 0.0455 |
| | | exam_id__USP_2022 | 3 | acc | 0.4898 | ± 0.0413 |
| | | exam_id__USP_2020 | 3 | acc | 0.5714 | ± 0.0381 |
| | | exam_id__UNICAMP_2019 | 3 | acc | 0.6200 | ± 0.0398 |
| | | exam_id__USP_2018 | 3 | acc | 0.5000 | ± 0.0392 |
| | | exam_id__USP_2024 | 3 | acc | 0.7073 | ± 0.0411 |
| | | exam_id__UNICAMP_2020 | 3 | acc | 0.6000 | ± 0.0382 |
| | | exam_id__UNICAMP_2024 | 3 | acc | 0.6000 | ± 0.0422 |
| | | exam_id__USP_2023 | 3 | acc | 0.5682 | ± 0.0429 |
| | | exam_id__UNICAMP_2018 | 3 | acc | 0.4444 | ± 0.0391 |
| | | exam_id__UNICAMP_2021_1 | 3 | acc | 0.4783 | ± 0.0425 |
| | | exam_id__UNICAMP_2023 | 3 | acc | 0.6047 | ± 0.0430 |
| | | exam_id__UNICAMP_2022 | 3 | acc | 0.7179 | ± 0.0416 |
| enem | 1.1 | all | 3 | acc | 0.6921 | ± 0.0071 |
| | | exam_id__2012 | 3 | acc | 0.7241 | ± 0.0240 |
| | | exam_id__2015 | 3 | acc | 0.7059 | ± 0.0241 |
| | | exam_id__2023 | 3 | acc | 0.6741 | ± 0.0233 |
| | | exam_id__2016_2 | 3 | acc | 0.6992 | ± 0.0238 |
| | | exam_id__2022 | 3 | acc | 0.6090 | ± 0.0245 |
| | | exam_id__2010 | 3 | acc | 0.7436 | ± 0.0233 |
| | | exam_id__2016 | 3 | acc | 0.6612 | ± 0.0248 |
| | | exam_id__2009 | 3 | acc | 0.7391 | ± 0.0236 |
| | | exam_id__2014 | 3 | acc | 0.6606 | ± 0.0262 |
| | | exam_id__2017 | 3 | acc | 0.7155 | ± 0.0241 |
| | | exam_id__2011 | 3 | acc | 0.7778 | ± 0.0222 |
| | | exam_id__2013 | 3 | acc | 0.6019 | ± 0.0272 |
| faquad_nli | 1.1 | all | 15 | f1_macro | 0.5780 | ± 0.0165 |
| | | all | 15 | acc | 0.7785 | ± 0.0115 |
| hatebr_offensive_binary | 1.0 | all | 25 | f1_macro | 0.7632 | ± 0.0082 |
| | | all | 25 | acc | 0.7736 | ± 0.0079 |
| oab_exams | 1.5 | all | 3 | acc | 0.4323 | ± 0.0061 |
| | | exam_id__... | ... | ... | ... | ... |
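These results follow the lm-evaluation-harness report format. A rough sketch of scoring a subset of the 15-shot tasks through the harness's Python API is given below. The Portuguese task names above are assumed to come from a Portuguese extension of the harness and may not ship with the upstream package; the repository id is a placeholder.

```python
# Sketch of reproducing part of this evaluation with lm-evaluation-harness.
# Assumptions: the Portuguese tasks (assin2_rte, assin2_sts, faquad_nli, ...)
# are installed via a Portuguese extension of the harness, and the repo id is
# a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=botbot-ai/CabraQwen7b",  # placeholder repo id
    tasks=["assin2_rte", "assin2_sts", "faquad_nli"],
    num_fewshot=15,   # these three tasks are reported with 15-shot above
    batch_size=4,
)
print(results["results"])
```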
📄 License
This model is released under the CC license.
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of Qwen 1.5 7b Chat |
| Training Data | Cabra 10k (internal dataset, to be released) |
| Supported Languages | Portuguese |
| Quantization | GGUF versions available in the "quantization" branch |
| Framework | Transformers 4.39.0.dev0, Pytorch 2.1.2+cu118, Datasets 2.14.6, Tokenizers 0.15.2 |
| Usage | For research purposes only; commercial use prohibited |