🚀 Model Card for Velvet-2B
Velvet is a family of Italian large language models developed from scratch with a dense architecture. This model was trained on the Leonardo HPC infrastructure hosted by CINECA, using extensively curated, publicly available data.
The training of the Velvet family started from over 10 trillion tokens in six languages (Italian, English, Spanish, Brazilian Portuguese, German, and French). Velvet-2B was trained on nearly 3 trillion tokens across two of these languages (Italian and English).
✨ Features
- Developed from scratch with a dense architecture.
- Trained on a large amount of curated public data.
- Available in two sizes: 2B and 14B parameters.
- Supports multiple languages, including Italian and English.
📚 Documentation
Model details
- Model Developers: Technology and Innovation Team, Almawave
- Input: The models accept only text input.
- Output: The models generate only text output.
- Release Date: February 11th, 2025.
- License: Apache 2.0
Model Architecture and training
The Velvet family of models comes in two sizes, 2B and 14B parameters: Velvet-2B and Velvet-14B. Velvet-2B is a 2B-parameter instruct model fine-tuned from Velvet-2B-base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets tailored to solving textual "instruction-based" problems.
Architecture
- Auto-regressive language model with a transformer-based causal decoder-only design.
- 28 transformer layers.
- MLP intermediate size of 8,192.
- Grouped Query Attention (GQA): 32 query heads and 8 key-value heads for efficiency.
- Rotary Position Embedding (RoPE).
- SiLU activation function with RMSNorm normalization.
- Trained on 4K-token sequences; supports context lengths of up to 32K tokens.
- 127K-token vocabulary, designed to accommodate linguistic diversity.
- Training phases: pretraining and post-training.
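As a toy illustration of the Grouped Query Attention layout listed above (32 query heads sharing 8 key-value heads), the sketch below shows how each group of query heads attends through one shared key-value head. Dimensions and weights are illustrative; this is not Velvet's actual implementation.

```python
import numpy as np

def gqa_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Grouped Query Attention: each group of query heads shares one KV head."""
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head (32 / 8 = 4)

    q = (x @ Wq).reshape(seq, n_q_heads, head_dim)    # (S, Hq,  D)
    k = (x @ Wk).reshape(seq, n_kv_heads, head_dim)   # (S, Hkv, D) -- smaller projection
    v = (x @ Wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)                   # (S, Hq, D)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    # Causal mask: a position may not attend to later positions.
    mask = np.triu(np.ones((seq, seq)), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, d_model)
    return out
```

The efficiency gain is that `Wk` and `Wv` project to only `n_kv_heads * head_dim` dimensions instead of `n_q_heads * head_dim`, shrinking the key-value cache by the group factor.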
Status
This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. Almawave is actively working on strategies to enhance alignment and robustness in future iterations of the Velvet model.
License
Velvet-2B is made available under the Apache 2.0 license.
Supported Languages
Velvet-2B has been trained on Italian and English. To ensure high-quality multilingual performance, the dataset was curated to balance linguistic representation, reducing language-specific overfitting and bias.
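One simple way to think about such balancing is explicit per-language sampling weights when assembling training batches. The sketch below is a toy illustration, not Almawave's actual data pipeline.

```python
import random

def balanced_batch(corpora, batch_size, weights=None):
    """Draw a batch whose language mix follows explicit weights (uniform by default)."""
    langs = list(corpora)
    if weights is None:
        weights = {lang: 1.0 for lang in langs}
    w = [weights[lang] for lang in langs]
    batch = []
    for _ in range(batch_size):
        lang = random.choices(langs, weights=w)[0]  # pick a language by weight
        batch.append(random.choice(corpora[lang]))  # then a document from it
    return batch
```

Tilting the weights lets a curator counteract the raw corpus sizes, so a lower-resource language is not drowned out during pretraining.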
Intended Use
Velvet-2B is designed to be integrated into AI systems or applications. Its potential uses include, but are not limited to, text generation, classification, summarization, and question answering. It is important to note that specific applications may require further model adaptations or additional safeguards to prevent undesirable behavior or outputs.
Capabilities
- Summarization
- Information Extraction
- RAG (Retrieval Augmented Generation)
- Paraphrasing
- Textual Entailment
- Natural Language Inference
- Common Sense Reasoning
- Text Classification
- Machine Translation
- Question Answering
- Text Completion
Training Data
Overview
The model was pre-trained on nearly 3 trillion tokens of data from publicly available sources. These sources include a diverse collection of web text, exposing the model to a wide range of linguistic styles, topics, and vocabulary. The training dataset was built with a balanced representation of multiple languages.
The fine-tuning data includes publicly available instruction datasets, as well as over 1M human-annotated and synthetic examples for SFT. Moreover, we used over 50k human-generated examples for safety instructions. Neither the pre-training nor the fine-tuning datasets include Almawave's customer data.
We have made significant efforts to enhance the reliability of responses in terms of factual accuracy; however, we always recommend grounding LLM responses with external factual data (e.g., Retrieval Augmented Generation).
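The grounding recommendation above can be sketched as a minimal retrieve-then-prompt loop. The word-overlap scorer and prompt wording are illustrative stand-ins for a real retriever and template.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query, documents):
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

In a production RAG system the overlap scorer would be replaced by a proper retriever (e.g., dense embeddings), but the structure, retrieving evidence and constraining the model to it, is the same.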
Data Freshness
The pre-training data has a cutoff between August 2024 and October 2024, depending on the model.
Evaluation
Italian language
| Category | Benchmark | Velvet-2B |
|----------|-----------|-----------|
| General | MMLU (5-shot) | 39.6 |
| Commonsense | Hellaswag (0-shot) | 54.3 |
| | WinoGrande ITA-bench (0-shot) | 61.9 |
| | PIQA ITA-bench (0-shot) | 67.3 |
| | SciQ ITA-bench (0-shot) with p. | 86.6 |
| Reasoning | ARC-Challenge (0-shot) | 41.7 |
English language
| Category | Benchmark | Velvet-2B |
|----------|-----------|-----------|
| General | MMLU (5-shot) | 43.4 |
| Instruction Following | IFEval (0-shot) | 53.2 |
| Commonsense | Hellaswag (10-shot) | 65.0 |
| | WinoGrande (0-shot) | 60.9 |
| Reasoning | ARC-Challenge (25-shot) | 50.6 |
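The shot counts in the benchmarks above (0-shot, 5-shot, 25-shot) refer to how many solved examples are placed in the prompt before the test question. A minimal sketch of n-shot prompt assembly follows; the Q/A format is illustrative, not the exact evaluation harness used.

```python
def few_shot_prompt(examples, question, n_shots=5):
    """Build an n-shot prompt by prepending solved (question, answer) examples."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples[:n_shots]]
    parts.append(f"Q: {question}\nA:")  # the model completes this final answer
    return "\n\n".join(parts)
```

With `n_shots=0` the model sees only the question itself, which is why 0-shot scores typically trail few-shot scores on the same benchmark.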
Usage
The model can be used with standard open-source transformer inference frameworks.
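As one common option, here is a minimal Hugging Face `transformers` sketch; the checkpoint id `Almawave/Velvet-2B` and the presence of a chat template are assumptions, not confirmed by this card.

```python
# Hypothetical example: the model id "Almawave/Velvet-2B" and chat-template
# support are assumed, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Almawave/Velvet-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Qual è la capitale d'Italia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```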
Responsibility and Safety
Safety
For our instruction-trained model, we have conducted comprehensive exercises, engaged in adversarial internal and external evaluations, and implemented mitigation techniques to reduce risks. These exercises were designed to thoroughly examine the model's limitations and potential, simulating real and hypothetical scenarios where undesirable behavior might occur.
However, despite these efforts, it is inevitable that some residual hazards will exist, as every large language model presents intrinsic complexities that cannot be completely eliminated.
Developers are advised to implement suitable safety measures and exercise due diligence, tailoring these safeguards to align with their product policies and the specific requirements of their applications.
Some trade-offs between model helpfulness and alignment are likely inevitable. Developers should thoughtfully balance the benefits of alignment and helpfulness for their specific applications and audiences. They must also remain aware of residual risks when using Velvet models and leverage additional safety tools as necessary to achieve an appropriate safety standard for their use case.
We advise developers to carefully evaluate risks in the context of their specific use case. They should consider the potential implications of a model failure in their applications and put adequate measures in place to manage such eventualities.
In parallel, we are collaborating with the scientific and industrial community to establish AI safety benchmark standards that are transparent, rigorous, and interpretable. The goal is to promote a better understanding of the risks associated with large language models and support the development of safer and more responsible solutions.
Governance and Internal Oversight
Almawave has established an internal governance framework for the management and continuous oversight of the Velvet model family. Key governance elements include:
- Supervision by an Ethical and Technical Committee to ensure the model aligns with principles of transparency, fairness, and safety.
- Ongoing bias monitoring through auditing tools, with iterative updates to improve alignment with ethical guidelines.
- Restrictions on commercial and institutional usage to ensure compliance with regulatory frameworks and shared responsibility principles.
- Periodic review processes to assess the model’s impact in high-risk applications.
Bias, Risks, and Limitations
Velvet has been trained on a dataset that, despite all the data curation efforts, might include toxic language and societal biases. This means that models in the Velvet family may reproduce these biases and produce harmful responses when prompted with such inputs. This is a common issue in AI models trained on large datasets, as they can inadvertently perpetuate the biases present in the data.
Furthermore, the model may generate inaccurate, incomplete, or redundant responses, which could be socially unacceptable or undesirable, even if the input prompt is not explicitly offensive. This is a potential flaw in the model's design and training process, and it underscores the importance of careful validation and monitoring of AI systems to ensure that they are functioning as intended.
Additionally, using the recommended prompt template is crucial to mitigate the risk of harmful responses, as it is designed to guide the model towards more appropriate and safe outputs. However, it is important to note that the model's performance may still vary depending on the specific context and complexity of the input prompt.
Finally, when using this model in an agentic workflow, it is essential to validate that all imported packages and dependencies are from trusted sources to ensure the model's security and integrity. This is a critical step in maintaining the model's ethical and responsible use, and it is important to prioritize end-to-end security measures to prevent any potential vulnerabilities or breaches.
Future versions of Velvet will integrate automated red-teaming protocols, continuously stress-testing the model against adversarial prompts to identify and mitigate emerging risks.
Sensitive Data Handling and Usage Restrictions
The Velvet model has not been trained on unauthorized personal data and must not be used to process sensitive data without appropriate security measures.
Usage Restrictions:
- Prohibited use on sensitive healthcare, financial, or government data without specific safeguards.
- Mandatory human validation in scenarios where the model’s outputs could have legal or ethical consequences.
- High-risk applications (legal, medical, public governance) must implement content filtering and auditing techniques to ensure response quality and safety.
Ethical Considerations
Almawave's core values are openness, inclusivity, and helpfulness. We aim to create AI that is accessible and beneficial for everyone, regardless of background. Velvet models are designed to be inclusive and respectful of diverse perspectives and needs, avoiding unnecessary judgment or the imposition of normative views, and recognizing that content deemed problematic in some contexts can have valuable applications in others.
We deeply respect the dignity and autonomy of all users, particularly their right to free thought and expression, which are fundamental to innovation and progress.
While we have taken significant steps to ensure the safety and reliability of Velvet models, it is important to acknowledge that they may occasionally generate inaccurate, biased, or unsafe responses.
Almawave is actively engaging with ethics committees and domain experts to ensure continuous oversight of Velvet’s outputs, improving safeguards through community feedback.
We strongly encourage the community to exercise caution and conduct thorough safety testing and fine-tuning when using Velvet models for specific tasks.
Opinions expressed by Velvet depend on its training data and do not reflect the views of Almawave.
Contributions
- Direction: Raniero Romagnoli
- Model engineering and training: David Alessandrini, Francesco Buciuni, Andrea Favalli, Diego Perna, David Preti, Federico Wolenski, Fabio Massimo Zanzotto
- Data engineering and management: Valentina Bellomaria, Cristina Giannone, Alfredo Serafini
- Use case adaptation and testing: Salvatore Ricciardi, Simone Scaboro, Beatrice Turano, Giancarlo Xompero
- Evaluation: Giovanni Cingolani, Silvana De Benedictis, Caterina Masotti, Riccardo Pasquini, Guillaume Ruiz, Giuseppe Scrugli, Alessandro Vizzarro
- Product and governance: Beata Dobrzynska, Matteo Amore, Marco Gennaro Di Martino, Vincenzo Sciacca, Alessandra Staglianò, Luca Vinciguerra
📄 License
Velvet-2B is made available under the Apache 2.0 license.