🚀 Model Card for Velvet-2B
Velvet is a family of Italian large language models developed from scratch with a dense architecture. This model was trained on the Leonardo HPC infrastructure hosted by CINECA, using extensively curated, publicly available data.
The training of the Velvet family started from over 10 trillion tokens in six languages (Italian, English, Spanish, Brazilian Portuguese, German, and French). Velvet-2B was trained on nearly 3 trillion tokens across two of these languages (Italian and English).
✨ Features
- Developed from scratch with a dense architecture.
- Trained on a large amount of curated public data.
- Available in two sizes: 2B and 14B parameters.
- Supports multiple languages, including Italian and English.
📚 Documentation
Model details
- Model Developers: Technology and Innovation Team, Almawave
- Input: The models accept only text input.
- Output: The models generate only text output.
- Release Date: February 11th, 2025.
- License: Apache 2.0
Model Architecture and training
The Velvet family of models comes in two sizes, 2B and 14B parameters: Velvet-2B and Velvet-14B. Velvet-2B is a 2B-parameter instruct model fine-tuned from Velvet-2B-base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets tailored to solving textual "instruction-based" problems.
Architecture
- Auto-regressive language model with a transformer-based causal decoder-only design.
- 28 transformer layers.
- MLP intermediate size of 8,192.
- Grouped Query Attention (GQA): 32 query heads and 8 key-value heads for efficiency.
- Rotary Position Embedding (RoPE).
- SiLU activation function with RMSNorm normalization.
- Trained on 4K-token sequences; supports context lengths of up to 32K tokens.
- 127K-token vocabulary, designed to accommodate linguistic diversity.
- Training phases: pretraining and post-training.
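As a toy illustration of the Grouped Query Attention layout listed above (32 query heads sharing 8 key-value heads), the sketch below shows how each group of query heads attends through one shared key-value head. Dimensions and weights are illustrative; this is not Velvet's actual implementation.

```python
import numpy as np

def gqa_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Grouped Query Attention: each group of query heads shares one KV head."""
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head (32 / 8 = 4)

    q = (x @ Wq).reshape(seq, n_q_heads, head_dim)    # (S, Hq,  D)
    k = (x @ Wk).reshape(seq, n_kv_heads, head_dim)   # (S, Hkv, D) -- smaller projection
    v = (x @ Wv).reshape(seq, n_kv_heads, head_dim)

    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)                   # (S, Hq, D)
    v = np.repeat(v, group, axis=1)

    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    # Causal mask: a position may not attend to later positions.
    mask = np.triu(np.ones((seq, seq)), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, d_model)
    return out
```

The efficiency gain is that `Wk` and `Wv` project to only `n_kv_heads * head_dim` dimensions instead of `n_q_heads * head_dim`, shrinking the key-value cache by the group factor.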
Status
This is a static model trained on an offline dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. Almawave is actively working on strategies to enhance alignment and robustness in future iterations of the Velvet model.
License
Velvet-2B is made available under the Apache 2.0 license.
Supported Languages
Velvet-2B has been trained on Italian and English. To ensure high-quality multilingual performance, the dataset was curated to balance linguistic representation, reducing language-specific overfitting and bias.
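One simple way to think about such balancing is explicit per-language sampling weights when assembling training batches. The sketch below is a toy illustration, not Almawave's actual data pipeline.

```python
import random

def balanced_batch(corpora, batch_size, weights=None):
    """Draw a batch whose language mix follows explicit weights (uniform by default)."""
    langs = list(corpora)
    if weights is None:
        weights = {lang: 1.0 for lang in langs}
    w = [weights[lang] for lang in langs]
    batch = []
    for _ in range(batch_size):
        lang = random.choices(langs, weights=w)[0]  # pick a language by weight
        batch.append(random.choice(corpora[lang]))  # then a document from it
    return batch
```

Tilting the weights lets a curator counteract the raw corpus sizes, so a lower-resource language is not drowned out during pretraining.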
Intended Use
Velvet-2B is designed to be integrated into AI systems or applications. Its potential uses include, but are not limited to, text generation, classification, summarization, and question answering. It is important to note that specific applications may require further model adaptations or additional safeguards to prevent undesirable behavior or outputs.
Capabilities
- Summarization
- Information Extraction
- RAG (Retrieval Augmented Generation)
- Paraphrasing
- Textual Entailment
- Natural Language Inference
- Common Sense Reasoning
- Text Classification
- Machine Translation
- Question Answering
- Text Completion
Training Data
Overview
The model was pre-trained on nearly 3 trillion tokens of data from publicly available sources. These sources include a diverse collection of web text, exposing the model to a wide range of linguistic styles, topics, and vocabulary. The training dataset was built with a balanced representation of multiple languages.
The fine-tuning data includes publicly available instruction datasets, as well as over 1M human-annotated and synthetic examples for SFT. Moreover, we used over 50k human-generated examples for safety instructions. Neither the pre-training nor the fine-tuning datasets include Almawave's customer data.
We have made significant efforts to enhance the reliability of responses in terms of factual accuracy; however, we always recommend grounding LLM responses with external factual data (e.g., Retrieval Augmented Generation).
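The grounding recommendation above can be sketched as a minimal retrieve-then-prompt loop. The word-overlap scorer and prompt wording are illustrative stand-ins for a real retriever and template.

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query, documents):
    """Build a prompt that asks the model to answer only from retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

In a production RAG system the overlap scorer would be replaced by a proper retriever (e.g., dense embeddings), but the structure, retrieving evidence and constraining the model to it, is the same.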
Data Freshness
The pre-training data has a cutoff between August 2024 and October 2024, depending on the model.
Evaluation
Italian language
| Category | Benchmark | Velvet-2B |
|----------|-----------|-----------|
| General | MMLU (5-shot) | 39.6 |
| Commonsense | Hellaswag (0-shot) | 54.3 |
| | WinoGrande ITA-bench (0-shot) | 61.9 |
| | PIQA ITA-bench (0-shot) | 67.3 |
| | SciQ ITA-bench (0-shot) with p. | 86.6 |
| Reasoning | ARC-Challenge (0-shot) | 41.7 |
English language
| Category | Benchmark | Velvet-2B |
|----------|-----------|-----------|
| General | MMLU (5-shot) | 43.4 |
| Instruction Following | IFEval (0-shot) | 53.2 |
| Commonsense | Hellaswag (10-shot) | 65.0 |
| | WinoGrande (0-shot) | 60.9 |
| Reasoning | ARC-Challenge (25-shot) | 50.6 |
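The shot counts in the benchmarks above (0-shot, 5-shot, 25-shot) refer to how many solved examples are placed in the prompt before the test question. A minimal sketch of n-shot prompt assembly follows; the Q/A format is illustrative, not the exact evaluation harness used.

```python
def few_shot_prompt(examples, question, n_shots=5):
    """Build an n-shot prompt by prepending solved (question, answer) examples."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples[:n_shots]]
    parts.append(f"Q: {question}\nA:")  # the model completes this final answer
    return "\n\n".join(parts)
```

With `n_shots=0` the model sees only the question itself, which is why 0-shot scores typically trail few-shot scores on the same benchmark.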
Usage
The model can be used with standard open-source transformer inference frameworks.
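As one common option, here is a minimal Hugging Face `transformers` sketch; the checkpoint id `Almawave/Velvet-2B` and the presence of a chat template are assumptions, not confirmed by this card.

```python
# Hypothetical example: the model id "Almawave/Velvet-2B" and chat-template
# support are assumed, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Almawave/Velvet-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Qual è la capitale d'Italia?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```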
Responsibility and Safety
Safety
For our instruction-trained model, we have conducted comprehensive exercises, engaged in adversarial internal and external evaluations, and implemented mitigation techniques to reduce risks. These exercises were designed to thoroughly examine the model's limitations and potential, simulating real and hypothetical scenarios where undesirable behavior might occur.
However, despite these efforts, it is inevitable that some residual hazards will exist, as every large language model presents intrinsic complexities that cannot be completely eliminated.
Developers are advised to implement suitable safety measures and exercise due diligence, tailoring these safeguards to align with their product policies and the specific requirements of their applications.
Some trade-offs between model helpfulness and alignment are likely inevitable. Developers should thoughtfully balance the benefits of alignment and helpfulness for their specific applications and audiences. They must also remain aware of residual risks when using Velvet models and leverage additional safety tools as necessary to achieve an appropriate safety standard for their use case.
We advise developers to carefully evaluate risks in the context of their specific use case. They should consider the potential implications of a model failure in their applications and put adequate measures in place to manage such eventualities.
In parallel, we are collaborating with the scientific and industrial community to establish AI safety benchmark standards that are transparent, rigorous, and interpretable. The goal is to promote a better understanding of the risks associated with large language models and support the development of safer and more responsible solutions.
Governance and Internal Oversight
Almawave has established an internal governance framework for the management and continuous oversight of the Velvet model family. Key governance elements include:
- Supervision by an Ethical and Technical Committee to ensure the model aligns with principles of transparency, fairness, and safety.
- Ongoing bias monitoring through auditing tools, with iterative updates to improve alignment with ethical guidelines.
- Restrictions on commercial and institutional usage to ensure compliance with regulatory frameworks and shared responsibility principles.
- Periodic review processes to assess the model’s impact in high-risk applications.
Bias, Risks, and Limitations
Velvet has been trained on a dataset that, despite all the data curation efforts, might include toxic language and societal biases. This means that models in the Velvet family may reproduce these biases and produce harmful responses when prompted with such inputs. This is a common issue in AI models trained on large datasets, as they can inadvertently perpetuate the biases present in the data.
Furthermore, the model may generate inaccurate, incomplete, or redundant responses, which could be socially unacceptable or undesirable, even if the input prompt is not explicitly offensive. This is a potential flaw in the model's design and training process, and it underscores the importance of careful validation and monitoring of AI systems to ensure that they are functioning as intended.
Additionally, using the recommended prompt template is crucial to mitigate the risk of harmful responses, as it is designed to guide the model towards more appropriate and safe outputs. However, it is important to note that the model's performance may still vary depending on the specific context and complexity of the input prompt.
Finally, when using this model in an agentic workflow, it is essential to validate that all imported packages and dependencies are from trusted sources to ensure the model's security and integrity. This is a critical step in maintaining the model's ethical and responsible use, and it is important to prioritize end-to-end security measures to prevent any potential vulnerabilities or breaches.
Future versions of Velvet will integrate automated red-teaming protocols, continuously stress-testing the model against adversarial prompts to identify and mitigate emerging risks.
Sensitive Data Handling and Usage Restrictions
The Velvet model has not been trained on unauthorized personal data and must not be used to process sensitive data without appropriate security measures.
Usage Restrictions:
- Prohibited use on sensitive healthcare, financial, or government data without specific safeguards.
- Mandatory human validation in scenarios where the model’s outputs could have legal or ethical consequences.
- High-risk applications (legal, medical, public governance) must implement content filtering and auditing techniques to ensure response quality and safety.
Ethical Considerations
Almawave's core values are openness, inclusivity, and helpfulness. We aim to create AI that is accessible and beneficial for everyone, regardless of background. Velvet models are designed to be inclusive and respectful of diverse perspectives and needs, avoiding unnecessary judgment or the imposition of normative views, and recognizing that content deemed problematic in some contexts can have valuable applications in others.
We deeply respect the dignity and autonomy of all users, particularly their right to free thought and expression, which are fundamental to innovation and progress.
While we have taken significant steps to ensure the safety and reliability of Velvet models, it is important to acknowledge that they may occasionally generate inaccurate, biased, or unsafe responses.
Almawave is actively engaging with ethics committees and domain experts to ensure continuous oversight of Velvet’s outputs, improving safeguards through community feedback.
We strongly encourage the community to exercise caution and conduct thorough safety testing and fine-tuning when using Velvet models for specific tasks.
Opinions expressed by Velvet depend on its training data and do not reflect the views of Almawave.
Contributions
- Direction: Raniero Romagnoli
- Model engineering and training: David Alessandrini, Francesco Buciuni, Andrea Favalli, Diego Perna, David Preti, Federico Wolenski, Fabio Massimo Zanzotto
- Data engineering and management: Valentina Bellomaria, Cristina Giannone, Alfredo Serafini
- Use case adaptation and testing: Salvatore Ricciardi, Simone Scaboro, Beatrice Turano, Giancarlo Xompero
- Evaluation: Giovanni Cingolani, Silvana De Benedictis, Caterina Masotti, Riccardo Pasquini, Guillaume Ruiz, Giuseppe Scrugli, Alessandro Vizzarro
- Product and governance: Beata Dobrzynska, Matteo Amore, Marco Gennaro Di Martino, Vincenzo Sciacca, Alessandra Staglianò, Luca Vinciguerra
📄 License
Velvet-2B is made available under the Apache 2.0 license.