Llama-3.1-NemoGuard-8B-Topic-Control
Llama-3.1-NemoGuard-8B-Topic-Control is a model designed for topical and dialogue moderation in human-assistant interactions. It can be used in task-oriented dialogue agents and custom policy-based moderation, ensuring user prompts align with specified rules. This model is ready for commercial use.
🚀 Quick Start
You can try out the model here: Llama-3.1-NemoGuard-8B-Topic-Control
✨ Features
- Input Moderation: Ensures user prompts are consistent with system prompt rules.
- Customizable Rules: Allows for specifying allowed and disallowed topics, personas, and conversation boundaries.
- Binary Output: Returns a clear "on-topic" or "off-topic" response.
- Commercial Use: Ready for commercial applications.
📦 Installation
No dedicated installation steps are provided. The model is served through the supported inference engines listed under Inference below (TRT-LLM, vLLM, or Hugging Face), for example as a NIM endpoint as shown in the NeMo Guardrails integration example.
💻 Usage Examples
Basic Usage
The prompt template consists of two key sections: system instruction and conversation history.
System Instruction
The system instruction part of the prompt serves as a comprehensive guideline to steer the conversation. It includes the core rules and any persona assignment, and must end with the following response directive:

```text
If any of the above conditions are violated, please respond with "off-topic". Otherwise, respond with "on-topic". You must respond with "on-topic" or "off-topic".
```
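As a sketch, the system instruction can be assembled programmatically. The helper below is illustrative (not part of the official API); the mandatory closing directive is quoted verbatim from the template above, while the rule wording is application-specific:

```python
def build_system_instruction(rules):
    """Assemble a topic-control system instruction from a list of rules.

    The closing sentence enforcing the binary "on-topic"/"off-topic"
    response is required by the prompt template; the rules themselves
    are application-specific.
    """
    lines = ["You are to adhere to the following conversation rules:"]
    lines += [f"- {rule}" for rule in rules]
    lines.append(
        'If any of the above conditions are violated, please respond with '
        '"off-topic". Otherwise, respond with "on-topic". '
        'You must respond with "on-topic" or "off-topic".'
    )
    return "\n".join(lines)

# Example: the polite, travel-free assistant used in the conversation below.
instruction = build_system_instruction([
    "Always use a polite tone.",
    "Do not engage in any talk about travelling and touristic destinations.",
])
```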
Conversation History
The conversation history maintains a sequential record of user prompts and LLM responses.
```json
[
  {
    "role": "system",
    "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
  },
  {
    "role": "user",
    "content": "Hi there!"
  },
  {
    "role": "assistant",
    "content": "Hello! How can I help today?"
  },
  {
    "role": "user",
    "content": "Do you know which is the most popular beach in Barcelona?"
  }
]
```
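The conversation above can be sent to any OpenAI-compatible chat endpoint. A minimal stdlib-only sketch is shown below; the base URL and model name are assumptions that match a local NIM deployment like the one in the NeMo Guardrails config later in this card:

```python
import json
from urllib import request

MESSAGES = [
    {"role": "system", "content": (
        "In the next conversation always use a polite tone and do not "
        "engage in any talk about travelling and touristic destinations")},
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Hello! How can I help today?"},
    {"role": "user",
     "content": "Do you know which is the most popular beach in Barcelona?"},
]

def build_payload(messages, model="llama-3.1-nemoguard-8b-topic-control"):
    """Build an OpenAI-style chat completion request body."""
    # The conversation must end with a user message to be moderated.
    assert messages[-1]["role"] == "user"
    return {"model": model, "messages": messages, "temperature": 0.0}

def classify(messages, base_url="http://localhost:8000/v1"):
    """POST the request to the (assumed) local NIM endpoint and return
    the model's reply, which should be 'on-topic' or 'off-topic'."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()
```

Given the travel-related final user turn, the expected label for this conversation is "off-topic".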
Advanced Usage
Integrating with NeMo Guardrails:
To integrate the topic control model with NeMo Guardrails, create a config.yml
file similar to the following example:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

  - type: "topic_control"
    engine: nim
    parameters:
      base_url: "http://localhost:8000/v1"
      model_name: "llama-3.1-nemoguard-8b-topic-control"

rails:
  input:
    flows:
      - topic safety check input $model=topic_control
```
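Conceptually, the input rail in the config above gates each user turn through the topic-control model before the main LLM answers. A stdlib-only sketch of that flow, with the checker stubbed out for illustration (a real deployment would call the topic-control NIM endpoint instead):

```python
def input_rail(messages, topic_checker):
    """Run the topic safety check on the last user turn.

    `topic_checker` maps a conversation to "on-topic" or "off-topic";
    off-topic inputs are refused before reaching the main model.
    """
    label = topic_checker(messages)
    if label == "off-topic":
        return "I'm sorry, I can't help with that topic."
    return None  # None means: let the main LLM generate the reply

# Stub checker for illustration only; flags mentions of "beach" as off-topic.
stub = lambda msgs: ("off-topic"
                     if "beach" in msgs[-1]["content"].lower()
                     else "on-topic")

refusal = input_rail(
    [{"role": "user", "content": "Best beach in Barcelona?"}], stub)
```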
📚 Documentation
Model Overview
The base large language model (LLM) is the multilingual Llama-3.1-8B-Instruct model from Meta. Llama-3.1-TopicGuard is LoRA-tuned on a topic-following dataset generated synthetically with Mixtral-8x7B-Instruct-v0.1.
License/Terms of Use
Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.
Reference(s)
Related paper:
```bibtex
@article{sreedhar2024canttalkaboutthis,
  title={CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues},
  author={Sreedhar, Makesh Narsimhan and Rebedea, Traian and Ghosh, Shaona and Zeng, Jiaqi and Parisien, Christopher},
  journal={arXiv preprint arXiv:2404.03820},
  year={2024}
}
```
Model Architecture
- Architecture Type: Transformer
- Network Architecture: Based on the Llama-3.1-8B-Instruct model from Meta (Model Card). Parameter-Efficient Fine-Tuning (PEFT) is performed with the following parameters:
- Rank: 8
- Alpha: 32
- Targeted low rank adaptation modules: 'k_proj', 'q_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj'.
- Training Method: A system instruction and a synthetically generated dataset are used to instruction-tune the base model to detect on-topic or off-topic user messages.
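The PEFT hyperparameters above map directly onto a Hugging Face `peft` `LoraConfig`. They are sketched here as a plain dict so the values are explicit; with `peft` installed, `LoraConfig(**LORA_KWARGS)` would be the assumed equivalent (the `task_type` value is an assumption, not stated in this card):

```python
# LoRA hyperparameters reported in this model card.
LORA_KWARGS = {
    "r": 8,            # rank
    "lora_alpha": 32,  # alpha
    "target_modules": [  # targeted low-rank adaptation modules
        "k_proj", "q_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: causal-LM instruction tuning
}
```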
Input
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: 1D (One-Dimensional) List: System prompt with topical instructions, followed by a conversation structured as a list of user and assistant messages.
- Other Properties Related to Input: The conversation should end with a user message for topical moderation. The input format follows the [OpenAI Chat specification](https://platform.openai.com/docs/guides/text-generation).
Output
- Output Type(s): Text
- Output Format: String
- Output Parameters: 1D (One-Dimensional)
- Other Properties Related to Output: The response is a binary string label determining if the last user turn in the input conversation respects the topical instruction. The label options are either "on-topic" or "off-topic".
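Since the model's reply is a bare string label, a small normalization helper avoids mis-handling whitespace or casing variation. This is a defensive sketch, not part of the official API:

```python
def parse_label(raw):
    """Normalize the topic-control model's reply to 'on-topic' or 'off-topic'."""
    text = raw.strip().lower()
    # Check "off-topic" first: "on-topic" is not a prefix of it, but keeping
    # the more restrictive label first makes the intent explicit.
    if text.startswith("off-topic"):
        return "off-topic"
    if text.startswith("on-topic"):
        return "on-topic"
    raise ValueError(f"unexpected topic-control output: {raw!r}")
```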
Software Integration
- Runtime Engine(s): PyTorch
- Libraries: Meta's llama-recipes, HuggingFace transformers library, HuggingFace peft library
- Supported Hardware Platform(s): NVIDIA Ampere (A100 80GB, A100 40GB)
- Preferred/Supported Operating System(s): Linux (Ubuntu)
Model Version(s)
Llama-3.1-TopicGuard
Training, Testing, and Evaluation Datasets
Training Dataset
- Link: CantTalkAboutThis dataset
- Data Collection Method by dataset: Synthetic
- Labeling Method by dataset: Synthetic
- Properties: Contains 1,080 on-topic multi-turn conversations covering 540 distinct topical instructions from various domains. For each on-topic conversation, off-topic/distractor turns are also generated.
Testing Dataset
- Link: CantTalkAboutThis topic-following dataset
- Data Collection Method by dataset: Hybrid: Synthetic, Human
- Labeling Method by dataset: Hybrid: Synthetic, Human
- Properties: A smaller, human-annotated subset of the synthetically created test set. The test set contains conversations on a different domain (banking).
Evaluation Dataset
- Link: CantTalkAboutThis evaluation set
- Data Collection Method by dataset: Synthetic
- Labeling Method by dataset: Synthetic
- Properties: Contains 20 multi-turn conversations on 10 different scenarios in the travel domain.
Inference
- Engine: TRT-LLM/vLLM/Hugging Face
- Test Hardware: A100 80GB
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility. Developers should ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Report security vulnerabilities or NVIDIA AI Concerns here.
Explainability
Field | Response |
---|---|
Intended Application & Domain | Dialogue Agents and Guardrails |
Model Type | Transformer |
Intended Users | Developers building task-oriented dialogue assistants who want to specify the dialogue policy in natural language. Also useful as a topical guardrail in NeMo Guardrails. |
Output | Text - Binary label determining if the last user turn in the input conversation respects the topical instruction. The label options are either "on-topic" or "off-topic". |
Describe how the model works | The model receives the dialogue policy and the current conversation, ending with the last user turn, in the prompt of an LLM (Llama-3.1-8B-Instruct). A binary decision is returned, specifying whether the input is on-topic or not. |
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | Not Applicable |
Technical Limitations | The model was trained on 9 domains. Strong generalization in other domains is suggested, but thorough testing is recommended for out-of-domain prompts. |
Verified to have met prescribed NVIDIA quality standards | Yes |
Performance Metrics | F1, Accuracy |
Potential Known Risks | Potential risks include the dialogue agent engaging in user content that is not on-topic. |
Licensing | Governing NVIDIA Download Terms & Third-Party Component Attribution Terms (Hugging Face LORA weights) GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. |
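The F1 and accuracy metrics named above can be computed directly over the binary labels. A minimal sketch, treating "off-topic" as the positive class (an assumption; a topical guardrail typically cares most about catching off-topic turns):

```python
def f1_and_accuracy(y_true, y_pred, positive="off-topic"):
    """Compute F1 (for the given positive class) and accuracy over labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return f1, accuracy
```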
Bias
Field | Response |
---|---|
Participation considerations from adversely impacted groups protected classes in model design and testing | Not Applicable |
Measures taken to mitigate against unwanted bias | None |
Safety & Security
Field | Response |
---|---|
Model Application(s) | Dialogue agents for topic / dialogue moderation |
Describe the life critical impact (if present) | Not Applicable |
Use Case Restrictions | Should not be used for any use case other than text-based topic and dialogue moderation in task-oriented dialogue agents. |
Model and dataset restrictions | Abide by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama. |
Privacy
Field | Response |
---|---|
Generatable or reverse engineerable personal data? | None |
Personal data used to create this model? | None |
Was consent obtained for any personal data used? | Not Applicable |
How often is dataset reviewed? | Before Every Release |
Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
🔧 Technical Details
The model is based on the Llama-3.1-8B-Instruct model from Meta and uses Parameter-Efficient Fine-Tuning (PEFT) with the network architecture parameters listed above. It is trained on a synthetic dataset and tested on a human-annotated subset. The model's performance is evaluated using F1 and accuracy metrics.
📄 License
Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.

