# 🚀 BART-Lagrangian

BART-Lagrangian is a specialized sequence-to-sequence Transformer that generates particle physics Lagrangians from textual descriptions of the field content, offering a new approach for symbolic physics research.
## 🚀 Quick Start

### Installation

You can use BART-Lagrangian directly with the Hugging Face Transformers library. Install the prerequisites (for example with pip):

```bash
pip install transformers torch
```
### Load the Model and Tokenizer

```python
from transformers import BartForConditionalGeneration, PreTrainedTokenizerFast

model_name = "JoseEliel/BART-Lagrangian"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_name)
```
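Optionally, if a GPU is available, the model can be moved there before generation. This is standard PyTorch/Transformers usage rather than anything specific to BART-Lagrangian:

```python
import torch

# Optional: run generation on a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Note: any tensors passed to model.generate() must then also be moved,
# e.g. inputs = {k: v.to(device) for k, v in inputs.items()}.
```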
### Prepare Input

Below is a simple example describing two fields: a scalar with charges (1, 2, 1) and a fermion with helicity -1/2 and charges (3, 2, 1/3) under (SU(3), SU(2), U(1)). Singlet (trivial) representations are simply omitted from the input:

```python
input_text = "[SOS] FIELD SPIN 0 SU2 2 U1 1 FIELD SPIN 1 / 2 SU3 3 SU2 2 U1 1 / 3 HEL - 1 / 2 [EOS]"
```
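If you prefer to assemble this string programmatically, a minimal helper along the following lines works. `make_input` and the field dictionaries are illustrative only and not part of the released code:

```python
# Hypothetical helper for assembling the tokenized input string (not part of
# the released code). The layout follows the example above: each field lists
# its spin, its non-trivial gauge representations, and (for fermions) its helicity.
def make_input(fields):
    parts = ["[SOS]"]
    for field in fields:
        parts += ["FIELD", "SPIN", field["spin"]]
        for group in ("SU3", "SU2", "U1"):
            if group in field:  # singlet (trivial) representations are omitted
                parts += [group, field[group]]
        if "hel" in field:
            parts += ["HEL", field["hel"]]
    parts.append("[EOS]")
    return " ".join(parts)

fields = [
    {"spin": "0", "SU2": "2", "U1": "1"},                                        # scalar (1, 2, 1)
    {"spin": "1 / 2", "SU3": "3", "SU2": "2", "U1": "1 / 3", "hel": "- 1 / 2"},  # fermion (3, 2, 1/3)
]
input_text = make_input(fields)
# "[SOS] FIELD SPIN 0 SU2 2 U1 1 FIELD SPIN 1 / 2 SU3 3 SU2 2 U1 1 / 3 HEL - 1 / 2 [EOS]"
```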
### Perform Generation

```python
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=2048)
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

print("Generated Lagrangian:")
print(decoded_outputs[0])
```
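Standard `generate` options from Transformers also apply. For example, beam search can return several candidate Lagrangians for the same field content; the specific settings below are just an illustration:

```python
# Return several candidate Lagrangians for the same input using beam search.
outputs = model.generate(
    inputs["input_ids"],
    max_length=2048,
    num_beams=4,
    num_return_sequences=4,
    early_stopping=True,
)
candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
for i, lagrangian in enumerate(candidates, start=1):
    print(f"--- Candidate {i} ---")
    print(lagrangian)
```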
## ✨ Features

- BART Architecture: BART-Lagrangian is based on the BART architecture with sequence-to-sequence pretraining, enabling effective sequence generation.
- Custom Tokenization: It uses a custom tokenization scheme that captures field quantum numbers and contractions, facilitating the handling of physics symbols (see the inspection sketch after this list).
- Specialized Training: The model is trained on a large corpus of symbolic physics data, making it suitable for symbolic physics research.
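To see how the custom scheme splits an input (mentioned in the tokenization item above), you can inspect the encoding directly. This is plain Transformers usage rather than a separate API:

```python
# Inspect how the custom tokenizer splits a physics input into tokens.
tokens = tokenizer.tokenize(input_text)
print(tokens)

# Token ids, useful for checking sequence length before generation.
ids = tokenizer(input_text)["input_ids"]
print(len(ids))
```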
## 📚 Documentation

### Model Summary

BART-Lagrangian is a sequence-to-sequence Transformer (BART-based) specifically trained to generate particle physics Lagrangians from textual descriptions of fields, spins, and gauge symmetries. Unlike typical language models, BART-Lagrangian focuses on the symbolic structure of physics, aiming to produce coherent and accurate Lagrangian terms given customized tokens representing field types, spins, helicities, gauge groups (SU(3), SU(2), U(1)), and more.
### Evaluation

BART-Lagrangian has been evaluated both on internal test sets of symbolic Lagrangians, to measure consistency and correctness, and through human inspection by domain experts, to confirm that the generated Lagrangian terms align with expected physics rules (e.g., correct gauge symmetries, valid contractions).
### Limitations

- Domain Specificity: BART-Lagrangian is specialized for Lagrangian generation; it may not perform well on unrelated language tasks.
- Input Format Sensitivity: The model relies on a specific tokenized format for fields and symmetries. Incorrect or incomplete tokenization can yield suboptimal or invalid outputs (see the sanity-check sketch after this list).
- Potential Redundancy: Some generated Lagrangians can contain redundant terms, as non-redundant operator filtering was beyond the scope of the initial training.
- Context Length Limit: The default generation max_length is 2048 tokens, which may be insufficient for extremely large or highly complex expansions.
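Because incorrect or incomplete inputs can yield invalid outputs (see the input-format item above), a lightweight sanity check before calling `generate` can help. The checks below are only an illustration based on the format shown in the Quick Start, not a validator shipped with the model:

```python
# Minimal, illustrative sanity checks on the tokenized input string.
def check_input(text: str) -> None:
    if not (text.startswith("[SOS]") and text.endswith("[EOS]")):
        raise ValueError("input must be wrapped in [SOS] ... [EOS]")
    if "FIELD" not in text:
        raise ValueError("input must declare at least one FIELD block")

check_input(input_text)
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=2048)
```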
### Training

- Architecture: BART, a sequence-to-sequence Transformer with approximately 357M parameters.
- Data: A large corpus of synthetically generated Lagrangians built with a custom pipeline (AutoEFT + additional code).
- Objective: Conditional generation of invariant terms given field tokens, spins, and gauge group embeddings.
- Hardware: Trained on an A100 GPU, using standard PyTorch and Transformers libraries.
## 🔧 Technical Details

For more technical details, see the paper "Generating Particle Physics Lagrangians with Transformers" (arXiv:2501.09729).
## 📄 License

The model, code, and weights are provided under the AGPL-3.0 license.
## Acknowledgements

The computational work for this research was primarily supported by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), funded by the Swedish Research Council (Grant No. 2022-06725). Additional computing resources were provided by Google Cloud Platform via research credits awarded to the project.
## Citation

If you use BART-Lagrangian in your work, please cite it as follows:

```bibtex
@misc{BARTLagrangian,
  title={Generating particle physics Lagrangians with transformers},
  author={Yong Sheng Koay and Rikard Enberg and Stefano Moretti and Eliel Camargo-Molina},
  year={2025},
  eprint={2501.09729},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2501.09729},
}
```
## 📦 Information Table

| Property      | Details |
|---------------|---------|
| Model Type    | Sequence-to-sequence Transformer (BART-based) |
| Training Data | A large corpus of synthetically generated Lagrangians using a custom pipeline (AutoEFT + additional code) |
| Pipeline Tag  | text2text-generation |
| Tags          | Physics, Math, Lagrangian |
| Library Name  | transformers |
| License       | AGPL-3.0 |
| Datasets      | JoseEliel/lagrangian_generation |