Vintix
Developed by dunnolab
Vintix is a multi-task action model built with in-context reinforcement learning, demonstrating strong performance across multiple benchmarks.
Downloads: 41
Release date: 3/3/2025
Model Overview
Vintix is an action model based on in-context reinforcement learning, designed for multi-task reinforcement learning scenarios. It performs strongly across MuJoCo, Meta-World, Bi-DexHands (bimanual dexterous manipulation), and the Industrial Benchmark.
Model Features
Multi-task Reinforcement Learning
Handles multiple reinforcement learning tasks with a single model, spanning physics simulation, robotic manipulation, and industrial benchmark tasks
High Performance
Achieves an IQM normalized score of 0.99 on MuJoCo, Meta-World, and the Industrial Benchmark, and 0.92 on Bi-DexHands
Large-scale Model
Uses 332 million parameters across 20 layers with 16 attention heads, providing substantial learning capacity
Model Capabilities
Physical environment simulation
Industrial task processing
Bimanual dexterous manipulation
Multi-task reinforcement learning
Contextual learning
Use Cases
Robot Control
MuJoCo Physical Simulation
Simulated robot locomotion and physical interaction with the environment
Achieves an IQM normalized score of 0.99
Bimanual Dexterous Manipulation
Coordinated bimanual manipulation tasks for robots
Achieves an IQM normalized score of 0.92
Industrial Applications
Industrial Benchmark Tests
Handling complex tasks in industrial environments
Achieves an IQM normalized score of 0.99
Vintix
A multi-task action model via in-context reinforcement learning, offering solutions for complex reinforcement learning tasks and demonstrating high performance across diverse datasets.
Quick Start
This is a multi-task action model via in-context reinforcement learning.
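The repository is the authoritative source for installation and inference. As a minimal, hypothetical sketch of what an in-context RL rollout could look like, assuming a `Vintix` class with `from_pretrained` and `get_action` helpers (these names are invented here for illustration and are not taken from the repository):

```python
# Hypothetical usage sketch: the `vintix` import, `from_pretrained`, and
# `get_action` are illustrative assumptions, not the repository's verified API.
import gymnasium as gym

from vintix import Vintix  # assumed package/class name

env = gym.make("HalfCheetah-v4")
model = Vintix.from_pretrained("dunnolab/Vintix")  # assumed loading helper

obs, _ = env.reset(seed=0)
context = []  # cross-episode history of (observation, action, reward) transitions

for step in range(1000):
    # In-context RL: the policy conditions on the accumulated transition
    # history and improves as more experience is appended to the context.
    action = model.get_action(obs, context)  # assumed inference call
    next_obs, reward, terminated, truncated, _ = env.step(action)
    context.append((obs, action, reward))
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()
```

See https://github.com/dunnolab/vintix for the actual setup instructions and inference interface.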
Documentation
Model Details
Property | Details |
---|---|
Parameters | 332M |
Model Sizes | Layers: 20, Heads: 16, Embedding Size: 1024 |
Sequence Length | 8192 |
Training Data | MuJoCo, Meta-World, Bi-DexHands, Industrial Benchmark |
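For readers who prefer code to tables, the sketch below simply restates the published hyperparameters as a plain configuration object; the field names are illustrative and do not reflect the repository's actual configuration schema.

```python
from dataclasses import dataclass

@dataclass
class VintixConfig:
    # Values copied from the table above; field names are illustrative only.
    n_layers: int = 20           # transformer layers
    n_heads: int = 16            # attention heads (head dim = 1024 / 16 = 64)
    embedding_dim: int = 1024    # token embedding size
    sequence_length: int = 8192  # maximum context length in tokens
    n_parameters: str = "332M"   # total parameter count

print(VintixConfig())
```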
Model Description
- Developed by: dunnolab
- License: Apache 2.0
Model Sources
- Repository: https://github.com/dunnolab/vintix
- Paper: https://arxiv.org/abs/2501.19400
Results
The model has been evaluated on various datasets for in-context reinforcement learning tasks. Here are the key metrics:
Dataset | Task Type | Metric | Value |
---|---|---|---|
MuJoCo | In-Context Reinforcement Learning | IQM Normalized 95 | 0.99 |
Meta-World | In-Context Reinforcement Learning | IQM Normalized 95 | 0.99 |
Bi-DexHands | In-Context Reinforcement Learning | IQM Normalized 95 | 0.92 |
Industrial-Benchmark | In-Context Reinforcement Learning | IQM Normalized 95 | 0.99 |
ant_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 6315.00 +/- 675.00 |
ant_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.10 |
halfcheetah_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 7226.50 +/- 241.50 |
halfcheetah_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.93 +/- 0.03 |
hopper_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 2794.60 +/- 612.62 |
hopper_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.86 +/- 0.19 |
humanoid_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 7376.26 +/- 0.00 |
humanoid_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.00 |
humanoidstandup_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 320567.82 +/- 58462.11 |
humanoidstandup_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.02 +/- 0.21 |
inverteddoublependulum_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 6105.75 +/- 4368.65 |
inverteddoublependulum_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.65 +/- 0.47 |
invertedpendulum_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 1000.00 +/- 0.00 |
invertedpendulum_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.00 |
pusher_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | -37.82 +/- 8.72 |
pusher_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.02 +/- 0.08 |
reacher_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | -6.25 +/- 2.63 |
reacher_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.07 |
swimmer_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 93.20 +/- 5.40 |
swimmer_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.06 |
walker2d_v4 (MuJoCo) | In-Context Reinforcement Learning | Total Reward | 5400.00 +/- 107.95 |
walker2d_v4 (MuJoCo) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.02 |
assembly-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 307.08 +/- 25.20 |
assembly-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.04 +/- 0.10 |
basketball-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 568.04 +/- 60.72 |
basketball-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.02 +/- 0.11 |
bin-picking-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 7.88 +/- 4.28 |
bin-picking-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.01 +/- 0.01 |
box-close-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 61.75 +/- 13.54 |
box-close-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | -0.04 +/- 0.03 |
button-press-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 624.67 +/- 42.77 |
button-press-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.07 |
button-press-topdown-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 449.36 +/- 62.16 |
button-press-topdown-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.94 +/- 0.14 |
button-press-topdown-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 482.08 +/- 32.48 |
button-press-topdown-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.07 |
button-press-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 672.00 +/- 26.48 |
button-press-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.04 |
coffee-button-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 719.00 +/- 41.10 |
coffee-button-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.06 |
coffee-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 26.04 +/- 56.12 |
coffee-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.07 +/- 0.20 |
coffee-push-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 571.01 +/- 112.28 |
coffee-push-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.20 |
dial-turn-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 783.90 +/- 53.17 |
dial-turn-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.07 |
disassemble-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 523.60 +/- 58.15 |
disassemble-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.12 |
door-close-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 538.10 +/- 25.76 |
door-close-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.02 +/- 0.05 |
door-lock-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 356.51 +/- 249.44 |
door-lock-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.35 +/- 0.36 |
door-open-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 581.33 +/- 26.33 |
door-open-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.05 |
door-unlock-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 352.86 +/- 147.78 |
door-unlock-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.21 +/- 0.26 |
drawer-close-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 838.88 +/- 7.41 |
drawer-close-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.96 +/- 0.01 |
drawer-open-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 493.00 +/- 3.57 |
drawer-open-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.01 |
faucet-close-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 749.46 +/- 14.83 |
faucet-close-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.03 |
faucet-open-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 732.47 +/- 15.23 |
faucet-open-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.03 |
hammer-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 669.31 +/- 69.56 |
hammer-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.12 |
hand-insert-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 142.81 +/- 146.64 |
hand-insert-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.19 +/- 0.20 |
handle-press-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 835.30 +/- 114.19 |
handle-press-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.15 |
handle-press-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 852.96 +/- 16.08 |
handle-press-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.02 |
handle-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 701.10 +/- 13.82 |
handle-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.02 |
handle-pull-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 493.10 +/- 53.65 |
handle-pull-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.11 |
lever-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 548.72 +/- 81.12 |
lever-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.96 +/- 0.16 |
peg-insert-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 352.43 +/- 137.24 |
peg-insert-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.40 |
peg-unplug-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 401.52 +/- 175.27 |
peg-unplug-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.75 +/- 0.34 |
pick-out-of-hole-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 364.20 +/- 79.56 |
pick-out-of-hole-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.91 +/- 0.20 |
pick-place-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 414.02 +/- 91.10 |
pick-place-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.22 |
pick-place-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 553.18 +/- 84.72 |
pick-place-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.04 +/- 0.16 |
plate-slide-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 531.98 +/- 156.94 |
plate-slide-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.34 |
plate-slide-back-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 703.93 +/- 108.27 |
plate-slide-back-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.16 |
plate-slide-back-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 721.29 +/- 62.15 |
plate-slide-back-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.09 |
plate-slide-side-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 578.24 +/- 143.73 |
plate-slide-side-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.83 +/- 0.22 |
push-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 729.33 +/- 104.40 |
push-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.14 |
push-back-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 372.16 +/- 112.75 |
push-back-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.29 |
push-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 741.68 +/- 14.84 |
push-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.02 |
reach-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 684.45 +/- 136.55 |
reach-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.26 |
reach-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 738.02 +/- 100.96 |
reach-wall-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.17 |
shelf-place-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 268.34 +/- 29.07 |
shelf-place-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.11 |
soccer-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 438.44 +/- 189.63 |
soccer-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.80 +/- 0.35 |
stick-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 483.98 +/- 83.25 |
stick-pull-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.92 +/- 0.16 |
stick-push-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 563.07 +/- 173.40 |
stick-push-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.90 +/- 0.28 |
sweep-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 487.19 +/- 60.02 |
sweep-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.94 +/- 0.12 |
sweep-into-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 798.80 +/- 15.62 |
sweep-into-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.02 |
window-close-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 562.48 +/- 91.17 |
window-close-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.17 |
window-open-v2 (Meta-World) | In-Context Reinforcement Learning | Total Reward | 573.69 +/- 93.98 |
window-open-v2 (Meta-World) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.96 +/- 0.17 |
shadowhandblockstack (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 347.40 +/- 50.60 |
shadowhandblockstack (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.17 +/- 0.23 |
shadowhandbottlecap (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 338.25 +/- 81.25 |
shadowhandbottlecap (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.81 +/- 0.25 |
shadowhandcatchabreast (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 11.81 +/- 21.28 |
shadowhandcatchabreast (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.17 +/- 0.32 |
shadowhandcatchover2underarm (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 31.60 +/- 7.20 |
shadowhandcatchover2underarm (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.92 +/- 0.24 |
shadowhandcatchunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 18.21 +/- 9.46 |
shadowhandcatchunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.72 +/- 0.39 |
shadowhanddoorcloseinward (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 3.97 +/- 0.15 |
shadowhanddoorcloseinward (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.36 +/- 0.02 |
shadowhanddoorcloseoutward (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 358.50 +/- 4.50 |
shadowhanddoorcloseoutward (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | -1.27 +/- 0.01 |
shadowhanddooropeninward (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 108.25 +/- 8.50 |
shadowhanddooropeninward (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.29 +/- 0.02 |
shadowhanddooropenoutward (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 83.65 +/- 12.10 |
shadowhanddooropenoutward (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.13 +/- 0.02 |
shadowhandgraspandplace (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 485.15 +/- 89.10 |
shadowhandgraspandplace (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.18 |
shadowhandkettle (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | -450.47 +/- 0.00 |
shadowhandkettle (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | -0.99 +/- 0.00 |
shadowhandliftunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 377.92 +/- 13.24 |
shadowhandliftunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.03 |
shadowhandover (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 33.01 +/- 0.96 |
shadowhandover (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.03 |
shadowhandpen (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 98.80 +/- 83.60 |
shadowhandpen (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.52 +/- 0.44 |
shadowhandpushblock (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 445.60 +/- 2.20 |
shadowhandpushblock (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.01 |
shadowhandreorientation (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 2798.00 +/- 2112.00 |
shadowhandreorientation (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.89 +/- 0.66 |
shadowhandscissors (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 747.95 +/- 7.65 |
shadowhandscissors (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.03 +/- 0.01 |
shadowhandswingcup (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 3775.50 +/- 583.70 |
shadowhandswingcup (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.13 |
shadowhandswitch (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 268.25 +/- 2.35 |
shadowhandswitch (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.01 |
shadowhandtwocatchunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Total Reward | 2.17 +/- 0.67 |
shadowhandtwocatchunderarm (Bi-DexHands) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.03 +/- 0.03 |
industrial-benchmark-0-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -191.39 +/- 22.96 |
industrial-benchmark-0-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.94 +/- 0.13 |
industrial-benchmark-5-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -194.01 +/- 3.66 |
industrial-benchmark-5-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.02 |
industrial-benchmark-10-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -213.28 +/- 2.01 |
industrial-benchmark-10-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.01 |
industrial-benchmark-15-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -227.82 +/- 4.29 |
industrial-benchmark-15-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.01 +/- 0.02 |
industrial-benchmark-20-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -259.99 +/- 22.70 |
industrial-benchmark-20-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.11 |
industrial-benchmark-25-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -282.28 +/- 20.70 |
industrial-benchmark-25-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.11 |
industrial-benchmark-30-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -307.02 +/- 19.23 |
industrial-benchmark-30-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.90 +/- 0.10 |
industrial-benchmark-35-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -314.36 +/- 5.62 |
industrial-benchmark-35-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 1.00 +/- 0.03 |
industrial-benchmark-40-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -339.34 +/- 9.57 |
industrial-benchmark-40-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.05 |
industrial-benchmark-45-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -366.63 +/- 7.47 |
industrial-benchmark-45-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.04 |
industrial-benchmark-50-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -395.94 +/- 17.65 |
industrial-benchmark-50-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.91 +/- 0.09 |
industrial-benchmark-55-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -403.73 +/- 2.03 |
industrial-benchmark-55-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.01 |
industrial-benchmark-60-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -434.25 +/- 4.12 |
industrial-benchmark-60-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.02 |
industrial-benchmark-65-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -480.31 +/- 8.63 |
industrial-benchmark-65-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.86 +/- 0.04 |
industrial-benchmark-70-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -480.76 +/- 5.98 |
industrial-benchmark-70-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.95 +/- 0.03 |
industrial-benchmark-75-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -476.83 +/- 2.44 |
industrial-benchmark-75-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.99 +/- 0.01 |
industrial-benchmark-80-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -497.13 +/- 2.95 |
industrial-benchmark-80-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.96 +/- 0.01 |
industrial-benchmark-85-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -513.83 +/- 3.06 |
industrial-benchmark-85-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.98 +/- 0.01 |
industrial-benchmark-90-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -532.70 +/- 3.61 |
industrial-benchmark-90-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.01 |
industrial-benchmark-95-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -557.42 +/- 3.81 |
industrial-benchmark-95-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.01 |
industrial-benchmark-100-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Total Reward | -574.57 +/- 4.37 |
industrial-benchmark-100-v1 (Industrial-Benchmark) | In-Context Reinforcement Learning | Expert Normalized Total Reward | 0.97 +/- 0.01 |
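Two metric families appear in the table: per-task expert-normalized returns and, per domain, an IQM (interquartile mean) of normalized scores. The exact evaluation protocol is defined in the paper; the sketch below only illustrates the common convention of normalizing against random and expert returns and aggregating with the interquartile mean, using the MuJoCo per-task means from the table as example inputs.

```python
import numpy as np

def expert_normalized(total_reward: float, random_reward: float, expert_reward: float) -> float:
    """Common convention: 0.0 corresponds to a random policy, 1.0 to the expert."""
    return (total_reward - random_reward) / (expert_reward - random_reward)

def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: average of the scores between the 25th and 75th percentiles."""
    lo, hi = np.percentile(scores, [25, 75])
    middle = scores[(scores >= lo) & (scores <= hi)]
    return float(middle.mean())

# MuJoCo per-task expert-normalized means from the table, used purely as an
# illustration; the reported 0.99 follows the authors' own evaluation protocol.
mujoco_scores = np.array([0.98, 0.93, 0.86, 0.97, 1.02, 0.65, 1.00, 1.02, 0.98, 0.98, 1.00])
print(f"IQM of normalized scores: {iqm(mujoco_scores):.2f}")
```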
License
This model is licensed under the Apache 2.0 license.
Citation
@article{polubarov2025vintix,
author={Andrey Polubarov and Nikita Lyubaykin and Alexander Derevyagin and Ilya Zisman and Denis Tarasov and Alexander Nikulin and Vladislav Kurenkov},
title={Vintix: Action Model via In-Context Reinforcement Learning},
journal={arXiv},
volume={2501.19400},
year={2025}
}