Model Card for OpenLRM V1.1
This model card describes OpenLRM V1.1, an open-source implementation of the LRM paper.
Quick Start
This model card is dedicated to the OpenLRM project, an open-source implementation of the LRM paper. The information presented here corresponds to Version 1.1.
✨ Features
- Image-to-3D Pipeline: The model belongs to the image-to-3D pipeline category and transforms input images into 3D models.
- Multiple Model Variants: Small, base, and large variants are available, each trained on either Objaverse alone or Objaverse combined with MVImgNet.
Documentation
Model Details
Training data
| Model | Training Data |
| --- | --- |
| [openlrm-obj-small-1.1](https://huggingface.co/zxhezexin/openlrm-obj-small-1.1) | Objaverse |
| [openlrm-obj-base-1.1](https://huggingface.co/zxhezexin/openlrm-obj-base-1.1) | Objaverse |
| [openlrm-obj-large-1.1](https://huggingface.co/zxhezexin/openlrm-obj-large-1.1) | Objaverse |
| [openlrm-mix-small-1.1](https://huggingface.co/zxhezexin/openlrm-mix-small-1.1) | Objaverse + MVImgNet |
| [openlrm-mix-base-1.1](https://huggingface.co/zxhezexin/openlrm-mix-base-1.1) | Objaverse + MVImgNet |
| [openlrm-mix-large-1.1](https://huggingface.co/zxhezexin/openlrm-mix-large-1.1) | Objaverse + MVImgNet |
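The repository names above follow a regular `openlrm-<data>-<size>-<version>` pattern. The following is a minimal sketch of how one might build those IDs and fetch a checkpoint. The helper names `repo_id` and `download` are hypothetical and not part of OpenLRM; the download step assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function.

```python
# Hypothetical helpers (not part of OpenLRM) built from the table above.
VARIANTS = {
    "obj": "Objaverse",
    "mix": "Objaverse + MVImgNet",
}
SIZES = ("small", "base", "large")


def repo_id(data: str, size: str, version: str = "1.1") -> str:
    """Return the Hugging Face repo ID for a training-data tag and model size."""
    if data not in VARIANTS:
        raise ValueError(f"unknown training-data tag: {data!r}")
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    return f"zxhezexin/openlrm-{data}-{size}-{version}"


def download(data: str, size: str) -> str:
    """Download all files of one variant and return the local folder path.

    Assumes `pip install huggingface_hub`; imported lazily so that
    repo_id() stays usable without the dependency.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(data, size))


if __name__ == "__main__":
    # List every variant with its training data; no network access needed.
    for data, dataset in VARIANTS.items():
        for size in SIZES:
            print(repo_id(data, size), "->", dataset)
```

Calling `download("mix", "base")` would fetch the files of `zxhezexin/openlrm-mix-base-1.1` into the local Hugging Face cache.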
Model architecture (version==1.1)
| Type | Layers | Feat. Dim | Attn. Heads | Triplane Dim. | Input Res. | Image Encoder | Size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| small | 12 | 512 | 8 | 32 | 224 | dinov2_vits14_reg | 446M |
| base | 12 | 768 | 12 | 48 | 336 | dinov2_vitb14_reg | 1.04G |
| large | 16 | 1024 | 16 | 80 | 448 | dinov2_vitb14_reg | 1.81G |
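A few quantities follow directly from the architecture figures above. The sketch below records the three configurations in a hypothetical `LRMConfig` dataclass (an illustration only, not OpenLRM's own configuration class) and derives the per-head attention width and the number of image patch tokens. It assumes a patch size of 14 pixels, which is what the `14` in the DINOv2 encoder names denotes; the token count excludes the class token and register tokens.

```python
from dataclasses import dataclass

PATCH_SIZE = 14  # the "14" in dinov2_vit{s,b}14_reg is the ViT patch size


@dataclass(frozen=True)
class LRMConfig:
    """Hypothetical container for one row of the architecture table."""
    layers: int
    feat_dim: int
    attn_heads: int
    triplane_dim: int
    input_res: int
    image_encoder: str

    @property
    def head_dim(self) -> int:
        # Feature dimension is split evenly across attention heads.
        return self.feat_dim // self.attn_heads

    @property
    def num_patch_tokens(self) -> int:
        # Patches per side, squared (class and register tokens not counted).
        side = self.input_res // PATCH_SIZE
        return side * side


CONFIGS = {
    "small": LRMConfig(12, 512, 8, 32, 224, "dinov2_vits14_reg"),
    "base": LRMConfig(12, 768, 12, 48, 336, "dinov2_vitb14_reg"),
    "large": LRMConfig(16, 1024, 16, 80, 448, "dinov2_vitb14_reg"),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.head_dim, cfg.num_patch_tokens)
```

Under these assumptions every variant uses 64-dimensional attention heads, and the encoder sees 256, 576, and 1024 patch tokens for the small, base, and large input resolutions respectively.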
Training settings
| Type | Rend. Res. | Rend. Patch | Ray Samples |
| --- | --- | --- | --- |
| small | 192 | 64 | 96 |
| base | 288 | 96 | 96 |
| large | 384 | 128 | 128 |
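To give a sense of the rendering cost implied by these settings, the sketch below works through the arithmetic. It assumes that "Rend. Patch" is the side length in pixels of a square patch rendered per supervised view, and that "Ray Samples" is the number of points sampled along each ray; the card does not spell out either definition, so treat both as assumptions. The function names are illustrative.

```python
from fractions import Fraction

# Values copied from the training-settings table above.
TRAINING = {
    "small": {"rend_res": 192, "rend_patch": 64, "ray_samples": 96},
    "base": {"rend_res": 288, "rend_patch": 96, "ray_samples": 96},
    "large": {"rend_res": 384, "rend_patch": 128, "ray_samples": 128},
}


def points_per_patch(cfg: dict) -> int:
    """Point samples per rendered patch: one ray per pixel, times samples per ray."""
    rays = cfg["rend_patch"] ** 2
    return rays * cfg["ray_samples"]


def patch_area_fraction(cfg: dict) -> Fraction:
    """Share of the full rendering resolution covered by one patch."""
    return Fraction(cfg["rend_patch"], cfg["rend_res"]) ** 2


if __name__ == "__main__":
    for name, cfg in TRAINING.items():
        print(name, points_per_patch(cfg), patch_area_fraction(cfg))
```

Under these assumptions a patch requires 393,216 point samples for the small model, 884,736 for base, and 2,097,152 for large, and in all three cases the patch covers one ninth of the full rendering resolution.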
Notable Differences from the Original Paper
- We do not use the deferred back-propagation technique described in the original paper.
- We use random background colors during training.
- The image encoder is based on the DINOv2 model with register tokens.
- The triplane decoder contains 4 layers in our implementation.
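As an illustration of the random-background item above, the sketch below alpha-composites RGBA pixels onto a randomly drawn solid color. This is a generic technique written in plain Python for clarity, not OpenLRM's actual training code; the function name and the choice of uniformly sampled colors are assumptions, and the card does not say which images the backgrounds are applied to.

```python
import random


def composite_on_random_background(rgba_pixels, rng):
    """Alpha-composite RGBA pixels (floats in [0, 1]) over one random RGB color.

    Returns the composited RGB pixels and the background color that was drawn.
    Hypothetical helper for illustration only.
    """
    background = (rng.random(), rng.random(), rng.random())
    composited = []
    for r, g, b, a in rgba_pixels:
        # Standard "over" operator: alpha * foreground + (1 - alpha) * background.
        composited.append(
            tuple(a * fg + (1.0 - a) * bg for fg, bg in zip((r, g, b), background))
        )
    return composited, background


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    pixels = [
        (1.0, 0.0, 0.0, 1.0),  # opaque red: stays red
        (0.0, 0.0, 0.0, 0.0),  # fully transparent: becomes the background
        (0.0, 1.0, 0.0, 0.5),  # half-transparent green: blends with background
    ]
    rgb, bg = composite_on_random_background(pixels, rng)
    print("background:", bg)
    print("composited:", rgb)
```

Varying the background in this way prevents a model from associating the object with one fixed backdrop color, which is the usual motivation for the technique.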
License
Disclaimer
This model is an open-source implementation and is NOT the official release of the original research paper. While it aims to reproduce the original results as faithfully as possible, there may be variations due to model implementation, training data, and other factors.
Ethical Considerations
⚠️ Important Note
This model should be used responsibly and ethically, and should not be used for malicious purposes. Users should be aware of potential biases in the training data, and the model should not be used in circumstances that could lead to harm or unfair treatment of individuals or groups.
Usage Considerations
💡 Usage Tip
The model is provided "as is" without warranty of any kind. Users are responsible for ensuring that their use complies with all relevant laws and regulations. The developers and contributors of this model are not liable for any damages or losses arising from the use of this model.
This model card is subject to updates and modifications. Users are advised to check for the latest version regularly.