JavaBERT Model Card
JavaBERT is a BERT-like model pretrained on Java software code, offering fill-mask capabilities.
Quick Start
Use the code below to get started with the model.
```python
from transformers import pipeline

pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
# Replace CODE with a Java snippet; use '[MASK]' to mark the token to predict.
CODE = 'public [MASK] void main(String[] args) { }'
output = pipe(CODE)
```
Features
- Pretrained on Java software code.
- Capable of fill-mask tasks.
Installation
No model-specific installation steps are provided in the original document; the model is loaded through the Hugging Face `transformers` library (e.g., `pip install transformers`).
Usage Examples
Basic Usage
```python
from transformers import pipeline

pipe = pipeline('fill-mask', model='CAUKiel/JavaBERT')
# Replace CODE with a Java snippet; use '[MASK]' to mark the token to predict.
CODE = 'public [MASK] void main(String[] args) { }'
output = pipe(CODE)
```
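Inputs to the pipeline are ordinary Java source strings in which the token to predict has been replaced by BERT's `[MASK]` token. A minimal sketch of preparing such an input (the `mask_token` helper below is hypothetical, not part of the JavaBERT release):

```python
# Hypothetical helper: replace the first occurrence of a token in a Java
# snippet with BERT's '[MASK]' token so the string can be fed to the
# fill-mask pipeline.
def mask_token(java_code: str, token: str) -> str:
    return java_code.replace(token, "[MASK]", 1)

code = 'public int add(int a, int b) { return a + b; }'
masked = mask_token(code, "return")
# masked == 'public int add(int a, int b) { [MASK] a + b; }'
# output = pipe(masked)
```

Each `[MASK]` in the input yields a ranked list of candidate tokens with scores.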
Documentation
Model Details
Model Description
A BERT-like model pretrained on Java software code.
- Developed by: Christian-Albrechts-University of Kiel (CAUKiel)
- Shared by [Optional]: Hugging Face
- Model type: Fill-Mask
- Language(s) (NLP): en
- License: Apache-2.0
- Related Models: A version of this model using an uncased tokenizer is available at [CAUKiel/JavaBERT-uncased](https://huggingface.co/CAUKiel/JavaBERT-uncased).
- Resources for more information: see the JavaBERT paper cited below.
Uses
Direct Use
Fill-Mask
Out-of-Scope Use
The model should not be used to intentionally create hostile or alienating environments for people.
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
Training Details
Training Data
The model was trained on 2,998,345 Java files retrieved from open source projects on GitHub. The model uses a bert-base-cased tokenizer.
Training Procedure
Training Objective
An MLM (masked language modeling) objective was used to train this model.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
Citation
BibTeX:
@inproceedings{De_Sousa_Hasselbring_2021,
address={Melbourne, Australia},
title={JavaBERT: Training a Transformer-Based Model for the Java Programming Language},
rights={https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html},
ISBN={9781665435833},
url={https://ieeexplore.ieee.org/document/9680322/},
DOI={10.1109/ASEW52652.2021.00028},
booktitle={2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW)},
publisher={IEEE},
author={Tavares de Sousa, Nelson and Hasselbring, Wilhelm},
year={2021},
month=nov,
pages={90--95} }
Technical Details
No detailed technical information is provided in the original document.
License
The model is licensed under the Apache-2.0 license.