🚀 CodeBERT-base-mlm
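CodeBERT-base-mlm provides pretrained weights for tasks that involve programming languages and natural language. It is based on the pretrained CodeBERT model and targets tasks such as code understanding.
🚀 Quick Start
This model provides the pretrained weights for CodeBERT: A Pre-Trained Model for Programming and Natural Languages. A usage example follows: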
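from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
# Load the pretrained masked-language-model weights and the matching tokenizer
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')
tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
# The <mask> token marks the position the model should fill in
code_example = "if (x is not None) <mask> (x>1)"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
outputs = fill_mask(code_example)
# Print one candidate per line, highest score first
for output in outputs:
    print(output)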
Expected output:
{'sequence': '<s> if (x is not None) and (x>1)</s>', 'score': 0.6049249172210693, 'token': 8}
{'sequence': '<s> if (x is not None) or (x>1)</s>', 'score': 0.30680200457572937, 'token': 50}
{'sequence': '<s> if (x is not None) if (x>1)</s>', 'score': 0.02133703976869583, 'token': 114}
{'sequence': '<s> if (x is not None) then (x>1)</s>', 'score': 0.018607674166560173, 'token': 172}
{'sequence': '<s> if (x is not None) AND (x>1)</s>', 'score': 0.007619690150022507, 'token': 4248}
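The candidates above can be post-processed without rerunning the model. The following is a minimal sketch, assuming the result format shown in the expected output (dictionaries with `sequence`, `score`, and `token` keys). The helper `filled_word` and the `0.05` score threshold are hypothetical names and values chosen for illustration; they are not part of the transformers API. The helper also assumes the decoded sequence preserves the template's spacing, which may not hold for every input.

```python
# Results copied from the expected output above (keys: sequence, score, token).
results = [
    {'sequence': '<s> if (x is not None) and (x>1)</s>', 'score': 0.6049249172210693, 'token': 8},
    {'sequence': '<s> if (x is not None) or (x>1)</s>', 'score': 0.30680200457572937, 'token': 50},
    {'sequence': '<s> if (x is not None) if (x>1)</s>', 'score': 0.02133703976869583, 'token': 114},
    {'sequence': '<s> if (x is not None) then (x>1)</s>', 'score': 0.018607674166560173, 'token': 172},
    {'sequence': '<s> if (x is not None) AND (x>1)</s>', 'score': 0.007619690150022507, 'token': 4248},
]


def filled_word(template, sequence, mask_token="<mask>"):
    """Recover the text that replaced the mask (hypothetical helper).

    Assumes the decoded sequence keeps the template's spacing unchanged.
    """
    prefix, suffix = template.split(mask_token, 1)
    # Drop the special tokens that appear in the decoded sequence, if any.
    text = sequence.replace("<s>", "").replace("</s>", "").strip()
    return text[len(prefix):len(text) - len(suffix)]


template = "if (x is not None) <mask> (x>1)"

# Highest-scoring candidate.
best = max(results, key=lambda r: r["score"])
best_word = filled_word(template, best["sequence"])

# Candidates above an illustrative score threshold (0.05 is an assumption).
likely = [filled_word(template, r["sequence"]) for r in results if r["score"] >= 0.05]

print(best_word, likely)
```

With the scores listed above, this selects `and` as the best candidate and keeps `and` and `or` as the candidates above the threshold.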
📦 Installation
The source documentation does not list specific installation steps; refer to the official documentation of the transformers library.
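Before running the examples, it can help to confirm that the required packages are importable. The sketch below is an assumption-laden convenience check, not an official installation procedure: the function name `missing_packages` is hypothetical, and the default list assumes PyTorch (`torch`) as the backend alongside `transformers`.

```python
import importlib.util


def missing_packages(names=("transformers", "torch")):
    """Return the top-level packages from `names` that cannot be found.

    Hypothetical helper; the default names assume a PyTorch backend.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any package reported as missing can typically be installed with pip; the official transformers documentation covers the supported options.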
💻 Usage Examples
Basic usage
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
model = RobertaForMaskedLM.from_pretrained('microsoft/codebert-base-mlm')
tokenizer = RobertaTokenizer.from_pretrained('microsoft/codebert-base-mlm')
code_example = "if (x is not None) <mask> (x>1)"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
outputs = fill_mask(code_example)
print(outputs)
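🔧 Technical Details
Training data
The model was trained on the code corpus of CodeSearchNet.
Training objective
The model is initialized from RoBERTa-base and trained with a simple MLM (masked language modeling) objective.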
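To illustrate what an MLM objective does to its input, here is a minimal pure-Python sketch of the token-masking step. It is not the authors' training code. The 15% selection rate and the 80/10/10 split (mask, random token, unchanged) follow the common BERT convention and are assumptions here, since the source only says "simple MLM". The ids `50264` and `50265` are placeholder values for the mask token id and vocabulary size, and the sketch does not exclude special tokens from masking.

```python
import random

IGNORE_INDEX = -100  # label value that loss functions conventionally skip


def mask_tokens(token_ids, mask_id, vocab_size, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one MLM training example.

    Illustrative sketch: selected positions keep their original id as the
    label; all other positions get IGNORE_INDEX so they do not contribute
    to the loss.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # replace with a random token
        # otherwise keep the original token unchanged
    return inputs, labels


# Placeholder ids standing in for a tokenized code snippet.
example_ids = list(range(100, 120))
inputs, labels = mask_tokens(example_ids, mask_id=50264, vocab_size=50265, seed=0)
print(inputs)
print(labels)
```

During training, the model receives `inputs` and is penalized only at the positions where `labels` holds a real token id.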
📚 Documentation
References
- Bimodal CodeBERT trained with the MLM+RTD objective (suitable for code search and document generation)
- 🤗 Hugging Face's CodeBERTa (small size, 6 layers)
Citation
@misc{feng2020codebert,
title={CodeBERT: A Pre-Trained Model for Programming and Natural Languages},
author={Zhangyin Feng and Daya Guo and Duyu Tang and Nan Duan and Xiaocheng Feng and Ming Gong and Linjun Shou and Bing Qin and Ting Liu and Daxin Jiang and Ming Zhou},
year={2020},
eprint={2002.08155},
archivePrefix={arXiv},
primaryClass={cs.CL}
}