đ A code generation T5 model for solidity (web3 smart contract)
This is a T5 model for Solidity code generation, specifically designed for web3 smart contracts.
- See https://github.com/hululuzhu/solidity-t5 for more context
đ Quick Start
⨠Features
- This model can generate Solidity code for web3 smart contracts.
- It takes into account the Solidity version, ancestor class/library information, and contract declaration header.
đĻ Installation
No specific installation steps are provided in the original document.
đģ Usage Examples
Basic Usage
from transformers import AutoTokenizer, T5ForConditionalGeneration
DEVICE = 'cuda'
tokenizer = AutoTokenizer.from_pretrained("hululuzhu/solidity-t5")
model = T5ForConditionalGeneration.from_pretrained("hululuzhu/solidity-t5").to(DEVICE)
text = """pragma solidity ^0.5.7;
// Context: ParentA | Functions: helloA helloB | Constants: constantA
contract HelloWorld is ParentA {"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(DEVICE)
generated_ids = model.generate(input_ids, max_length=256, num_beams=5, top_p=0.95, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
"""
string public constant name = "Hello World";
...
uint256 public constant override returns (uint256) {
return initialSupply;
}
function initialSupply() public view returns (uint256) {
...
"""
Advanced Usage
- A hello world example to use this model, notice the input
text
includes
- Header solidity version like
pragma solidity ^0.5.7
- Ancestor class/library info, e.g. public functions and constants from
ParentA
- Contract/Library/Interface declaration header, e.g.
HelloWorld
ended with {
- Or simply use the test widget on the right side of the window and test, however
the quality is known to be worse without decoding params
đ Documentation
Background
- Base T5 code model: https://huggingface.co/Salesforce/codet5-large
- Source data: https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts
-
Processing steps: Clean, contract-level segmentation sepration, split in and out
-
After processing input sample
pragma solidity 0.5.7;
// Context: PauserRole | Functions: isPauser addPauser renouncePauser | Constants:
contract Pausable is PauserRole {
-
After processing output sample (notice indentation is bad, this is intentional to reduce token size)
event Paused(address account);
event Unpaused(address account);
bool private _pausableActive;
bool private _paused;
constructor () internal {
_paused = false;
}
function paused() public view returns (bool) {
return _paused;
}
modifier whenNotPaused() {
require(!_paused);
_;
}
modifier whenPaused() {
require(_paused);
_;
}
function pause() public onlyPauser whenNotPaused whenPausableActive {
_paused = true;
emit Paused(msg.sender);
}
function unpause() public onlyPauser whenPaused whenPausableActive {
_paused = false;
emit Unpaused(msg.sender);
}
function _setPausableActive(bool _active) internal {
_pausableActive = _active;
}
modifier whenPausableActive() {
require(_pausableActive);
_;
}
}
- Source training code: See the end to end notebook at code dir here
Future TODO
- The model is significantly under-trained because of lack of GPU budget, need 10x colab resources (~$100 for full train)
- This is quite limited on how the model is used, potentially we could switch to GPT2 decoder-only to compare, but CodeT5 has its strong code optimization
- Need more classifiers (T5 or BERT alike) to detect potential defects.
đ License
This project is licensed under the Apache-2.0 license.