
ModularStarEncoder

Developed by modularStarEncoder
A 1-billion-parameter code encoder pre-trained on The Stack v2, featuring a modular multi-exit design and bidirectional self-attention
Release date: February 18, 2025

Model Overview

A pre-trained encoder designed specifically for code, supporting 600+ programming languages with a multi-exit modular architecture and a 2048-token context length
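As a starting point, here is a minimal sketch of loading the encoder and embedding a single snippet with Hugging Face transformers. The repo id modularStarEncoder/ModularStarEncoder, the trust_remote_code flag, and the mean-pooling step are assumptions, not confirmed specifics of this model; check the official model card for the exact identifier and recommended usage.

```python
# Minimal sketch: embedding a code snippet with ModularStarEncoder.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "modularStarEncoder/ModularStarEncoder"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

code = "def add(a, b):\n    return a + b"
# Inputs longer than the 2048-token context window are truncated.
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=2048)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states into one fixed-size vector
# (an assumed pooling strategy, not necessarily the official one).
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # (1, hidden_size)
```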

Model Features

Modular design: Five exit points enable multi-exit fine-tuning, so downstream tasks can read representations from intermediate layers instead of only the final one (see the sketch after this list)
Efficient architecture: Scaled down from StarCoder-2's 15B parameters to 1B, using Grouped Query Attention (GQA) and bidirectional self-attention
Long context support: Maximum input length of 2048 tokens, longer than previous code encoders
Multi-language support: Handles code in 600+ programming languages
Training optimization: Trained with a multi-layer loss combining masked language modeling (MLM) and an in-context objective, accelerated by FlashAttention V2
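To illustrate the multi-exit idea, the sketch below reads pooled representations at several intermediate layers via output_hidden_states. The exit layer indices are hypothetical placeholders; the model card documents which of the five exit points correspond to which transformer layers.

```python
# Sketch: reading intermediate "exit" representations via hidden states.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "modularStarEncoder/ModularStarEncoder"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, output_hidden_states=True
)
model.eval()

inputs = tokenizer("print('hello')", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer plus one tensor per layer.
EXIT_LAYERS = [4, 9, 18, 27, 36]  # hypothetical exit-point indices
for layer in EXIT_LAYERS:
    pooled = outputs.hidden_states[layer].mean(dim=1)  # mean pooling (assumed)
    print(f"exit at layer {layer}: {tuple(pooled.shape)}")
```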

Model Capabilities

Code snippet embedding
Code representation learning
Multi-language code processing
Long sequence code analysis

Use Cases

Code analysis
Code similarity detection: Compare the semantic similarity of code snippets through their embedding representations (see the sketch after this list)
Code search enhancement: Provide high-quality embedding representations for code search engines
Programming assistance
IDE intelligent completion: Serve as the underlying model for code auto-completion features
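Below is a minimal sketch of the similarity use case: embed two snippets and compare them with cosine similarity. The repo id, the embed helper, and the mean-pooling step are illustrative assumptions rather than the model's documented interface.

```python
# Sketch: comparing two snippets by cosine similarity of their embeddings.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "modularStarEncoder/ModularStarEncoder"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()

def embed(code: str) -> torch.Tensor:
    """Return a single mean-pooled embedding for one code snippet."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=2048)
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state.mean(dim=1)  # mean pooling (assumed)

a = embed("def add(a, b):\n    return a + b")
b = embed("def sum_two(x, y):\n    return x + y")
print(F.cosine_similarity(a, b).item())  # closer to 1.0 means more similar
```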