Codesearch ModernBERT Owl 2.0 Plus

Developed by Shuu12121
The latest pre-trained model designed for high-quality code understanding and semantic retrieval, supporting long-sequence processing across 8 programming languages.
Downloads 602
Release Time: 5/26/2025

Model Overview

This model targets function-level semantic code search: given a natural-language query, it retrieves the matching code. It can also be applied to tasks such as code completion, summary generation, classification, and clone detection.
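Natural-language-to-code search of this kind usually comes down to ranking function embeddings by their similarity to a query embedding. The sketch below shows only that ranking step in plain Python. The toy vectors, the function names in `toy_index`, and the `rank_functions` helper are illustrative assumptions; in practice the vectors would come from encoding the query and each function with the model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_functions(
    query_vec: Sequence[float],
    function_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in function_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for real model embeddings (hypothetical values).
toy_index = {
    "read_json_file": [0.9, 0.1, 0.0],
    "send_http_request": [0.0, 0.2, 0.9],
    "parse_csv_rows": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]  # stands in for the embedding of "load a JSON file"

results = rank_functions(toy_query, toy_index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With real embeddings the same ranking applies unchanged; only the vector source and dimensionality differ.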

Model Features

Pre-training on a self-built corpus
Pre-trained on a high-quality corpus of code and docstrings collected independently by the developer, at about four times the scale of CodeBERT's.
Multilingual support
Supports 8 programming languages, including the newly added TypeScript.
Long sequence processing ability
Trained on sequences of up to 2048 tokens; at inference, the context can be extended to 8192 tokens.
Comprehensive data cleaning
Functions and docstrings are extracted with Tree-sitter, templated or non-English comments are removed, and sensitive information is masked.
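As a rough illustration of the filtering and masking steps, the sketch below cleans one (code, docstring) pair using only the Python standard library. It is not the developer's pipeline: the regular expressions, the `<EMAIL>` and `<SECRET>` placeholders, and the ASCII-ratio heuristic used as a stand-in for an English check are all assumptions, and the Tree-sitter extraction step is left out.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns only; the patterns used for the real corpus are not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(
    r"""\b(api[_-]?key|token|password)\b(\s*[:=]\s*)(["'])[^"']+\3""",
    re.IGNORECASE,
)


def mask_sensitive(text: str) -> str:
    """Replace e-mail addresses and quoted secret-like values with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return SECRET_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3)}<SECRET>{m.group(3)}",
        text,
    )


def is_mostly_ascii(docstring: str, threshold: float = 0.9) -> bool:
    """Crude stand-in for an English check: share of ASCII characters."""
    if not docstring:
        return False
    ascii_chars = sum(1 for ch in docstring if ord(ch) < 128)
    return ascii_chars / len(docstring) >= threshold


def clean_pair(code: str, docstring: str) -> Optional[Tuple[str, str]]:
    """Drop pairs whose docstring looks non-English; mask the rest."""
    if not is_mostly_ascii(docstring):
        return None
    return mask_sensitive(code), mask_sensitive(docstring)


print(clean_pair('password = "hunter2"', "Log in the user."))
```

A production pipeline would more likely use a proper language detector and a dedicated secret scanner; the heuristics here only show where each step sits.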

Model Capabilities

Function-level semantic code search
Code completion
Code summary generation
Code classification
Code clone detection
RAG system retrieval support

Use Cases

Code search and understanding
Natural language code search
Search the codebase using natural language to quickly locate relevant functions.
The OwlSpotlight extension provides efficient code retrieval with this model.
Code-assisted development
Code completion
Provide code completion suggestions based on the context.
Code summary generation
Automatically generate a summary description of the code.