CharLLaMa-35M
Developed by inkoziev
CharLLaMa-35M is a miniature language model based on the LLaMa architecture with character-level tokenization, intended for experimental scenarios where BPE tokenization underperforms.
Downloads: 61
Released: 8/31/2023
Model Overview
This model was developed specifically for experiments with Russian poetry and pre-trained on a corpus rich in poetic texts; it has 35,913,600 parameters. It is suited to tasks such as generative spell checking, text classification, text transcription, and spelling error detection.
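A minimal loading-and-generation sketch with the Hugging Face transformers library is shown below. The repository id `inkoziev/charllama-35M` is inferred from the author's handle and the model name, so verify it on the Hub before use; the sampling parameters are illustrative, not the author's recommendation.

```python
# Minimal sketch: load CharLLaMa-35M via transformers and sample a continuation.
# The repo id is an assumption based on the author's handle; check it on the Hub.
MODEL_ID = "inkoziev/charllama-35M"

def generate(prompt: str, max_new_tokens: int = 80) -> str:
    # Imports are local so the sketch reads fine without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # character-level tokens, so budget generously
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because tokens are single characters, a given `max_new_tokens` budget yields noticeably shorter text than it would with a BPE-tokenized model.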
Model Features
Character-level tokenization
Utilizes character-level tokenization, ideal for scenarios where BPE tokenization performs poorly, such as spell checking and text transcription.
Poetic text pre-training
Pre-trained on a large corpus of Russian poetic texts, making it well-suited for poetry-related tasks.
Lightweight model
With only 35,913,600 parameters, it is suitable for resource-constrained experimental scenarios.
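The features above hinge on character-level tokenization, which can be illustrated in a few lines. This is a toy stand-in, not the model's actual tokenizer: each character maps to its own id, so the vocabulary is just the set of characters seen in the corpus.

```python
# Toy illustration of character-level tokenization (not the model's real
# tokenizer): every character is its own token, so the vocabulary is tiny.
def char_tokenize(text: str) -> list[str]:
    return list(text)

corpus = "мороз и солнце"
vocab = sorted(set(corpus))                      # character-sized vocabulary
char_to_id = {ch: i for i, ch in enumerate(vocab)}

ids = [char_to_id[ch] for ch in char_tokenize(corpus)]
assert len(ids) == len(corpus)                   # exactly one token per character
```

A vocabulary this small keeps the embedding table light, which is part of why the whole model fits in ~35.9M parameters.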
Model Capabilities
Text generation
Text classification
Spell checking
Text transcription
Spelling error detection
Use Cases
Text processing
Generative spell checker
Leverages character-level tokenization to detect and correct spelling errors.
Text classification
Can replace a TfidfVectorizer(analyzer='char') baseline in scenarios where character-level n-gram features already perform well.
Text transcription
Suitable for text transcription tasks requiring character-level processing.
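To make the spell-checking use case concrete, here is a toy character-trigram scorer standing in for the character-level LM: trigrams never seen in reference text are treated as likely misspellings. All names (`trigrams`, `suspicious`) and the reference word list are illustrative, not part of the model's API.

```python
from collections import Counter

# Toy character-trigram scorer standing in for a character-level LM:
# trigrams absent from reference text are flagged as likely misspellings.
def trigrams(word: str) -> list[str]:
    padded = f"^{word}$"                 # mark word boundaries
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

reference = ["apple", "apply", "ample", "maple"]
seen = Counter(t for w in reference for t in trigrams(w))

def suspicious(word: str) -> list[str]:
    # Trigrams of `word` that never occur in the reference corpus.
    return [t for t in trigrams(word) if t not in seen]

print(suspicious("aplpe"))               # flags trigrams around the transposition
```

A trained character-level LM plays the same role with real probabilities instead of raw counts, scoring how plausible each character is given its context.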
Poetry generation
Russian poetry generation
Generates Russian poetry, drawing on its pre-training corpus of poetic texts.