SmolLM2-135M-Eagle
SmolLM2-135M-Eagle is a fine-tuned language model, based on the SmolLM2-135M base model, that enhances its ability to handle both Russian and English content.
🚀 Quick Start
SmolLM2-135M-Eagle is a version of the SmolLM2-135M model fine-tuned on the EagleSFT dataset. It aims to improve the model's performance on both Russian and English language tasks.
The GGUF version of this model is available at: SmolLM2-135M-Eagle-GGUF
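Below is a minimal inference sketch using the Transformers library. The repository ID is an assumption based on the model name (adjust it to the actual Hub path), and the prompt is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID; replace with the actual Hub path if it differs.
model_id = "SmolLM2-135M-Eagle"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative Russian prompt ("Hi! How are you?").
prompt = "Привет! Как дела?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```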
✨ Features
- Bilingual Capability: Fine-tuned specifically on bilingual content, so it better understands and generates Russian while maintaining English competency.
- Lightweight Design: Built on the SmolLM2-135M base model, whose 135 million parameters offer a good balance between performance and resource requirements.
📚 Documentation
Model Description
SmolLM2-135M-Eagle is a lightweight language model. The fine-tuning process extends the base model's capabilities, enabling it to better understand and generate Russian content while keeping its English proficiency.
Base Model
The model is based on SmolLM2-135M, a compact language model with 135 million parameters. This model strikes a good balance between performance and resource consumption.
Fine-tuning Details
Dataset
The model was fine-tuned on the EagleSFT dataset, which contains 536,231 pairs of human questions and machine-generated responses in both Russian and English. The dataset focuses mainly on educational content but also includes everyday questions and casual conversations.
Environmental Impact
- Training duration: 26h 26m in total.
  - 15h 19m 52s in Tyumen, Russia (300W power consumption).
  - 11h 6m 8s in Saint Petersburg (360W power consumption).
- Hardware: 1 x RTX 4090.
- Carbon emissions: approximately 3.01 kg CO2eq (a worked check follows this list).
  - Calculated from average power consumption and an average grid carbon intensity of 350 g CO2eq/kWh in these regions.
  - Tyumen: 300W * 15.33h * 350 g/kWh ≈ 1.61 kg CO2eq.
  - Saint Petersburg: 360W * 11.10h * 350 g/kWh ≈ 1.40 kg CO2eq.
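The emission figures can be reproduced with a few lines of arithmetic (a minimal sketch; the 350 g CO2eq/kWh grid intensity is the card's own stated assumption):

```python
# energy (kWh) = power (kW) * time (h); emissions (kg) = energy * grid intensity
INTENSITY = 0.350  # kg CO2eq per kWh, as stated above

runs = {
    "Tyumen": (0.300, 15.33),            # (power in kW, duration in hours)
    "Saint Petersburg": (0.360, 11.10),
}

total = 0.0
for city, (kw, hours) in runs.items():
    kg = kw * hours * INTENSITY
    total += kg
    print(f"{city}: {kg:.2f} kg CO2eq")

print(f"Total: {total:.2f} kg CO2eq")  # ~3.01 kg
```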
Training Parameters
- Training approach: Supervised Fine-Tuning (SFT).
- Training epochs: 2.
- Learning rate: 3.0e-04.
- Precision: bfloat16.
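The card does not state which training framework was used; below is a minimal sketch of an equivalent setup using TRL's SFTTrainer with the hyperparameters listed above. The dataset ID and output directory are placeholders, not the authors' actual script:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset ID -- substitute the actual Hub path of EagleSFT.
dataset = load_dataset("EagleSFT", split="train")

config = SFTConfig(
    output_dir="smollm2-135m-eagle",  # placeholder
    num_train_epochs=2,               # as listed above
    learning_rate=3.0e-4,             # as listed above
    bf16=True,                        # bfloat16 precision
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M",  # the base model
    args=config,
    train_dataset=dataset,
)
trainer.train()
```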
Limitations and Capabilities
Note that this model was fine-tuned only through SFT, on a relatively small number of tokens. As a result, the model has much less to rely on when answering in Russian than in English.
Despite these limitations, the model shows minimal improvements in:
- Basic recognition of Russian prompts (though with frequent misunderstandings).
- Handling simple tasks formatted as "{question in Russian}, answer in English" (see the sketch after this list).
- Basic translation from Russian to English (though the quality remains poor).
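The "{question in Russian}, answer in English" pattern can be exercised with a prompt like the following (a sketch reusing the model and tokenizer from the Quick Start; the exact wording is an illustrative assumption):

```python
# A Russian question ("What is photosynthesis?") asking for an English answer.
prompt = "Что такое фотосинтез? Answer in English."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```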
The model's minimal understanding of the Russian language comes solely from the supervised fine-tuning process, without proper pre-training on a Russian text corpus, leading to severely limited capabilities.
Experimental Capabilities
The model demonstrates some experimental capabilities, but with significant limitations:
- Basic Russian text understanding (with frequent errors and misinterpretations).
- Limited question-answering in Russian (quality significantly lower than in English).
- Basic Russian to English translation (better than English to Russian).
Limitations
- NOT SUITABLE FOR PRODUCTION USE: This model should not be used in any production environment.
- Extremely limited knowledge base for the Russian language due to the lack of pre-training on Russian text.
- Unoptimized tokenizer performance for the Russian language results in inefficient token usage (see the sketch after this list).
- Output quality in Russian will be unsatisfactory for most use cases.
- May produce inaccurate, inconsistent, or inappropriate responses, especially in Russian.
- All limitations of the base SmolLM2-135M model still apply.
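The tokenizer inefficiency noted above can be checked by comparing token counts for roughly equivalent English and Russian sentences (a minimal sketch; the sentences are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")

english = "The cat sat on the mat."
russian = "Кошка сидела на коврике."  # rough Russian equivalent

print(len(tokenizer(english)["input_ids"]), "tokens (English)")
print(len(tokenizer(russian)["input_ids"]), "tokens (Russian)")
# The Russian sentence typically splits into many more tokens per word,
# wasting context length and slowing generation.
```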
📄 License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Lightweight bilingual language model |
| Training Data | EagleSFT dataset (536,231 pairs of human questions and machine-generated responses in Russian and English; mainly educational content, but also everyday questions and casual conversations) |
| Base Model | SmolLM2-135M (135 million parameters) |
| Training Approach | Supervised Fine-Tuning (SFT) |
| Training Epochs | 2 |
| Learning Rate | 3.0e-04 |
| Precision | bfloat16 |
| Training Duration | 26h 26m (15h 19m 52s in Tyumen, Russia; 11h 6m 8s in Saint Petersburg) |
| Hardware | 1 x RTX 4090 |
| Carbon Emissions | Approximately 3.01 kg CO2eq |
⚠️ Important Note
This model is not suitable for production use. It has significant limitations, especially in handling the Russian language, due to the lack of proper pre-training.
💡 Usage Tip
When using this model, be aware of its limitations, particularly in Russian language tasks: the output may be inaccurate, inconsistent, or inappropriate.