Cotype-Nano
Cotype-Nano is a lightweight LLM designed to perform tasks with minimal resources. It is optimized for fast and efficient interaction with users and provides high performance even under resource-constrained conditions.
Quick Start
Inference with vLLM
python3 -m vllm.entrypoints.openai.api_server --model MTSAIR/Cotype-Nano --port 8000
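Once the server is up, you can sanity-check it by querying the OpenAI-compatible /v1/models endpoint before wiring up a client. This is a minimal sketch, assuming the default host and port from the command above and that the requests package is available:

```python
import requests

# Ask the running vLLM server which models it is serving.
# Assumes it was started on localhost:8000 as shown above.
resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())  # should list "MTSAIR/Cotype-Nano"
```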
Recommended generation parameters and system prompt
import openai

openai.api_key = 'xxx'
endpoint = 'http://localhost:8000/v1'
model = 'MTSAIR/Cotype-Nano'
openai.api_base = endpoint

# Possible system prompt:
# {"role": "system", "content": "You are an AI assistant. You are given a task: generate a detailed and comprehensive answer."},

response = openai.ChatCompletion.create(
    model=model,
    temperature=0.4,  # 0.0 is also allowed
    frequency_penalty=0.0,
    max_tokens=2048,
    top_p=0.8,  # 0.1 is also allowed
    messages=[
        {"role": "user", "content": "Tell me how to train the meta-llama/Llama-3.2-1B model using the transformers library?"}
    ]
)

answer = response["choices"][0]["message"]["content"]
print(answer)
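The snippet above targets the pre-1.0 openai Python package (module-level api_base and ChatCompletion). If you have openai>=1.0 installed, a client object is used instead; the sketch below keeps the same endpoint and recommended generation parameters:

```python
from openai import OpenAI

# Same local vLLM endpoint as above; vLLM does not validate the API key by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="xxx")

response = client.chat.completions.create(
    model="MTSAIR/Cotype-Nano",
    temperature=0.4,
    frequency_penalty=0.0,
    top_p=0.8,
    max_tokens=2048,
    messages=[
        {"role": "user", "content": "Tell me how to train the meta-llama/Llama-3.2-1B model using the transformers library?"}
    ],
)
print(response.choices[0].message.content)
```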
Inference with Hugging Face
from transformers import pipeline

pipe = pipeline("text-generation", model="MTSAIR/Cotype-Nano", device="cuda")

messages = [
    {"role": "system", "content": "You are an AI assistant. You are given a task: generate a detailed and comprehensive answer."},
    {"role": "user", "content": "Tell me about AI"},
]

res = pipe(messages, max_length=1024)
print(res[0]['generated_text'][-1]['content'])
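If you need finer control over decoding than the pipeline exposes, the same chat can be run through the tokenizer's chat template and model.generate. This is a minimal sketch, assuming a CUDA device and reusing the generation parameters recommended above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MTSAIR/Cotype-Nano"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="cuda")

messages = [
    {"role": "system", "content": "You are an AI assistant. You are given a task: generate a detailed and comprehensive answer."},
    {"role": "user", "content": "Tell me about AI"},
]

# Build the prompt with the model's chat template, then sample a reply.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.4, top_p=0.8)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```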
Example request
Request: Write a simple server with FastAPI
Cotype-Nano response:
FastAPI is a framework that simplifies the creation of APIs in Python, allowing you to quickly build high-performance web applications. To use FastAPI, you need to follow these steps:

- Install FastAPI and Uvicorn.
- Create an app instance.
- Define API endpoints.
- Run the application.

Step 1: Install FastAPI and Uvicorn

To install FastAPI and Uvicorn, you can use the following command:

pip install fastapi uvicorn

Step 2: Create an app instance

Create an app instance using the following code:

from fastapi import FastAPI

app = FastAPI()

Step 3: Define API endpoints

Define an API endpoint, for example, a GET endpoint:

@app.get("/")
async def read_root():
    return {"message": "Hello, FastAPI!"}

Step 4: Run the application

Run the application using Uvicorn:

uvicorn main:app --reload

Here, main is the name of the Python file and app is the FastAPI instance. The --reload option enables auto-reload during development.

After running the application, you can access it at http://127.0.0.1:8000. When you make a GET request to /, you will receive the following response:

{ "message": "Hello, FastAPI!" }
If you want to create more complex endpoints, you can define different routes and request methods according to your needs.
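For instance, a more complex route could accept a JSON body validated by a Pydantic model; the sketch below is purely illustrative (the Item model and the /items/ path are invented for the example):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Illustrative request schema; FastAPI validates incoming JSON against it.
class Item(BaseModel):
    name: str
    price: float

# A POST endpoint that accepts an Item and echoes it back.
@app.post("/items/")
async def create_item(item: Item):
    return {"name": item.name, "price": item.price}
```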
Technical Details
Training process
The model was trained in two stages. In the first stage, MLP layers were trained on mathematics and code. In the second stage, the entire model was trained on internal and open synthetic instructional datasets.
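A rough sketch of this two-stage recipe is shown below: stage one freezes everything except the MLP (feed-forward) layers, stage two unfreezes the full model. The module naming (feed-forward blocks called "mlp", as in Qwen2-style architectures) and the use of plain transformers are assumptions made for illustration, not the actual training setup.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MTSAIR/Cotype-Nano")

# Stage 1: make only the MLP layers trainable before fine-tuning on math and code data.
# Assumes feed-forward blocks are named "mlp", as in Qwen2-style models.
for name, param in model.named_parameters():
    param.requires_grad = ".mlp." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Stage 1 trainable share: {trainable / total:.1%}")

# Stage 2: unfreeze the whole model before fine-tuning on instruction data.
for param in model.parameters():
    param.requires_grad = True
```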
ru-llm-arena: 30.2 (local measurement)
| Property | Details |
|---|---|
| Model Type | Cotype-Nano |
| Score | 30.2 |
| 95% CI | +2.2 / -1.3 |
| Avg. #Tokens | 542 |
| Model | Score | 95% CI | Avg. #Tokens |
|---|---|---|---|
| Cotype-Nano | 30.2 | +2.2 / -1.3 | 542 |
| vikhr-it-5.3-fp16-32k | 27.8 | +1.5 / -2.1 | 519.71 |
| vikhr-it-5.3-fp16 | 22.73 | +1.8 / -1.7 | 523.45 |
| Cotype-Nano-4bit | 22.5 | +2.1 / -1.4 | 582 |
| kolibri-vikhr-mistral-0427 | 22.41 | +1.6 / -1.9 | 489.89 |
| snorkel-mistral-pairrm-dpo | 22.41 | +1.7 / -1.6 | 773.8 |
| storm-7b | 20.62 | +1.4 / -1.6 | 419.32 |
| neural-chat-7b-v3-3 | 19.04 | +1.8 / -1.5 | 927.21 |
| Vikhrmodels-Vikhr-Llama-3.2-1B-instruct | 19.04 | +1.2 / -1.5 | 958.63 |
| gigachat_lite | 17.2 | +1.5 / -1.5 | 276.81 |
| Vikhrmodels-Vikhr-Qwen-2.5-0.5b-Instruct | 16.5 | +1.5 / -1.7 | 583.5 |
| Qwen-Qwen2.5-1.5B-Instruct | 16.46 | +1.3 / -1.3 | 483.67 |
| Vikhrmodels-vikhr-qwen-1.5b-it | 13.19 | +1.3 / -1.1 | 2495.38 |
| meta-llama-Llama-3.2-1B-Instruct | 4.04 | +0.6 / -0.8 | 1240.53 |
| Qwen-Qwen2.5-0.5B-Instruct | 4.02 | +0.7 / -0.8 | 829.87 |
License
The model is under the Apache 2.0 license.

