Qwen2.5 3B YiLin GGUF Q4 K M
Qwen2.5-3B-YiLin-GGUF-q4_k_m
This model is fine-tuned from a dynamic 4-bit quantized version of Qwen2.5-3B-Instruct (unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit). It was trained with QLoRA for 3 epochs on two open-source datasets and one synthetic dataset, with a maximum context length of 8192 tokens.
It successfully transfers a Chain of Thought (CoT) similar to that of DeepSeek-R1 to Qwen2.5-3B-Instruct. It also implements "controlling the CoT with system prompts": the system prompt tells the model how to call a tool from inside its thinking process, so the model can generate its CoT more flexibly and interact with external tools while it thinks, achieving what the author calls "embodied intelligence" for the model itself.
This feature addresses a limitation of models that output their thinking process: when the model needs to look up information or ask the user to supplement the question mid-thought, it cannot interact with the outside world, so its reasoning ability is underused.
The author believes that this feature has great application prospects. When combined with MCP, it can overcome the limitations of traditional agents that call tools through multi-round dialogues. It moves tool calls into the CoT, so that the reasoning itself decides when and how to call a tool, which improves the quality and efficiency of tool use and makes better use of the various tools available in the real world.
The synthetic dataset was built with the prompts from the yilin-chatgpt-prompts repository: an automated data synthesis pipeline used them to generate over 2000 articles in the Yilin style.
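As a rough illustration of what "moving tool calls into the CoT" means for the calling code, the sketch below scans the text generated before the </think> tag for a line that looks like a tool call, such as {'search':'...'}. The function name `find_tool_call_in_cot` is hypothetical and not part of this repository; the single-key dictionary format is taken from the system prompts shown in this README.

```python
import ast


def find_tool_call_in_cot(generated_text):
    """Return the first single-key dict found on its own line before </think>, else None."""
    # Only the thinking part (everything before the closing tag) is searched.
    thinking = generated_text.split("</think>", 1)[0]
    for raw_line in thinking.splitlines():
        line = raw_line.strip()
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            # literal_eval accepts both {'search': '...'} and {"search": "..."}.
            call = ast.literal_eval(line)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(call, dict) and len(call) == 1:
            return call
    return None


sample = "<think>\nI should look this up.\n{'search':'Deng Wenyi latest songs'}\n</think>\nAnswer..."
print(find_tool_call_in_cot(sample))
```

Parsing with `ast.literal_eval` rather than swapping quote characters is a design choice of this sketch: it tolerates apostrophes inside the argument text.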
Weights
The repository provides the 16-bit .safetensors weights merged into Qwen2.5-3B-Instruct, as well as BF16 and Q4_K_M files in GGUF format for easy deployment with Ollama.
Intended Use
To research the feasibility and benefits of "controlling the CoT with system prompts".
As a pioneering model, it aims to set an example and guide further research, thereby advancing the progress of LLMs on the path to AGI.
Misuse
The model can be used for almost anything. It has good human alignment and conforms to general values. It is also good at writing articles with artistic flair.
Quick Start
Inference
- Ollama Deployment: You can deploy the model with Ollama. Use the 16-bit GGUF file if your hardware can handle it; otherwise use Q4_K_M. With Q4_K_M, it helps to end the user prompt with a newline followed by <think>, like this:
Here is your prompt\n<think>
- vLLM: You can run this model directly with vLLM by passing in likewendy/Qwen2.5-3B-YiLin-GGUF-q4_k_m.
- Using Code for Regular Inference:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="likewendy/Qwen2.5-3B-YiLin-GGUF-q4_k_m",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=False,
)

from unsloth.chat_templates import get_chat_template

# Get the chat template for Qwen-2.5 and apply it to the tokenizer
tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen-2.5",
)
FastLanguageModel.for_inference(model)  # Enable native 2x speed inference

# Define the message list, including the user role and content
messages = [
    {"role": "system", "content": "If you need to search the web, you need to output text in this format before the </think> tag to call the tool and search the web: {'search':'Here is the keyword to search'}"},
    {"role": "user", "content": "Give me the latest songs by Deng Wenyi"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt=True)

# Use the model to generate output
_ = model.generate(
    input_ids=inputs,
    streamer=text_streamer,
    max_new_tokens=8192,
    use_cache=True,
    temperature=0.7,
    top_p=0.7,
)
- Using Code for a Complete Demo of "Controlling the CoT with System Prompts" to Let the User Supplement Questions:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="likewendy/Qwen2.5-3B-YiLin-GGUF-q4_k_m",
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=False,
)

from unsloth.chat_templates import get_chat_template

# Get the chat template for Qwen-2.5 and apply it to the tokenizer
tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen-2.5",
)
FastLanguageModel.for_inference(model)  # Enable native 2x speed inference
import json
from transformers import TextStreamer, StoppingCriteria, StoppingCriteriaList

# Set to True once the user has answered an issue_add request, so the
# streamer does not stop generation a second time.
should_stop_ok = False


class CustomStreamer(TextStreamer):
    def __init__(self, tokenizer, **kwargs):
        super().__init__(tokenizer, **kwargs)
        self.should_stop = False
        self.issue_content = None
        self.buffer = ""
        self.output_text = ""

    def put(self, value):
        # Output to the console in real time (parent method)
        super().put(value)
        # If value is a tensor, move it to the CPU and convert it to a list
        if hasattr(value, "cpu"):
            value = value.cpu()
        if hasattr(value, "numpy"):
            value = value.numpy().tolist()
        elif hasattr(value, "tolist"):
            value = value.tolist()
        # The incoming value is assumed to have a [batch_size, sequence_length]
        # structure with batch size 1; take out the first row.
        # If the batch size is not 1, modify this accordingly.
        if isinstance(value[0], list):
            value = value[0]
        # Now value is a one-dimensional list of tokens; decode only the newest one
        last_token = value[-1:]
        text_chunk = self.tokenizer.decode(last_token, skip_special_tokens=True)
        self.buffer += text_chunk
        # Split and process complete lines
        while '\n' in self.buffer:
            newline_pos = self.buffer.find('\n')
            line = self.buffer[:newline_pos]
            self.buffer = self.buffer[newline_pos+1:]
            self._process_line(line)

    def _process_line(self, line):
        self.output_text += line + '\n'  # Record the complete output
        # Detect whether the stop condition is met
        line = line.strip()
        if line.startswith("{'issue_add':") or line.startswith('{"issue_add":'):
            try:
                # Convert single quotes to double quotes and parse as JSON
                json_str = line.replace("'", '"')
                data = json.loads(json_str)
                if 'issue_add' in data and not should_stop_ok:
                    self.should_stop = True
                    self.issue_content = data['issue_add']
            except json.JSONDecodeError:
                pass

    def end(self):
        # Process the remaining buffer content
        if self.buffer:
            self._process_line(self.buffer)
            self.buffer = ""
        super().end()


class StopCondition(StoppingCriteria):
    def __init__(self, streamer):
        super().__init__()
        self.streamer = streamer

    def __call__(self, input_ids, scores, **kwargs):
        # Stop generating as soon as the streamer has seen an issue_add line
        return self.streamer.should_stop
import torch

# Define the message list, including the user role and content
messages = [
    {"role": "system", "content": "If you need the user to supplement more information, you need to output text in this format before the </think> tag to call the tool and let the user supplement more information. The format is: {'issue_add':'Here you need to tell the user what information is missing'}"},
    {"role": "user", "content": "I bought some bulk snacks by weight and spent 100 yuan. How much is each bag?"},
]

while True:
    # Prepare the model input
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=not should_stop_ok,
        return_tensors="pt"
    ).to("cuda")

    # Create a streaming processor and stop condition
    streamer = CustomStreamer(tokenizer, skip_prompt=True)
    stop_condition = StopCondition(streamer)

    if should_stop_ok:
        inputs_str = tokenizer.decode(inputs[0], skip_special_tokens=False)
        # Drop the trailing "<|im_end|>\n" (11 characters) so the assistant turn stays open
        inputs_str = inputs_str[:-11]
        # print(inputs_str)
        inputs = torch.tensor([tokenizer.encode(inputs_str, add_special_tokens=False)], dtype=torch.long, device='cuda')

    # Perform generation (with stop condition)
    model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=8192,
        use_cache=True,
        temperature=0.7,
        top_p=0.7,
        stopping_criteria=StoppingCriteriaList([stop_condition]),
    )

    # Add the model output to the conversation history
    if streamer.output_text.strip():
        messages.append({"role": "assistant", "content": streamer.output_text.strip()})

    # Detect the stop condition
    if streamer.should_stop:
        print(f"\n---\nDetected issue_add content: {streamer.issue_content}")
        user_input = input("Please enter your reply: ")
        print("---\n")
        should_stop_ok = True
        # Concatenate the user input to the assistant message
        messages[-1]['content'] += f"\nTool call result: {user_input}\nI have obtained the tool call result: “{user_input}”. Now I need to continue reasoning based on the tool call result.\nFirst, "
        # print(messages)
    else:
        break  # End generation normally
The terminal output of the above code:
<think>
I need to solve this problem now. The question states that I bought some bulk snacks by weight and spent a total of 100 yuan. However, it doesn't mention the price of each bag. So, what should I do?
First, I need to carefully examine the question. The "bulk snacks by weight" mentioned in the question might refer to large packages, such as a pack of cookies, a box of chocolates, or a mix of nuts, which don't have a fixed price per bag. Therefore, the price of each bag could vary. However, the question doesn't provide the specific types and quantities, only the total price of 100 yuan.
The possible issue here is whether the "bulk" in the question implies different prices per bag. For example, some bags might be more expensive while others are cheaper. In this case, with a total amount of 100 yuan, it might not be possible to evenly distribute the cost to each bag. At this point, more information is needed to determine the price of each bag, such as the number of different types of bags and the quantity of each.
However, the question doesn't provide this additional information. There might be assumptions or implicit conditions in the question, such as whether there are discounts or promotions, but these aren't mentioned. Or perhaps the question overlooked these details, making it impossible to directly calculate the price of each bag.
So, according to the conventional approach, if the price of each bag is different and the total amount is 100 yuan, we might need to know the specific quantity of each type of bag to calculate the price per bag. For example, if there are 5 different sizes of bags with quantities of 2, 3, 4, 5, and 6 respectively, then there are a total of 5×(2 + 3 + 4 + 5 + 6) = 55 bags, and the total weight is 2 + 3 + 4 + 5 + 6 = 20 kilograms. The price per kilogram would then be 100 / 20 = 5 yuan. In this case, the price per bag should be 5 yuan divided by the corresponding quantity. For example, the first bag might cost 5 yuan, the second 8 yuan, the third 10 yuan, and so on. But the question doesn't provide the specific data, so it's impossible to accurately calculate.
Alternatively, the question might assume that all bags have the same price, or there are a few different types of bags but the quantity is unknown. For example, if there are three types of bags, A, B, and C, with prices of a yuan, b yuan, and c yuan per bag respectively, and a total of n bags, then the total cost would be a * n + a * b + n * c = 100 yuan. However, similarly, the question doesn't provide the specific values of a, b, c, or n, so the answer can't be solved.
In addition, there might be other hidden conditions in the question, such as whether there are coupons or discounts, but these aren't mentioned. So, we can only follow the simplest approach, assuming that each bag has the same price, or there are different prices but the question doesn't specify. Therefore, based on the information provided in the question, an accurate answer can't be obtained.
To summarize, since the question doesn't provide enough information, such as the price type and quantity of each bag, it's impossible to determine the specific price of each bag. More data might be needed to answer the question. So, the correct response should be: "It can't be determined because necessary information is missing." Or more concisely, "More information is needed." This is both logical and respects the limited information provided in the question.
{"issue_add":"Missing the price type and quantity of each bag"}<|im_end|>
---
Detected issue_add content: Missing the price type and quantity of each bag
Please enter your reply: 20 bags
---
I need to confirm whether this result is reasonable. Bulk snacks usually come in various sizes, such as small, medium, and large bags, each with a different price. Assuming the price of each bag is an integer, with a total amount of 100 yuan, the price of each bag should be the result of dividing 100 by the number of bags. If there are 20 bags, then the price of each bag is 5 yuan. However, in reality, the number of bags of bulk snacks might not be 20, but other numbers, such as 10, 25, 50, etc. We need to verify whether this result is in line with the actual situation.
Next, consider the common sizes and price distribution of bulk snacks. For example, the common prices of snack bags might be as follows:
- Small bag (e.g., 50 grams): Around 5 yuan
- Medium bag (e.g., 100 grams): Around 10 yuan
- Large bag (e.g., 200 grams): Around 15 yuan
Assuming the price of each bag falls within this range, we can try to divide 100 yuan into several parts. For example, if each bag costs 5 yuan, then 20 bags are needed; if it's 10 yuan, 10 bags are needed; if it's 15 yuan, 6.67 bags are needed, which is obviously not reasonable. Therefore, the price of each bag should be between 5 and 10 yuan, and the total should be 100 yuan.
Now, assume the price of each bag is 5 yuan. How many bags are needed to make up 100 yuan? The calculation is as follows:
100 ÷ 5 = 20 bags
So, if each bag costs 5 yuan, 20 bags are needed. This result is consistent with the result provided by the tool, so it's reasonable.
Conclusion: The price of each bag is 5 yuan, and a total of 20 bags were purchased.<|im_end|>
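The demo above reopens the assistant turn by slicing a fixed 11 characters off the templated prompt, which matches "<|im_end|>" plus a newline. As a sketch only, the same step can be written against the marker itself so it does not depend on a magic number; `strip_end_of_turn` is a hypothetical helper name, and the marker string is taken from the terminal output above.

```python
def strip_end_of_turn(prompt_text, marker="<|im_end|>"):
    """Remove one trailing end-of-turn marker (plus trailing whitespace), if present."""
    trimmed = prompt_text.rstrip()
    if trimmed.endswith(marker):
        # Keep everything before the marker, including a trailing space such as "First, ".
        return trimmed[: -len(marker)]
    # No marker at the end: leave the prompt untouched.
    return prompt_text


templated = "<|im_start|>assistant\nTool call result: 20 bags\nFirst, <|im_end|>\n"
print(repr(strip_end_of_turn(templated)))
```

If the chat template ever changes its trailing whitespace, this version still removes the marker, whereas a fixed slice would cut into the message text.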
Tips
- It is recommended to set the parameters temperature=0.7 and top_p=0.7.
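To give a feel for what these two parameters do, here is a simplified, self-contained sketch of temperature scaling followed by nucleus (top-p) filtering over a toy list of logits. It is an illustration under stated assumptions, not the code that transformers or Ollama actually run; `top_p_distribution` and the example logits are made up for this sketch.

```python
import math


def top_p_distribution(logits, temperature=0.7, top_p=0.7):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens so the result sums to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]
print(top_p_distribution(toy_logits))
print(top_p_distribution(toy_logits, top_p=0.9))
```

With both values at 0.7, sampling is restricted to a fairly small set of likely tokens, which tends to keep long chains of thought on track.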
Examples
- Searching the Web through "Controlling the CoT with System Prompts":
System Prompt:
If you need to search the web, you need to output text in this format before the </think> tag to call the tool and search the web: {'search':'Here is the keyword to search'}
User Prompt:
Give me the latest songs by Deng Wenyi
Model Reply:
<think>
The user wants me to provide the latest songs by Deng Wenyi. First, I need to confirm who Deng Wenyi is. It might be the name of a singer. Then, I should recall if there are any new songs released by her recently. Deng Wenyi is a well-known female singer from Taiwan, China. Her music style might be pop or electronic dance music?
Next, I need to check if there are any new albums or singles released by her recently. I remember she once collaborated with JJ Lin on the song "Never Break Up", but I'm not sure if there are other works. Additionally, she might have participated in some compilations or talent shows like "Super Star Avenue". However, the user might be more interested in her original works.
According to my knowledge base, one of Deng Wenyi's recent popular singles is "Love Under the Moonlight", which is an electronic dance music-style song often used in dance settings. Another song is "You Are the One I Love Most", which sounds like a ballad and might be suitable for a gentle evening atmosphere. However, this information might need further verification.
Then, I need to consider the user's potential needs. They might be looking for new songs by Deng Wenyi to play, or for a concert, event, etc. If it's for a concert, her hit songs might need to be recommended, while for daily listening, more options might be needed.
I also need to pay attention to copyright issues, ensuring that the links provided are legal and don't infringe on any copyrights. So, it's best to directly provide the lyrics and song names, or provide download links, but be careful not to provide illegal content.
Finally, I should organize this information, list the latest songs point-by-point, and briefly describe the type and style, allowing the user to choose according to their preferences. At the same time, remind the user that they might need to obtain the latest information through official channels, as the singer might have new developments.
{'search':'Deng Wenyi latest songs'}
</think>
Deng Wenyi (Yumi) is a well-known female singer from Taiwan, China, specializing in electronic dance music and pop music. Here are her recent popular singles and latest works:
---
### **1. "Love Under the Moonlight"**
- **Type**: Electronic dance music
- **Features**: Strong rhythm, full of energy, often used as party or fitness music.
- **Recommended Scenarios**: Nightclubs, gyms, outdoor activities.
---
### **2. "You Are the One I Love Most"**
- **Type**: Ballad pop
- **Features**: Warm melody, expressing deep affection and longing, suitable for quiet listening at night.
- **Recommended Scenarios**: Romantic dates, movie soundtracks.
---
### **3. "Never Break Up"**
- **Type**: Upbeat dance music
- **Features**: A classic collaboration with JJ Lin, with a catchy melody, often used in social settings.
- **Recommended Scenarios**: Parties, KTV, advertising background music.
---
### **4. "Heartbeat"**
- **Type**: Pop electronic
- **Features**: Strong rhythm, describing the moment of falling in love, suitable for love-themed MVs or advertisements.
---
### **5. "Flying Against the Light"**
- **Type**: Inspirational pop
- **Features**: Encouraging and positive lyrics, suitable for positive-energy occasions.
---
### How to Get the Latest Songs?
1. **Official Platforms**: Visit Deng Wenyi's Weibo, Douyin, or YouTube channels to stay updated.
2. **Digital Music Platforms**: Such as Spotify, Apple Music, YouTube Music, etc. Search for the song names to find the full versions.
3. **Physical Releases**: Some singles can be purchased as physical CDs or downloaded through online stores.
---
If you need the specific lyrics or download links, please let me know! <|im_end|>
- Letting the User Supplement Questions through "Controlling the CoT with System Prompts":
System Prompt:
If you need the user to supplement more information, you need to output text in this format before the </think> tag to call the tool and let the user supplement more information. The format is: {'issue_add':'Here you need to tell the user what information is missing'}
User Prompt:
I bought some bulk snacks by weight and spent 100 yuan. How much is each bag?
Model Reply:
<think>
I need to solve this problem now. The question states that I bought some bulk snacks by weight and spent a total of 100 yuan. It asks about the price of each bag. However, there seems to be an issue here because bulk snacks are usually priced by weight, not by the number of bags. For example, it could be in large containers, and each bag inside might have a different weight of snacks.
Does the "price of each bag" in the question refer to the unit price of each bag? But the question seems to mix up the weight and the number of bags. For example, if I bought 5 bags of snacks and each bag costs 20 yuan, then the total cost is 5×20 = 100 yuan, which meets the condition in the question. In this case, each bag is 20 yuan?
Or is it possible that the "bulk snacks by weight" in the question means pricing based on different weights? For example, each bag has a different weight, and the price is calculated according to the weight. In this case, the weight of each bag might be different, so the unit price would also vary. For instance, the first bag might have 1 jin, and the second bag might have 2 jin. Then the total weight is 1 + 2 = 3 jin, and the price per jin is 100 / 3 ≈ 33.33 yuan. In this case, the unit price of each bag should be different.
However, the question doesn't specify the exact weight of each bag, only the price per bag. How should this be handled? Maybe the "price of each bag" in the question refers to the unit price of each bag, and this unit price might be determined by the size or weight of the bag. For example, the price of each bag could be fixed regardless of the weight inside. In this case, the unit price of each bag is 100 yuan divided by the number of bags. For example, if 5 bags are bought, each bag is 100 / 5 = 20 yuan, which is consistent with the previous example.
But the question doesn't clearly state whether the price of each bag is fixed or calculated based on weight. If it's the latter, the price of each bag might be different, and it's not possible to directly calculate the unit price of each bag. For example, if the first bag is 1 jin, the second is 2 jin, and the third is 3 jin, then the total weight is 6 jin, and the average price per bag is 60 yuan, with each bag's unit price being 60 / 3 = 20 yuan. In this case, the unit price of each bag should be the same, equal to the total price divided by the number of bags.
However, the question seems more complicated, and more information might be needed to determine the unit price of each bag. For example, the question might mention that the weight of each bag is different, or the price of each bag is different. At this point, more information is needed, such as the weight of each bag or the price of each bag. Without this information, it's not possible to accurately answer the question.
So, based on the current information, there are two possible situations:
1. If the price of each bag is fixed, for example, each bag is 20 yuan, then each bag is 20 yuan.
2. If the price of each bag is different, for example, the first bag is 1 jin, the second is 2 jin, and the third is 3 jin, then the total weight is 6 jin, and the unit price of each bag is 60 yuan, with each bag's unit price being 20 yuan.
But the question doesn't provide enough information to determine the specific weight and price of each bag. Therefore, more details might be needed to answer the question. For example, the question might mention the weight of each bag or the corresponding unit price, or other relevant information.
In conclusion, based on the information given in the question, it's not possible to accurately calculate the unit price. More detailed data is required.
{'issue_add':'Here you need to tell me the weight of each bag and their respective unit prices'}
Bulk snacks are usually priced by weight, but the "price of each bag" mentioned in the question might be ambiguous. The following key information needs to be confirmed first:
1. **Packaging unit of bulk snacks**: Are they individually packaged (such as small bags) or in a unified container (such as a large bucket)?
- If they are individually packaged, the weight or volume of each bag needs to be known.
- If they are in a unified container, the total weight and the number of bags inside need to be known.
2. **Pricing method**:
- Are there bags with different unit prices? For example, some packaged snacks are priced at 20 yuan per bag, while others are 30 yuan per bag.
- Is there a fixed-price packaging (such as 20 yuan per bag), or is it priced according to weight?
Without specific data, it's not possible to directly calculate. For example:
- Assuming each bag is 20 yuan per bag, then 100 yuan can buy 5 bags.
- If there are different unit prices, all options need to be listed and verified through combination.
It is recommended to supplement the following information:
- The weight of each bag (such as 1 jin, 2 jin, etc.).
- The specific pricing plan (such as 20 yuan per bag, 30 yuan per bag, etc.).
- The total number of bags purchased (or the total weight).
Based on the existing information, it's not possible to accurately calculate the unit price. More detailed data is required. <|im_end|>
- The Possible Appearance of Letting the User Supplement Questions through "Controlling the CoT with System Prompts" in the Interface:
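The two examples above use different tool names, search and issue_add, each carrying a single string argument. One way an application might route such calls is a small dispatch table, sketched below. The handler functions are hypothetical stand-ins that return placeholder strings; a real application would call a search API or prompt the user instead.

```python
import ast


def search_web(query):
    # Hypothetical stand-in: a real handler would call a search API here.
    return f"(search results for: {query})"


def ask_user(missing_info):
    # Hypothetical stand-in: a real handler would prompt the user, e.g. with input().
    return f"(user reply about: {missing_info})"


# Maps the tool name used in the system prompt to the function that serves it.
TOOL_HANDLERS = {"search": search_web, "issue_add": ask_user}


def run_tool_call(line):
    """Parse one tool-call line and run its handler; return None if it is not a known call."""
    try:
        call = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(call, dict) or len(call) != 1:
        return None
    (name, argument), = call.items()
    handler = TOOL_HANDLERS.get(name)
    return handler(argument) if handler else None


print(run_tool_call("{'search':'Deng Wenyi latest songs'}"))
```

The returned string is what would be appended to the assistant message as the tool call result before generation resumes.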
License
This project is licensed under the GPL-3.0 license.
Model Information
| Property | Details |
|---|---|
| Base Model | unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit |
| Pipeline Tag | text-generation |
| Tags | not-for-all-audiences, text-generation-inference, transformers, unsloth, qwen2, gguf |

