Haruhi Dialogue Speaker Extract Qwen18
A dialogue extraction model fine-tuned from Qwen-1.8, capable of batch-extracting summaries and dialogues from novel excerpts
Downloads 17
Release Time: 1/26/2024
Model Overview
This model is designed to extract dialogue content and summary information from novel texts. It supports both Chinese and English and outputs results in a structured JSON format.
Model Features
Multilingual Support
Supports content extraction from both Chinese and English novels
Structured Output
Automatically generates JSON format results containing summaries and dialogues
Batch Processing Capability
Can process continuous text blocks and multi-chapter content
Model Capabilities
Text Summarization
Dialogue Content Recognition
Speaker Identification
Structured Data Output
Use Cases
Literary Analysis
Novel Dialogue Analysis
Extract dialogue content from novel texts for character analysis
Examples show accurate identification of dialogue content and speakers
Content Summarization
Automatically generate key summaries of novel excerpts
Examples demonstrate coherent paragraph summarization
Data Preprocessing
Dialogue Dataset Construction
Prepare training data for dialogue systems
Capable of batch processing large volumes of novel texts
🚀 Chat Haruhi Suzumiya's Dialogue Extraction Model
We aim to have a model capable of batch-extracting summaries and dialogues from novel chunks. This model fulfills this goal. It was trained on approximately 30k Chinese novels and 20k English novels, and fine-tuned on Qwen-1.8 for three epochs. In principle, it supports extraction from both Chinese and English novels.
The main project link is https://github.com/LC1332/Chat-Haruhi-Suzumiya.
- LC1332 collected the data and further extended the inference program to continuous chunks.
- khazic trained the model.
- hhhwmws0117 tested and uploaded the model to Hugging Face.
🚀 Quick Start
Inference Code
You can find the inference code at https://github.com/LC1332/Chat-Haruhi-Suzumiya/blob/main/notebook/Dialogue_Speaker_Extract_Test.ipynb
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model; trust_remote_code is required for Qwen's custom chat interface
tokenizer = AutoTokenizer.from_pretrained("silk-road/Haruhi-Dialogue-Speaker-Extract_qwen18", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("silk-road/Haruhi-Dialogue-Speaker-Extract_qwen18", device_map="auto", trust_remote_code=True)
# System prompt asking the model to return a JSON object with 'summary' and 'conversations'
sys_prompt = "Given an input paragraph, extract the dialogues within it, and output them in JSON format. Let's think it step by step 1. summarize input paragraph into bullet format, store it in the summary field 2. extract the content of each dialogue (dialogue), determine the speaker of each sentence (said by), and store them in conversations."
text = "Your novel text"
response_str, history = model.chat(tokenizer, text, history=[], system=sys_prompt)
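The model returns response_str as a JSON-like string. Below is a minimal parsing sketch (not part of the official repo): it assumes the summary, conversations, dialogue, and said_by field names seen in the output examples further down, and falls back to ast.literal_eval because some outputs use single quotes.

import ast
import json

def parse_extraction(response_str):
    # Try strict JSON first; fall back to Python-literal parsing for single-quoted output
    try:
        return json.loads(response_str)
    except json.JSONDecodeError:
        return ast.literal_eval(response_str)

# Continues from the snippet above
result = parse_extraction(response_str)
print(result["summary"])
for turn in result.get("conversations", []):
    print(turn.get("said_by"), ":", turn.get("dialogue"))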
Official Prompt
Given an input paragraph, extract the dialogues within it, and output them in JSON format.
Let's think about it step by step:
- Summarize the input paragraph into bullet points and store it in the 'summary' field.
- Extract the content of each dialogue ('dialogue'), identify the speaker for each sentence ('said by'), and store these in 'conversations'.
TODO
- [x] Expand to multi-chunk inference (a rough chunking sketch follows this list)
- [x] Provide an English example
- [ ] Provide an example of multi-chapter parallel inference
- [ ] Try extracting the summary directly from the raw string when JSON parsing fails
- [ ] Additionally attempt to use OpenAI for inference when extraction fails
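As a rough illustration of the multi-chunk idea referenced above (a sketch under assumed chunking rules, not the project's own pipeline; the file name novel.txt and the helper split_into_chunks are placeholders), each chunk is passed through the same chat call from the Quick Start snippet:

def split_into_chunks(novel_text, max_chars=1500):
    # Naive paragraph-based chunking; the actual project may segment chunks differently
    chunks, current = [], ""
    for para in novel_text.split("\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n"
    if current.strip():
        chunks.append(current)
    return chunks

# Reuses model, tokenizer, sys_prompt, and parse_extraction from the snippets above
with open("novel.txt", encoding="utf-8") as f:
    novel_text = f.read()
results = []
for chunk in split_into_chunks(novel_text):
    response_str, _ = model.chat(tokenizer, chunk, history=[], system=sys_prompt)
    results.append(parse_extraction(response_str))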
💻 Usage Examples
Basic Usage
Here are some output examples in both Chinese and English:
Chinese Output Example
{'summary': '- 彭蠡不在家中,老刀感到担忧并等待着彭蠡回家的时间,同时观察周围环境和人们的消费行为,表现出内心的饥饿感和焦虑情绪。', 'conversations': [{'dialogue': '哎,你们知道那儿一盘回锅肉多少钱吗?', 'said_by': '小李'}, {'dialogue': '靠,菜里有沙子。', 'said_by': '小丁'}, {'dialogue': '人家那儿一盘回锅肉,就三百四。', 'said_by': '小李'}, {'dialogue': '什么玩意?这么贵。', 'said_by': '小丁'}, {'dialogue': '你吃不了这么多。', 'said_by': '小李'}]}
{'summary': '- 彭蠡在家等待彭蠡回家,表现出内心的饥饿感和焦虑情绪,同时对彭蠡的行为表示不满和失望。彭蠡则对老刀的行为表现出冷漠和不屑的态度。', 'conversations': [{'dialogue': '我没时间和你解释。我需要去第一空间,你告诉我怎么走。', 'said_by': '老刀'}, {'dialogue': '回我家说,要走也从那儿走。', 'said_by': '彭蠡'}, {'dialogue': '回家啦,回家啦。转换马上开始了。', 'said_by': '车上的人'}, {'dialogue': '你不告诉我为什么,我就不告诉你怎么走。', 'said_by': '彭蠡'}, {'dialogue': '你躲在垃圾道里?去第二空间?那你得等24小时啊。', 'said_by': '彭蠡'}, {'dialogue': '二十万块。等一礼拜也值啊。', 'said_by': '老刀'}, {'dialogue': '你就这么缺钱花?', 'said_by': '彭蠡'}, {'dialogue': '糖糖还有一年多该去幼儿园了。我来不及了。', 'said_by': '老刀'}, {'dialogue': '你别说了。', 'said_by': '彭蠡'}]}
{'summary': '- 彭蠡对彭蠡的行为表现出不满和失望,同时对老刀的行为表现出冷漠和不屑的态度。', 'conversations': [{'dialogue': '你真是作死,她又不是你闺女,犯得着吗。', 'said_by': '彭蠡'}, {'dialogue': '别说这些了。快告我怎么走。', 'said_by': '老刀'}, {'dialogue': '你可得知道,万一被抓着,可不只是罚款,得关上好几个月。', 'said_by': '彭蠡'}, {'dialogue': '你不是去过好多次吗?', 'said_by': '老刀'}, {'dialogue': '只有四次。第五次就被抓了。', 'said_by': '彭蠡'}, {'dialogue': '那也够了。我要是能去四次,抓一次也无所谓。', 'said_by': '老刀'}, {'dialogue': '别说了。你要是真想让我带你去,我就带你去。', 'said_by': '彭蠡'}]}
- 彭蠡不在家中,老刀感到担忧并等待着彭蠡回家的时间,同时观察周围环境和人们的消费行为,表现出内心的饥饿感和焦虑情绪。
小李 : 哎,你们知道那儿一盘回锅肉多少钱吗?
小丁 : 靠,菜里有沙子。
小李 : 人家那儿一盘回锅肉,就三百四。
小丁 : 什么玩意?这么贵。
小李 : 你吃不了这么多。
- 彭蠡在家等待彭蠡回家,表现出内心的饥饿感和焦虑情绪,同时对彭蠡的行为表示不满和失望。彭蠡则对老刀的行为表现出冷漠和不屑的态度。
老刀 : 我没时间和你解释。我需要去第一空间,你告诉我怎么走。
彭蠡 : 回我家说,要走也从那儿走。
车上的人 : 回家啦,回家啦。转换马上开始了。
彭蠡 : 你不告诉我为什么,我就不告诉你怎么走。
彭蠡 : 你躲在垃圾道里?去第二空间?那你得等24小时啊。
老刀 : 二十万块。等一礼拜也值啊。
彭蠡 : 你就这么缺钱花?
老刀 : 糖糖还有一年多该去幼儿园了。我来不及了。
彭蠡 : 你别说了。
- 彭蠡对彭蠡的行为表现出不满和失望,同时对老刀的行为表现出冷漠和不屑的态度。
彭蠡 : 你真是作死,她又不是你闺女,犯得着吗。
老刀 : 别说这些了。快告我怎么走。
彭蠡 : 你可得知道,万一被抓着,可不只是罚款,得关上好几个月。
老刀 : 你不是去过好多次吗?
彭蠡 : 只有四次。第五次就被抓了。
老刀 : 那也够了。我要是能去四次,抓一次也无所谓。
彭蠡 : 别说了。你要是真想让我带你去,我就带你去。
English Output Example
{'summary': "Snow-covered Paris, Kimura's workshop, artist and viewer engaging in conversation.", 'conversations': [{'dialogue': 'You should hear the stories they tell of you at the café. If Émile is to be believed, you arrived here as an ukiyo-e courtesan, nothing more than paper wrapped around a porcelain bowl. A painter—he will not say which of us it was, of course—bought the bowl and the print along with it.', 'said_by': 'Artist'}, {'dialogue': 'And the painter pulled me from the print with the sheer force of his imagination, I’m sure. Émile is a novelist and can hardly be trusted to give an accurate account. The reality of my conception is vastly more mundane, I assure you…though it does involve a courtesan.', 'said_by': 'Woman'}, {'dialogue': 'A grain of truth makes for the best fiction. nude, but leave the jewelry and the shoes. I’ll paint you on the chaise. We’ll have three hours in the proper light, and I will pay you four francs.', 'said_by': 'Artist'}, {'dialogue': 'Victorine gets five!', 'said_by': 'Woman'}, {'dialogue': 'Victorine is a redhead.', 'said_by': 'Artist'}, {'dialogue': 'My name is Mariko, by the way, but everyone calls me Mari.', 'said_by': 'Mariko'}]}
{'summary': "Snow-covered Paris, Kimura's workshop, artist and viewer engaged in conversation. Artist and viewer engage in intimate conversation and interaction.", 'conversations': [{'dialogue': 'I’m on the chaise', 'said_by': 'Artist'}, {'dialogue': 'Bring your left hip forward. No, not that far. Bend the leg a bit more, yes. Turn your head to face the canvas.', 'said_by': 'Artist'}, {'dialogue': 'Like a Manet', 'said_by': 'Artist'}, {'dialogue': 'Don’t like a model that talks while you work, huh?', 'said_by': 'Artist'}, {'dialogue': 'I don’t like being compared to other artists.', 'said_by': 'Artist'}, {'dialogue': 'Then you must paint me so well that I forget about the others.', 'said_by': 'Artist'}, {'dialogue': 'Tilt your head into the light. And look at me intently. Intently. As though I were the one naked on the chaise.', 'said_by': 'Artist'}, {'dialogue': 'You did better than I would have expected.', 'said_by': 'Artist'}, {'dialogue': 'There are other poses I could show you, if you like?', 'said_by': 'Artist'}, {'dialogue': 'But the sooner I get started on this portrait, the better.', 'said_by': 'Artist'}]}
{'summary': "Kimura's workshop, artist and viewer engaging in intimate conversation and interaction. Kimura responds with a strong, cold embrace, leading to a passionate physical exchange. Afterward, the artist falls asleep, leaving the narrator feeling incomplete and longing.", 'num': 14, 'conversations': [{'dialogue': 'I could show you other poses.', 'said_by': 'Kimura'}, {'dialogue': 'Yes.', 'said_by': 'Kimura'}, {'dialogue': 'See you tomorrow?', 'said_by': 'Artist'}]}
Snow-covered Paris, Kimura's workshop, artist and viewer engaging in conversation.
Artist : You should hear the stories they tell of you at the café. If Émile is to be believed, you arrived here as an ukiyo-e courtesan, nothing more than paper wrapped around a porcelain bowl. A painter—he will not say which of us it was, of course—bought the bowl and the print along with it.
Woman : And the painter pulled me from the print with the sheer force of his imagination, I’m sure. Émile is a novelist and can hardly be trusted to give an accurate account. The reality of my conception is vastly more mundane, I assure you…though it does involve a courtesan.
Artist : A grain of truth makes for the best fiction. nude, but leave the jewelry and the shoes. I’ll paint you on the chaise. We’ll have three hours in the proper light, and I will pay you four francs.
Woman : Victorine gets five!
Artist : Victorine is a redhead.
Mariko : My name is Mariko, by the way, but everyone calls me Mari.
Snow-covered Paris, Kimura's workshop, artist and viewer engaged in conversation. Artist and viewer engage in intimate conversation and interaction.
Artist : I’m on the chaise
Artist : Bring your left hip forward. No, not that far. Bend the leg a bit more, yes. Turn your head to face the canvas.
Artist : Like a Manet
Artist : Don’t like a model that talks while you work, huh?
Artist : I don’t like being compared to other artists.
Artist : Then you must paint me so well that I forget about the others.
Artist : Tilt your head into the light. And look at me intently. Intently. As though I were the one naked on the chaise.
Artist : You did better than I would have expected.
Artist : There are other poses I could show you, if you like?
Artist : But the sooner I get started on this portrait, the better.
Kimura's workshop, artist and viewer engaging in intimate conversation and interaction. Kimura responds with a strong, cold embrace, leading to a passionate physical exchange. Afterward, the artist falls asleep, leaving the narrator feeling incomplete and longing.
Kimura : I could show you other poses.
Kimura : Yes.
Artist : See you tomorrow?
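The plain-text transcripts interleaved with the JSON above can be reproduced from the parsed output with a small helper; a minimal sketch, assuming the summary / conversations / said_by / dialogue fields shown in the examples:

def render_transcript(result):
    # One summary line followed by "speaker : dialogue" lines, mirroring the examples above
    lines = [result.get("summary", "")]
    for turn in result.get("conversations", []):
        lines.append(f"{turn.get('said_by', '?')} : {turn.get('dialogue', '')}")
    return "\n".join(lines)

# Using the result parsed in the Quick Start section
print(render_transcript(result))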
📄 License
This project is licensed under the Apache-2.0 license.