Chat Topics
A BERTopic-based topic model for chat text that automatically identifies and categorizes topics in large text collections.
Downloads: 262
Release Time: 5/29/2023
Model Overview
This model is built with the BERTopic framework and is designed to analyze chat text and extract interpretable topics. It is suited to topic mining and analysis in settings such as social media and customer service dialogues.
Model Features
Modular Design
Uses BERTopic's modular framework, so individual stages (document embedding, dimensionality reduction, clustering, and topic representation) can be swapped or tuned independently
Multi-topic Identification
Automatically identifies 75 topics (IDs -1 to 73, where -1 is BERTopic's outlier topic)
Keyword Extraction
Provides the most representative keywords for each topic
Large-scale Training
Trained on 63,530 documents, covering a wide range of topics
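BERTopic derives each topic's keywords with class-based TF-IDF (c-TF-IDF): all documents assigned to a topic are treated as one "class document", and words that are frequent in that topic but also common across all topics are down-weighted. The sketch below is a minimal pure-Python illustration of the weighting given in the BERTopic documentation, assuming whitespace tokenization and no stop-word removal. The function name `c_tf_idf` is hypothetical; the library itself uses a scikit-learn `CountVectorizer` plus its `ClassTfidfTransformer`, which differ in details.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic: dict[int, list[str]], top_n: int = 5) -> dict[int, list[str]]:
    """Return the top_n keywords of each topic ranked by class-based TF-IDF.

    Weighting (as described in the BERTopic docs):
        W(x, c) = tf(x, c) * log(1 + A / f(x))
    tf(x, c): frequency of word x in topic c, divided by the topic's word count
    f(x):     frequency of word x summed over all topics
    A:        average number of words per topic
    """
    # Treat all documents of a topic as one "class document" and count its words.
    counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_per_topic.items()
    }
    # f(x): word frequencies summed over every topic.
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    # A: average number of words per topic.
    avg_words = sum(total.values()) / len(counts)

    keywords = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in topic_counts.items()
        }
        # Highest score first; ties are broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[topic] = ranked[:top_n]
    return keywords


docs = {
    0: ["bake the dough", "the dough rises"],
    1: ["install the distro", "the linux install"],
}
print(c_tf_idf(docs, top_n=2))  # {0: ['dough', 'bake'], 1: ['install', 'distro']}
```

In topic 0, "the" occurs as often as "dough", but because it also appears in topic 1 its corpus-wide frequency f(x) is larger, so it scores lowest and would rank last even with `top_n=4`.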
Model Capabilities
Text Classification
Topic Identification
Keyword Extraction
Topic Visualization
Use Cases
Social Media Analysis
Chat Topic Monitoring
Analyzing trending discussion topics on social media
Identifies 75 different topic categories
Customer Service
Customer Service Dialogue Analysis
Classifying main topics of customer inquiries
Can help improve customer service response efficiency
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
license: mit
datasets:
- OpenAssistant/oasst1
language:
- en
chat_topics
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
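from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/chat_topics")  # download and load the model from the Hugging Face Hub
topic_model.get_topic_info()  # DataFrame with one row per topic: ID, document count, and name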
Topic overview
- Number of topics: 75
- Number of training documents: 63,530
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | provide - using - information - sure - help | 26 | -1_provide_using_information_sure |
0 | openai - ai - chatgpt - assistant - language | 7837 | Generative AI |
1 | anytime - welcome - assistance - helpful - thank | 1342 | 1_anytime_welcome_assistance_helpful |
2 | quantum - particle - physics - particles - relativity | 778 | Physics |
3 | story - lived - life - novel - felt | 569 | 3_story_lived_life_novel |
4 | letter - sincerely - regards - email - dear | 516 | 4_letter_sincerely_regards_email |
5 | rust - haskell - programming - java - languages | 504 | programming |
6 | css - html - style - div - js | 494 | web programming |
7 | linux - ubuntu - debian - fedora - install | 440 | 7_linux_ubuntu_debian_fedora |
8 | recipe - bake - ingredients - baking - dough | 425 | 8_recipe_bake_ingredients_baking |
9 | websocket - json - socket - api - discord | 425 | 9_websocket_json_socket_api |
10 | communism - capitalism - marx - economic - economy | 424 | 10_communism_capitalism_marx_economic |
11 | dog - pet - breed - breeds - pets | 408 | 11_dog_pet_breed_breeds |
12 | philosophy - theological - philosophical - beliefs - consciousness | 394 | 12_philosophy_theological_philosophical_beliefs |
13 | git - github - repository - software - commit | 381 | 13_git_github_repository_software |
14 | music - songs - musical - lyrics - song | 370 | 14_music_songs_musical_lyrics |
15 | devops - development - developers - industry - develop | 323 | 15_devops_development_developers_industry |
16 | pythagorean - hypotenuse - triangle - math - sqrt | 302 | 16_pythagorean_hypotenuse_triangle_math |
17 | eu - europe - economy - economic - war | 291 | 17_eu_europe_economy_economic |
18 | sleep - asleep - bedtime - procrastination - depression | 280 | 18_sleep_asleep_bedtime_procrastination |
19 | kramer - seinfeld - jerry - cafe - elaine | 279 | 19_kramer_seinfeld_jerry_cafe |
20 | printing - prints - printer - print - printers | 276 | 20_printing_prints_printer_print |
21 | influenza - flu - panic - symptoms - medical | 251 | 21_influenza_flu_panic_symptoms |
22 | chess - chessboard - practice - strategy - learn | 242 | 22_chess_chessboard_practice_strategy |
23 | algorithm - primes - array - integers - python | 240 | 23_algorithm_primes_array_integers |
24 | youtube - viewers - media - google - streaming | 240 | 24_youtube_viewers_media_google |
25 | poison - chemicals - powder - turpentine - smoke | 226 | 25_poison_chemicals_powder_turpentine |
26 | monday - sunday - count_weekend_days - calendar - dates | 216 | 26_monday_sunday_count_weekend_days_calendar |
27 | colors - colour - color - pigments - blue | 208 | 27_colors_colour_color_pigments |
28 | roman - attila - rome - empire - warfare | 205 | 28_roman_attila_rome_empire |
29 | investing - investments - investment - stocks - financial | 204 | 29_investing_investments_investment_stocks |
30 | vocabulary - wordle - words - scrabble - word | 201 | 30_vocabulary_wordle_words_scrabble |
31 | planets - sun - earth - planet - pluto | 198 | 31_planets_sun_earth_planet |
32 | renewable - solar - electricity - energy - electrical | 190 | 32_renewable_solar_electricity_energy |
33 | pygame - ball_radius - draw - circle - canvas | 181 | 33_pygame_ball_radius_draw_circle |
34 | fishing - fish - boat - hiking - camping | 176 | 34_fishing_fish_boat_hiking |
35 | gpus - gpu - motherboard - cpu - hardware | 162 | 35_gpus_gpu_motherboard_cpu |
36 | hvac - remodeling - energy - kwh - housing | 159 | 36_hvac_remodeling_energy_kwh |
37 | database - graphql - databases - postgresql - sql | 159 | 37_database_graphql_databases_postgresql |
38 | información - significado - cómo - como - sistemas | 158 | 38_información_significado_cómo_como |
39 | motherboard - pcie - gpu - bios - computer | 153 | 39_motherboard_pcie_gpu_bios |
40 | crops - produce - planting - peppers - plants | 148 | 40_crops_produce_planting_peppers |
41 | paintings - art - modernist - artists - modern | 148 | 41_paintings_art_modernist_artists |
42 | workout - exercises - dumbbells - dumbbell - exercise | 147 | 42_workout_exercises_dumbbells_dumbbell |
43 | climate - warming - pollution - environmental - emissions | 142 | 43_climate_warming_pollution_environmental |
44 | coffee - espresso - brewing - tea - beans | 137 | 44_coffee_espresso_brewing_tea |
45 | velocity - drag - acceleration - density - formula | 132 | 45_velocity_drag_acceleration_density |
46 | woodchuck - woodchucks - units - kilogram - kilograms | 130 | 46_woodchuck_woodchucks_units_kilogram |
47 | ascii - glyphs - hiragana - art - font | 129 | 47_ascii_glyphs_hiragana_art |
48 | guitars - guitar - strings - guitarists - instrument | 127 | 48_guitars_guitar_strings_guitarists |
49 | tallest - buildings - building - burj - khalifa | 114 | 49_tallest_buildings_building_burj |
50 | flat - earth - curvature - spherical - tectonic | 111 | 50_flat_earth_curvature_spherical |
51 | essay - awareness - understanding - being - be | 102 | 51_essay_awareness_understanding_being |
52 | portals - ender - portal - obsidian - netherite | 102 | 52_portals_ender_portal_obsidian |
53 | android - apple - phones - devices - vehicles | 101 | 53_android_apple_phones_devices |
54 | fasting - dietary - diet - eating - metabolic | 101 | 54_fasting_dietary_diet_eating |
55 | meditation - relief - pain - health - nociception | 99 | 55_meditation_relief_pain_health |
56 | weather - forecast - forecasts - raining - precipitation | 95 | 56_weather_forecast_forecasts_raining |
57 | president - presidents - presidency - constitution - biden | 94 | 57_president_presidents_presidency_constitution |
58 | no - nope - yes - not - maybe | 94 | 58_no_nope_yes_not |
59 | peregrine - airspeed - falcon - speed - bird | 90 | 59_peregrine_airspeed_falcon_speed |
60 | crontab - cron - myscript - script - bash | 83 | 60_crontab_cron_myscript_script |
61 | youtuber - streamer - ceo - musk - founder | 83 | 61_youtuber_streamer_ceo_musk |
62 | layovers - flights - circumnavigate - layover - travel | 83 | 62_layovers_flights_circumnavigate_layover |
63 | keyboards - keyboard - switches - qwerty - types | 83 | 63_keyboards_keyboard_switches_qwerty |
64 | file_path_in_dir1 - file_path1 - csv_file - file_path_in_dir2 - file_path2 | 80 | 64_file_path_in_dir1_file_path1_csv_file_file_path_in_dir2 |
65 | pele - maradona - lebron - ronaldo - nba | 76 | 65_pele_maradona_lebron_ronaldo |
66 | alopecia - hairstyles - hairstyle - hair - scalp | 66 | 66_alopecia_hairstyles_hairstyle_hair |
67 | nginx - docker - kubernetes - proxy_pass - nodeport | 65 | 67_nginx_docker_kubernetes_proxy_pass |
68 | directories - directory - sudo - filesystem - folders | 62 | 68_directories_directory_sudo_filesystem |
69 | gps - map - geocaching - maps - armenia | 52 | 69_gps_map_geocaching_maps |
70 | meiosis - mitosis - fertilization - reproduction - ovulation | 51 | 70_meiosis_mitosis_fertilization_reproduction |
71 | colleges - admissions - universities - campus - university | 43 | 71_colleges_admissions_universities_campus |
72 | unicorns - unicorn - pony - ponies - mythical | 32 | 72_unicorns_unicorn_pony_ponies |
73 | superpowers - abilities - superhero - superhuman - powers | 28 | 73_superpowers_abilities_superhero_superhuman |
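In the Label column, topics without a hand-written label carry a generated name: the topic ID followed by the topic's first four keywords, joined with underscores (for example `3_story_lived_life_novel`). The sketch below parses rows of the table and flags which labels were customized. `parse_row` and `default_label` are hypothetical helpers written for this card, not BERTopic functions, and the four-keyword rule is inferred from the table itself.

```python
def default_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Rebuild a generated topic name: the ID plus the first n_words keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def parse_row(row: str) -> dict:
    """Parse one 'ID | kw - kw - ... | frequency | label |' row of the topic table."""
    id_cell, kw_cell, freq_cell, label = (c.strip() for c in row.strip().strip("|").split("|"))
    topic_id = int(id_cell)
    keywords = kw_cell.split(" - ")
    return {
        "topic_id": topic_id,
        "keywords": keywords,
        "frequency": int(freq_cell),
        "label": label,
        # True when the label was written by hand rather than generated from keywords.
        "custom_label": label != default_label(topic_id, keywords),
    }


rows = [
    "0 | openai - ai - chatgpt - assistant - language | 7837 | Generative AI |",
    "3 | story - lived - life - novel - felt | 569 | 3_story_lived_life_novel |",
    "26 | monday - sunday - count_weekend_days - calendar - dates | 216 | 26_monday_sunday_count_weekend_days_calendar |",
]
for record in map(parse_row, rows):
    print(record["topic_id"], record["custom_label"], record["label"])
# 0 True Generative AI
# 3 False 3_story_lived_life_novel
# 26 False 26_monday_sunday_count_weekend_days_calendar
```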
Training hyperparameters
- calculate_probabilities: False
- language: None
- low_memory: False
- min_topic_size: 20
- n_gram_range: (1, 1)
- nr_topics: 75
- seed_topic_list: None
- top_n_words: 10
- verbose: True
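Each hyperparameter above corresponds to a keyword argument of the `BERTopic` constructor. The sketch below collects them in a dictionary and imports BERTopic only when a model is actually built; the names `HYPERPARAMS` and `build_topic_model` are hypothetical, and the comments summarize each setting as described in the BERTopic documentation.

```python
# Training hyperparameters from this card, as BERTopic constructor keyword arguments.
HYPERPARAMS = {
    "calculate_probabilities": False,  # only the assigned topic's probability, not a full per-topic matrix
    "language": None,                  # as listed; the card does not record the embedding model used
    "low_memory": False,               # UMAP's low-memory mode is off
    "min_topic_size": 20,              # minimum cluster size for the default HDBSCAN model
    "n_gram_range": (1, 1),            # single-word (unigram) keywords only
    "nr_topics": 75,                   # merge similar topics after fitting until 75 remain
    "seed_topic_list": None,           # no guided (seeded) topic modeling
    "top_n_words": 10,                 # keywords extracted per topic
    "verbose": True,                   # log training progress
}


def build_topic_model():
    """Create an unfitted BERTopic model configured with the card's hyperparameters."""
    from bertopic import BERTopic  # lazy import: HYPERPARAMS is usable without BERTopic installed

    return BERTopic(**HYPERPARAMS)
```

Calling `build_topic_model()` requires `pip install -U bertopic`. Fitting the result on your own documents with `topic_model.fit_transform(docs)` will not necessarily reproduce the published model, since the card does not record the embedding model, the UMAP and HDBSCAN settings, or a random seed.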
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.29
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.29.2
- Numba: 0.56.4
- Plotly: 5.13.1
- Python: 3.10.11