Chat Topics

Developed by davanstrien

A BERTopic-based topic model for chat text that automatically identifies and categorizes topics in large text collections.

Text Classification | English
Open Source License: MIT
#Multi-domain Topic Classification #Semantic Clustering Analysis #English Text Processing

Downloads 262

Release Time: 5/29/2023

Model Overview

This model is built with the BERTopic framework and is designed for analyzing chat text and extracting meaningful topics. It is suited to topic mining and analysis in scenarios such as social media and customer service dialogues.

Model Features

Modular Design

Uses BERTopic's modular pipeline, so individual stages (embedding, dimensionality reduction, clustering, and topic representation) can be adjusted independently

Multi-topic Identification

Automatically identifies 75 topics (IDs -1 through 73, where -1 is BERTopic's outlier topic)

Keyword Extraction

Provides the most representative keywords for each topic

Large-scale Training

Trained on 63,530 documents from the OpenAssistant/oasst1 dataset, covering a wide range of topics

Model Capabilities

Text Classification

Topic Identification

Keyword Extraction

Topic Visualization

Use Cases

Social Media Analysis

Chat Topic Monitoring

Analyzing trending discussion topics on social media

Identifies 75 different topic categories

Customer Service

Customer Service Dialogue Analysis

Classifying main topics of customer inquiries

Improves customer service response efficiency

tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
license: mit
datasets:
- OpenAssistant/oasst1
language:
- en

chat_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

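from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("davanstrien/chat_topics")

# Summarize every topic (ID, size, name) as a pandas DataFrame
topic_model.get_topic_info()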
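
Once loaded, the model can also assign topics to new messages. The following is a minimal sketch, assuming the loaded model is able to embed new text (that is, its embedding model was saved with it or is otherwise available). `assign_topics` and `demo` are hypothetical helpers written for this example, not part of BERTopic; they rely only on the library's `transform` and `get_topic` methods.

```python
from typing import List, Sequence, Tuple


def assign_topics(
    topic_model, messages: Sequence[str], n_keywords: int = 5
) -> List[Tuple[str, int, List[str]]]:
    """Return (message, topic_id, top keywords) for each message.

    `topic_model` is expected to behave like a fitted BERTopic model.
    """
    # transform() returns the predicted topic IDs and (optionally) probabilities
    topics, _ = topic_model.transform(list(messages))
    results = []
    for message, topic_id in zip(messages, topics):
        # get_topic() yields (word, score) pairs, or False for an unknown ID
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _ in words[:n_keywords]]
        results.append((message, int(topic_id), keywords))
    return results


def demo() -> None:
    # Requires `pip install -U bertopic` and network access to the Hub
    from bertopic import BERTopic

    topic_model = BERTopic.load("davanstrien/chat_topics")
    for row in assign_topics(topic_model, ["How do I undo my last git commit?"]):
        print(row)
```

Calling `demo()` downloads the model and prints one tuple per message. A topic ID of -1 means the message was treated as an outlier rather than matched to a topic.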

Topic overview

Number of topics: 75
Number of training documents: 63530

Topic ID	Topic Keywords	Topic Frequency	Label
-1	provide - using - information - sure - help	26	-1_provide_using_information_sure
0	openai - ai - chatgpt - assistant - language	7837	Generative AI
1	anytime - welcome - assistance - helpful - thank	1342	1_anytime_welcome_assistance_helpful
2	quantum - particle - physics - particles - relativity	778	Physics
3	story - lived - life - novel - felt	569	3_story_lived_life_novel
4	letter - sincerely - regards - email - dear	516	4_letter_sincerely_regards_email
5	rust - haskell - programming - java - languages	504	programming
6	css - html - style - div - js	494	web programming
7	linux - ubuntu - debian - fedora - install	440	7_linux_ubuntu_debian_fedora
8	recipe - bake - ingredients - baking - dough	425	8_recipe_bake_ingredients_baking
9	websocket - json - socket - api - discord	425	9_websocket_json_socket_api
10	communism - capitalism - marx - economic - economy	424	10_communism_capitalism_marx_economic
11	dog - pet - breed - breeds - pets	408	11_dog_pet_breed_breeds
12	philosophy - theological - philosophical - beliefs - consciousness	394	12_philosophy_theological_philosophical_beliefs
13	git - github - repository - software - commit	381	13_git_github_repository_software
14	music - songs - musical - lyrics - song	370	14_music_songs_musical_lyrics
15	devops - development - developers - industry - develop	323	15_devops_development_developers_industry
16	pythagorean - hypotenuse - triangle - math - sqrt	302	16_pythagorean_hypotenuse_triangle_math
17	eu - europe - economy - economic - war	291	17_eu_europe_economy_economic
18	sleep - asleep - bedtime - procrastination - depression	280	18_sleep_asleep_bedtime_procrastination
19	kramer - seinfeld - jerry - cafe - elaine	279	19_kramer_seinfeld_jerry_cafe
20	printing - prints - printer - print - printers	276	20_printing_prints_printer_print
21	influenza - flu - panic - symptoms - medical	251	21_influenza_flu_panic_symptoms
22	chess - chessboard - practice - strategy - learn	242	22_chess_chessboard_practice_strategy
23	algorithm - primes - array - integers - python	240	23_algorithm_primes_array_integers
24	youtube - viewers - media - google - streaming	240	24_youtube_viewers_media_google
25	poison - chemicals - powder - turpentine - smoke	226	25_poison_chemicals_powder_turpentine
26	monday - sunday - count_weekend_days - calendar - dates	216	26_monday_sunday_count_weekend_days_calendar
27	colors - colour - color - pigments - blue	208	27_colors_colour_color_pigments
28	roman - attila - rome - empire - warfare	205	28_roman_attila_rome_empire
29	investing - investments - investment - stocks - financial	204	29_investing_investments_investment_stocks
30	vocabulary - wordle - words - scrabble - word	201	30_vocabulary_wordle_words_scrabble
31	planets - sun - earth - planet - pluto	198	31_planets_sun_earth_planet
32	renewable - solar - electricity - energy - electrical	190	32_renewable_solar_electricity_energy
33	pygame - ball_radius - draw - circle - canvas	181	33_pygame_ball_radius_draw_circle
34	fishing - fish - boat - hiking - camping	176	34_fishing_fish_boat_hiking
35	gpus - gpu - motherboard - cpu - hardware	162	35_gpus_gpu_motherboard_cpu
36	hvac - remodeling - energy - kwh - housing	159	36_hvac_remodeling_energy_kwh
37	database - graphql - databases - postgresql - sql	159	37_database_graphql_databases_postgresql
38	información - significado - cómo - como - sistemas	158	38_información_significado_cómo_como
39	motherboard - pcie - gpu - bios - computer	153	39_motherboard_pcie_gpu_bios
40	crops - produce - planting - peppers - plants	148	40_crops_produce_planting_peppers
41	paintings - art - modernist - artists - modern	148	41_paintings_art_modernist_artists
42	workout - exercises - dumbbells - dumbbell - exercise	147	42_workout_exercises_dumbbells_dumbbell
43	climate - warming - pollution - environmental - emissions	142	43_climate_warming_pollution_environmental
44	coffee - espresso - brewing - tea - beans	137	44_coffee_espresso_brewing_tea
45	velocity - drag - acceleration - density - formula	132	45_velocity_drag_acceleration_density
46	woodchuck - woodchucks - units - kilogram - kilograms	130	46_woodchuck_woodchucks_units_kilogram
47	ascii - glyphs - hiragana - art - font	129	47_ascii_glyphs_hiragana_art
48	guitars - guitar - strings - guitarists - instrument	127	48_guitars_guitar_strings_guitarists
49	tallest - buildings - building - burj - khalifa	114	49_tallest_buildings_building_burj
50	flat - earth - curvature - spherical - tectonic	111	50_flat_earth_curvature_spherical
51	essay - awareness - understanding - being - be	102	51_essay_awareness_understanding_being
52	portals - ender - portal - obsidian - netherite	102	52_portals_ender_portal_obsidian
53	android - apple - phones - devices - vehicles	101	53_android_apple_phones_devices
54	fasting - dietary - diet - eating - metabolic	101	54_fasting_dietary_diet_eating
55	meditation - relief - pain - health - nociception	99	55_meditation_relief_pain_health
56	weather - forecast - forecasts - raining - precipitation	95	56_weather_forecast_forecasts_raining
57	president - presidents - presidency - constitution - biden	94	57_president_presidents_presidency_constitution
58	no - nope - yes - not - maybe	94	58_no_nope_yes_not
59	peregrine - airspeed - falcon - speed - bird	90	59_peregrine_airspeed_falcon_speed
60	crontab - cron - myscript - script - bash	83	60_crontab_cron_myscript_script
61	youtuber - streamer - ceo - musk - founder	83	61_youtuber_streamer_ceo_musk
62	layovers - flights - circumnavigate - layover - travel	83	62_layovers_flights_circumnavigate_layover
63	keyboards - keyboard - switches - qwerty - types	83	63_keyboards_keyboard_switches_qwerty
64	file_path_in_dir1 - file_path1 - csv_file - file_path_in_dir2 - file_path2	80	64_file_path_in_dir1_file_path1_csv_file_file_path_in_dir2
65	pele - maradona - lebron - ronaldo - nba	76	65_pele_maradona_lebron_ronaldo
66	alopecia - hairstyles - hairstyle - hair - scalp	66	66_alopecia_hairstyles_hairstyle_hair
67	nginx - docker - kubernetes - proxy_pass - nodeport	65	67_nginx_docker_kubernetes_proxy_pass
68	directories - directory - sudo - filesystem - folders	62	68_directories_directory_sudo_filesystem
69	gps - map - geocaching - maps - armenia	52	69_gps_map_geocaching_maps
70	meiosis - mitosis - fertilization - reproduction - ovulation	51	70_meiosis_mitosis_fertilization_reproduction
71	colleges - admissions - universities - campus - university	43	71_colleges_admissions_universities_campus
72	unicorns - unicorn - pony - ponies - mythical	32	72_unicorns_unicorn_pony_ponies
73	superpowers - abilities - superhero - superhuman - powers	28	73_superpowers_abilities_superhero_superhuman
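
Most labels in the table follow an auto-generated pattern, the topic ID followed by the top keywords joined with underscores, while a few (such as "Generative AI") appear to have been assigned by hand. The sketch below separates the two groups using a handful of rows copied from the table. `is_default_label` is a hypothetical helper, and the underscore pattern it checks is inferred from the table itself rather than taken from BERTopic's documentation.

```python
from typing import List, Tuple

# (topic_id, label) pairs copied from the table above
ROWS: List[Tuple[int, str]] = [
    (-1, "-1_provide_using_information_sure"),
    (0, "Generative AI"),
    (1, "1_anytime_welcome_assistance_helpful"),
    (2, "Physics"),
    (5, "programming"),
    (6, "web programming"),
    (7, "7_linux_ubuntu_debian_fedora"),
]


def is_default_label(topic_id: int, label: str) -> bool:
    """True if the label looks auto-generated: '<id>_<word>_<word>...'."""
    return label.startswith(f"{topic_id}_")


# Keep only the rows whose label does not match the auto-generated pattern
custom = [(tid, label) for tid, label in ROWS if not is_default_label(tid, label)]
print(custom)
```

The same check could be applied to the full output of `get_topic_info()` to list every topic that still needs a readable label.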

Training hyperparameters

calculate_probabilities: False
language: None
low_memory: False
min_topic_size: 20
n_gram_range: (1, 1)
nr_topics: 75
seed_topic_list: None
top_n_words: 10
verbose: True
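
Each of these settings corresponds to an argument of the `BERTopic` constructor, so a model with the same configuration could be set up roughly as follows. This is a sketch, not the author's training script: the embedding model used for training is not listed here, so it is left as a parameter, and `build_topic_model` is a hypothetical helper name.

```python
# Hyperparameters as listed above; every key is a BERTopic constructor argument
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 20,
    "n_gram_range": (1, 1),
    "nr_topics": 75,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model=None):
    """Create an unfitted BERTopic model with the listed settings.

    The embedding model used for training is not stated in this card, so it
    is left as a parameter here (an assumption of this sketch).
    """
    # Imported lazily; requires `pip install -U bertopic`
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Fitting would then look like `topics, probs = build_topic_model().fit_transform(docs)`, where `docs` is a list of strings. Results will not match the published topics exactly unless the same data, embedding model, and library versions are used.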

Framework versions

Numpy: 1.22.4
HDBSCAN: 0.8.29
UMAP: 0.5.3
Pandas: 1.5.3
Scikit-Learn: 1.2.2
Sentence-transformers: 2.2.2
Transformers: 4.29.2
Numba: 0.56.4
Plotly: 5.13.1
Python: 3.10.11
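
Models saved with one set of library versions do not always load cleanly under another, so it can help to compare the local environment against the versions listed above. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are this example's mapping, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata
from typing import Callable, Dict

# PyPI distribution names mapped to the versions listed above
PINNED: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def _installed_version(name: str) -> str:
    """Look up an installed distribution's version, or '' if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return ""


def version_mismatches(
    pinned: Dict[str, str], lookup: Callable[[str], str] = _installed_version
) -> Dict[str, str]:
    """Return {package: installed_version} for packages that differ from the pin."""
    found = {name: lookup(name) for name in pinned}
    return {name: found[name] for name, wanted in pinned.items() if found[name] != wanted}


# Report every package whose installed version differs from the listed one
for name, found in version_mismatches(PINNED).items():
    print(f"{name}: expected {PINNED[name]}, found {found or 'not installed'}")
```

A mismatch is not necessarily a problem; the report is only a starting point if loading or inference fails.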
