
cnn_dailymail_6789_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

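from bertopic import BERTopic

# Load the pretrained topic model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_3000_1500_train")

# Show a table with one row per topic (ID, document count, label)
topic_model.get_topic_info()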
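
Individual topics can also be inspected by ID. The following is a minimal sketch, assuming a `topic_model` loaded as shown above. `top_keywords` is a hypothetical helper name, not part of BERTopic; it relies only on `get_topic`, which returns a list of (word, score) pairs for a topic, or a falsy value when the topic ID does not exist.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for a topic, in the order the model returns them.

    Hypothetical helper: it only assumes that topic_model.get_topic(topic_id)
    returns a list of (word, score) pairs, or a falsy value for unknown IDs.
    """
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        # Unknown topic ID: return an empty list instead of raising
        return []
    return [word for word, _score in pairs[:n]]


# Example usage (requires bertopic and network access to download the model):
#
#   from bertopic import BERTopic
#   topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_3000_1500_train")
#   print(top_keywords(topic_model, 0))
```

Returning an empty list for unknown IDs keeps the helper safe to call in a loop over arbitrary topic numbers.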

Topic overview

  • Number of topics: 54
  • Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - people - one - police - year | 10 | -1_said_people_one_police
0 | player - league - cup - club - game | 1072 | 0_player_league_cup_club
1 | police - said - death - murder - found | 291 | 1_police_said_death_murder
2 | obama - president - republicans - house - republican | 152 | 2_obama_president_republicans_house
3 | labour - mr - cameron - minister - prime | 98 | 3_labour_mr_cameron_minister
4 | hospital - baby - surgery - heart - doctor | 77 | 4_hospital_baby_surgery_heart
5 | iphone - apple - user - device - phone | 74 | 5_iphone_apple_user_device
6 | doll - fashion - look - collection - like | 69 | 6_doll_fashion_look_collection
7 | syria - isis - syrian - iraq - iraqi | 46 | 7_syria_isis_syrian_iraq
8 | pakistan - taliban - al - drone - afghanistan | 45 | 8_pakistan_taliban_al_drone
9 | food - restaurant - menu - burger - coffee | 43 | 9_food_restaurant_menu_burger
10 | car - driver - vehicle - crash - driving | 41 | 10_car_driver_vehicle_crash
11 | space - tower - car - airport - nasa | 40 | 11_space_tower_car_airport
12 | property - house - home - apartment - room | 40 | 12_property_house_home_apartment
13 | school - rape - sexual - student - sex | 36 | 13_school_rape_sexual_student
14 | nfl - rice - quarterback - said - coach | 36 | 14_nfl_rice_quarterback_said
15 | music - album - song - miley - cnn | 33 | 15_music_album_song_miley
16 | olympic - gold - olympics - athlete - world | 33 | 16_olympic_gold_olympics_athlete
17 | zoo - bear - tian - elephant - ivory | 33 | 17_zoo_bear_tian_elephant
18 | flight - plane - aircraft - pilot - airport | 32 | 18_flight_plane_aircraft_pilot
19 | flu - bacteria - vaccine - health - disease | 31 | 19_flu_bacteria_vaccine_health
20 | dog - animal - pet - cat - dogs | 30 | 20_dog_animal_pet_cat
21 | school - education - exam - child - degree | 30 | 21_school_education_exam_child
22 | kenya - kenyan - mall - said - nairobi | 28 | 22_kenya_kenyan_mall_said
23 | cent - per - price - cadbury - christmas | 27 | 23_cent_per_price_cadbury
24 | french - france - sarkozy - hollande - minister | 26 | 24_french_france_sarkozy_hollande
25 | russian - ukraine - russia - putin - ukrainian | 25 | 25_russian_ukraine_russia_putin
26 | iran - nuclear - iranian - israel - irans | 24 | 26_iran_nuclear_iranian_israel
27 | film - bond - novel - the - cnn | 24 | 27_film_bond_novel_the
28 | lava - fire - snow - pahoa - volcano | 24 | 28_lava_fire_snow_pahoa
29 | drug - mexican - chavez - cartel - said | 23 | 29_drug_mexican_chavez_cartel
30 | ship - vessel - captain - crew - coast | 23 | 30_ship_vessel_captain_crew
31 | snowden - us - intelligence - information - gebregeorgis | 23 | 31_snowden_us_intelligence_information
32 | match - wimbledon - federer - final - open | 22 | 32_match_wimbledon_federer_final
33 | chinese - china - beijing - hong - protester | 21 | 33_chinese_china_beijing_hong
34 | jury - white - ferguson - police - said | 21 | 34_jury_white_ferguson_police
35 | weather - temperature - rain - warm - park | 21 | 35_weather_temperature_rain_warm
36 | prince - royal - william - princess - queen | 20 | 36_prince_royal_william_princess
37 | weight - fat - diet - gym - size | 19 | 37_weight_fat_diet_gym
38 | golf - mcilroy - round - pga - championship | 19 | 38_golf_mcilroy_round_pga
39 | hamilton - race - rosberg - prix - button | 19 | 39_hamilton_race_rosberg_prix
40 | north - kim - korean - korea - koreas | 18 | 40_north_kim_korean_korea
41 | human - found - fossil - ancient - fish | 18 | 41_human_found_fossil_ancient
42 | climate - change - global - energy - wind | 17 | 42_climate_change_global_energy
43 | school - teacher - pupil - schools - ofsted | 17 | 43_school_teacher_pupil_schools
44 | ebola - virus - health - outbreak - liberia | 17 | 44_ebola_virus_health_outbreak
45 | whale - nyad - shark - swim - beach | 17 | 45_whale_nyad_shark_swim
46 | money - kallakis - foster - court - wines | 15 | 46_money_kallakis_foster_court
47 | painting - art - portrait - auction - artist | 14 | 47_painting_art_portrait_auction
48 | solar - planet - sun - bubble - earth | 14 | 48_solar_planet_sun_bubble
49 | tsarnaev - oswald - boston - marathon - kennedy | 14 | 49_tsarnaev_oswald_boston_marathon
50 | patient - care - va - hospital - patients | 14 | 50_patient_care_va_hospital
51 | love - woman - im - relationship - men | 13 | 51_love_woman_im_relationship
52 | marijuana - alcohol - drug - hangover - liver | 11 | 52_marijuana_alcohol_drug_hangover
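
The frequencies above are easier to interpret as shares of the 3000 training documents. The short sketch below does that arithmetic for a few rows copied from the table; `share_percent` is a hypothetical helper name introduced for illustration. In BERTopic, topic -1 collects documents not assigned to any cluster (outliers).

```python
# Number of training documents, as stated in the topic overview
N_TRAINING_DOCS = 3000

# A few (topic ID -> frequency) pairs copied from the table above
FREQUENCIES = {-1: 10, 0: 1072, 1: 291, 2: 152}


def share_percent(count, total=N_TRAINING_DOCS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for topic_id, count in FREQUENCIES.items():
    print(f"topic {topic_id}: {count} documents, {share_percent(count):.1f}% of the corpus")
```

On these numbers, topic 0 (football) alone covers roughly a third of the corpus, while the outlier topic holds well under 1% of it.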

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
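
As a rough illustration, the hyperparameters listed above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below is an assumption about how such a model could be configured, not the author's training script: the embedding model, the UMAP and HDBSCAN settings, the random seed, and the exact training documents are not listed in this card, so it would not reproduce the published topics exactly. `build_topic_model` is a hypothetical helper name.

```python
# Hyperparameters copied from the list above
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic instance with the listed hyperparameters."""
    # Imported here so the settings above can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)


# Example usage (docs is a list of strings that you supply):
#
#   topic_model = build_topic_model()
#   topics, probs = topic_model.fit_transform(docs)
```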

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
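
To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library (`importlib.metadata`); `check_versions` is a hypothetical helper name, and the keys are the PyPI distribution names assumed to correspond to the libraries in the list (for example, UMAP is distributed as `umap-learn`).

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def check_versions(expected=EXPECTED):
    """Map each distribution to (expected, installed, matches).

    installed is None when the distribution is not installed.
    """
    report = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        report[dist] = (wanted, installed, installed == wanted)
    return report


for dist, (wanted, installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{dist}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one recorded here.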