cnn_dailymail_108_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

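from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()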
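
Beyond the overview table, a loaded model can be asked for the keywords of a single topic or used to assign topics to new text. The sketch below is illustrative only: `describe_topic` and `assign_topics` are hypothetical helper names, not part of BERTopic. They rely on `get_topic`, which returns a list of `(word, score)` pairs (or `False` for an unknown topic ID), and on `transform`, which returns predicted topic IDs together with probabilities. It assumes the loaded model can embed new documents, which may require the sentence-transformers package to be installed.

```python
def describe_topic(topic_model, topic_id, n_words=5):
    """Return the top keywords of one topic as a ' - ' separated string."""
    words = topic_model.get_topic(topic_id)
    if not words:  # get_topic returns False when the topic ID is unknown
        return ""
    return " - ".join(word for word, _score in words[:n_words])


def assign_topics(topic_model, documents):
    """Pair each new document with the topic ID the model predicts for it."""
    topics, _probabilities = topic_model.transform(documents)
    return [(doc, int(topic)) for doc, topic in zip(documents, topics)]
```

With the model loaded as above, `describe_topic(topic_model, 0)` would summarize topic 0, and `assign_topics(topic_model, ["some news article text"])` would label new text. A predicted ID of -1 means the document was treated as an outlier.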

Topic overview

  • Number of topics: 51
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - one - people - year - would | 10 | -1_said_one_people_year |
| 0 | league - player - cup - club - game | 954 | 0_league_player_cup_club |
| 1 | police - said - court - told - murder | 308 | 1_police_said_court_told |
| 2 | dog - animal - cat - elephant - zoo | 290 | 2_dog_animal_cat_elephant |
| 3 | mr - minister - labour - cameron - prime | 113 | 3_mr_minister_labour_cameron |
| 4 | obama - clinton - president - republican - campaign | 104 | 4_obama_clinton_president_republican |
| 5 | school - teacher - student - nfl - said | 84 | 5_school_teacher_student_nfl |
| 6 | food - milk - drink - wine - bottle | 72 | 6_food_milk_drink_wine |
| 7 | flight - plane - passenger - pilot - aircraft | 49 | 7_flight_plane_passenger_pilot |
| 8 | user - facebook - google - ipad - device | 48 | 8_user_facebook_google_ipad |
| 9 | olympic - gold - race - games - medal | 46 | 9_olympic_gold_race_games |
| 10 | doll - dress - fashion - look - style | 44 | 10_doll_dress_fashion_look |
| 11 | afghan - afghanistan - taliban - military - pakistan | 43 | 11_afghan_afghanistan_taliban_military |
| 12 | transplant - patient - heart - hospital - cancer | 42 | 12_transplant_patient_heart_hospital |
| 13 | iran - syrian - said - president - egypt | 42 | 13_iran_syrian_said_president |
| 14 | show - film - million - like - movie | 39 | 14_show_film_million_like |
| 15 | property - house - price - home - apartment | 38 | 15_property_house_price_home |
| 16 | earth - asteroid - moon - volcano - planet | 34 | 16_earth_asteroid_moon_volcano |
| 17 | federer - djokovic - match - murray - seed | 33 | 17_federer_djokovic_match_murray |
| 18 | jackson - jacksons - album - song - music | 31 | 18_jackson_jacksons_album_song |
| 19 | ship - boat - coast - said - vessel | 30 | 19_ship_boat_coast_said |
| 20 | russia - russian - putin - ukraine - moscow | 30 | 20_russia_russian_putin_ukraine |
| 21 | snow - weather - temperature - climate - water | 29 | 21_snow_weather_temperature_climate |
| 22 | police - station - mr - man - gang | 28 | 22_police_station_mr_man |
| 23 | ebola - disease - vaccine - virus - health | 28 | 23_ebola_disease_vaccine_virus |
| 24 | weight - fat - diet - burn - exercise | 28 | 24_weight_fat_diet_burn |
| 25 | syria - isis - islamic - muslims - alqudsi | 23 | 25_syria_isis_islamic_muslims |
| 26 | boko - haram - nigeria - nigerian - turkana | 23 | 26_boko_haram_nigeria_nigerian |
| 27 | korea - north - korean - kim - pyongyang | 22 | 27_korea_north_korean_kim |
| 28 | driver - driving - road - car - speed | 22 | 28_driver_driving_road_car |
| 29 | school - child - education - internet - english | 21 | 29_school_child_education_internet |
| 30 | mcilroy - woods - pga - tournament - round | 20 | 30_mcilroy_woods_pga_tournament |
| 31 | race - car - driver - team - f1 | 19 | 31_race_car_driver_team |
| 32 | princess - prince - diana - royal - palace | 18 | 32_princess_prince_diana_royal |
| 33 | climbing - climb - mountain - everest - ang | 18 | 33_climbing_climb_mountain_everest |
| 34 | wedding - bieber - couple - together - love | 18 | 34_wedding_bieber_couple_together |
| 35 | nhs - care - patient - hospital - health | 17 | 35_nhs_care_patient_hospital |
| 36 | iraq - iraqi - isis - baghdad - kurdish | 16 | 36_iraq_iraqi_isis_baghdad |
| 37 | cartel - drug - mexican - mexico - crack | 15 | 37_cartel_drug_mexican_mexico |
| 38 | painting - picasso - art - artist - gogh | 15 | 38_painting_picasso_art_artist |
| 39 | castro - zelaya - fidel - micheletti - president | 14 | 39_castro_zelaya_fidel_micheletti |
| 40 | french - ford - traveller - southampton - taxi | 14 | 40_french_ford_traveller_southampton |
| 41 | fire - florissant - bell - firefighter - burned | 14 | 41_fire_florissant_bell_firefighter |
| 42 | fight - ali - heavyweight - pacquiao - title | 13 | 42_fight_ali_heavyweight_pacquiao |
| 43 | fish - sea - jellyfish - manta - swell | 13 | 43_fish_sea_jellyfish_manta |
| 44 | pope - francis - vatican - falkland - islands | 12 | 44_pope_francis_vatican_falkland |
| 45 | gay - samesex - lgbt - marriage - state | 12 | 45_gay_samesex_lgbt_marriage |
| 46 | castle - tower - building - brent - lego | 12 | 46_castle_tower_building_brent |
| 47 | chinese - china - xinhua - chinas - communist | 12 | 47_chinese_china_xinhua_chinas |
| 48 | delivery - customer - market - vacuum - coin | 10 | 48_delivery_customer_market_vacuum |
| 49 | water - rain - storm - flooding - methane | 10 | 49_water_rain_storm_flooding |
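
Two patterns in the table can be checked with a few lines of plain Python. Each label is the topic ID followed by the first four keywords, joined with underscores, and each frequency is a document count out of the 3000 training documents. In BERTopic, topic -1 collects documents that were not assigned to any cluster. The helper names below (`default_label`, `document_share`) are hypothetical and only restate what the table shows; they are not BERTopic functions.

```python
N_TRAINING_DOCUMENTS = 3000  # from the topic overview above


def default_label(topic_id, keywords, n_words=4):
    """Rebuild a label as '<id>_<kw1>_<kw2>_...' from the leading keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def document_share(frequency, n_documents=N_TRAINING_DOCUMENTS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / n_documents


keywords_topic_0 = ["league", "player", "cup", "club", "game"]
print(default_label(0, keywords_topic_0))   # 0_league_player_cup_club
print(f"{document_share(954):.1%}")         # 31.8%
```

Topic 0 alone therefore covers close to a third of the training documents, while the smallest topics sit at the `min_topic_size` of 10.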

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
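
The values above map directly onto keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how a comparable model could be configured, not a record of how this model was actually trained: the card does not state the embedding model, the UMAP and HDBSCAN settings, or the exact training documents, so the sketch leaves those at BERTopic's defaults and `build_topic_model` is a hypothetical helper name. Refitting on different data or with different components will generally give different topics.

```python
# Hyperparameters copied from the list above
TRAINING_HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an unfitted BERTopic instance with the listed hyperparameters."""
    # Imported lazily so the settings can be inspected without BERTopic installed
    from bertopic import BERTopic

    # Assumption: embedding, UMAP, and HDBSCAN components stay at their defaults
    return BERTopic(**TRAINING_HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(documents)` on a list of strings would then train a new model with these settings.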

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
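
If loading the model fails or behaves unexpectedly, comparing the local environment against the versions listed above can help narrow things down. The sketch below uses the standard library's `importlib.metadata`; the distribution names are the usual PyPI names for these packages (for example `umap-learn` for UMAP), and `version_report` is a hypothetical helper. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions from the list above, keyed by PyPI distribution name
CARD_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=CARD_VERSIONS, lookup=installed_version):
    """Map each mismatching package to (version on the card, version found)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report
```

An empty dictionary from `version_report()` means every listed package matches the card.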