
xsum_55555_3000_1500_train

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

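from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_55555_3000_1500_train")

# One row per topic, including its ID, document count, and name
topic_model.get_topic_info()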

Topic overview

  • Number of topics: 54 (including the outlier topic -1)
  • Number of training documents: 3000
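Each training document is assigned to exactly one topic, so the topic frequencies in the table below add up to the 3000 training documents, with topic -1 holding the outliers. The table also shows topic IDs running in order of decreasing frequency, which matches BERTopic's convention of renumbering topics by size after fitting. The following is a minimal sketch of that bookkeeping on a hypothetical list of raw cluster assignments; `frequency_table` is an illustrative name, not BERTopic's internal code.

```python
from collections import Counter


def frequency_table(assignments: list[int]) -> list[tuple[int, int]]:
    """Turn per-document cluster assignments into (topic_id, frequency) rows.

    -1 is kept as the outlier topic and listed first; all other clusters
    are renumbered 0, 1, 2, ... from largest to smallest.
    """
    counts = Counter(assignments)
    outliers = counts.pop(-1, 0)
    # Largest cluster first; the original cluster IDs are discarded.
    ranked = sorted(counts.values(), reverse=True)
    rows = [(-1, outliers)] if outliers else []
    rows += list(enumerate(ranked))
    return rows


# Hypothetical raw cluster labels for seven documents.
docs_topics = [3, 3, 7, -1, 3, 7, 5]
table = frequency_table(docs_topics)
print(table)  # [(-1, 1), (0, 3), (1, 2), (2, 1)]

# Frequencies always sum to the number of documents (3000 for this model).
assert sum(freq for _, freq in table) == len(docs_topics)
```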
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency (documents) | Label
-1 | said - people - would - mr - year | 6 | -1_said_people_would_mr
0 | party - eu - labour - vote - brexit | 1465 | 0_party_eu_labour_vote
1 | trump - mr - president - republican - russia | 129 | 1_trump_mr_president_republican
2 | care - health - nhs - patient - hospital | 76 | 2_care_health_nhs_patient
3 | syria - syrian - attack - killed - force | 75 | 3_syria_syrian_attack_killed
4 | cricket - wicket - england - test - ball | 64 | 4_cricket_wicket_england_test
5 | club - league - season - appearance - loan | 59 | 5_club_league_season_appearance
6 | wales - rugby - england - game - player | 58 | 6_wales_rugby_england_game
7 | film - show - actor - actress - star | 55 | 7_film_show_actor_actress
8 | medal - sport - olympic - gold - world | 54 | 8_medal_sport_olympic_gold
9 | driving - driver - crash - car - road | 48 | 9_driving_driver_crash_car
10 | chelsea - arsenal - city - goal - tottenham | 44 | 10_chelsea_arsenal_city_goal
11 | president - mr - petrobras - odebrecht - government | 43 | 11_president_mr_petrobras_odebrecht
12 | lifeboat - sea - rnli - ship - boat | 41 | 12_lifeboat_sea_rnli_ship
13 | crime - police - child - force - abuse | 37 | 13_crime_police_child_force
14 | man - police - men - wearing - arrested | 35 | 14_man_police_men_wearing
15 | murray - seed - match - slam - set | 34 | 15_murray_seed_match_slam
16 | dog - mountain - animal - avalanche - said | 34 | 16_dog_mountain_animal_avalanche
17 | court - sexual - assault - trial - woman | 31 | 17_court_sexual_assault_trial
18 | school - education - teacher - academy - pupil | 30 | 18_school_education_teacher_academy
19 | fifa - ghana - burkina - african - cup | 29 | 19_fifa_ghana_burkina_african
20 | music - album - song - like - im | 28 | 20_music_album_song_like
21 | fire - blaze - rescue - said - building | 28 | 21_fire_blaze_rescue_said
22 | energy - gas - shale - project - power | 27 | 22_energy_gas_shale_project
23 | train - rail - bridge - scotrail - strike | 27 | 23_train_rail_bridge_scotrail
24 | growth - rate - oil - market - us | 26 | 24_growth_rate_oil_market
25 | town - foul - box - footed - half | 26 | 25_town_foul_box_footed
26 | open - round - golf - par - birdie | 26 | 26_open_round_golf_par
27 | china - north - chinese - xi - taiwan | 22 | 27_china_north_chinese_xi
28 | bond - bank - greek - greece - eurozone | 22 | 28_bond_bank_greek_greece
29 | race - lap - second - honda - driver | 21 | 29_race_lap_second_honda
30 | president - mr - congolese - africa - african | 21 | 30_president_mr_congolese_africa
31 | barcelona - fc - madrid - de - bayern | 19 | 31_barcelona_fc_madrid_de
32 | murder - man - postmortem - court - found | 18 | 32_murder_man_postmortem_court
33 | welsh - wales - government - assembly - labour | 17 | 33_welsh_wales_government_assembly
34 | celtic - game - season - rangers - team | 17 | 34_celtic_game_season_rangers
35 | heritage - castle - house - orkney - building | 17 | 35_heritage_castle_house_orkney
36 | tax - deficit - debt - economy - financial | 16 | 36_tax_deficit_debt_economy
37 | stream - jet - weather - wind - flood | 15 | 37_stream_jet_weather_wind
38 | software - security - data - hacker - router | 15 | 38_software_security_data_hacker
39 | painting - portrait - art - collection - artist | 14 | 39_painting_portrait_art_collection
40 | apple - tablet - hp - firm - android | 14 | 40_apple_tablet_hp_firm
41 | robertson - mr - court - knife - murder | 12 | 41_robertson_mr_court_knife
42 | unsupported - device - updated - playback - media | 12 | 42_unsupported_device_updated_playback
43 | iaaf - doping - athlete - athletics - antidoping | 11 | 43_iaaf_doping_athlete_athletics
44 | stolen - theft - burglary - thief - store | 11 | 44_stolen_theft_burglary_thief
45 | yn - ar - mae - bod - ei | 11 | 45_yn_ar_mae_bod
46 | flight - plane - airport - aircraft - passenger | 11 | 46_flight_plane_airport_aircraft
47 | baby - child - infant - mcelhinney - church | 10 | 47_baby_child_infant_mcelhinney
48 | party - fillon - mr - socialist - macron | 10 | 48_party_fillon_mr_socialist
49 | serbia - scotland - celtic - throwin - kick | 9 | 49_serbia_scotland_celtic_throwin
50 | child - childcare - families - mental - nurse | 8 | 50_child_childcare_families_mental
51 | turkey - migrant - eu - visa - greece | 6 | 51_turkey_migrant_eu_visa
52 | supermarket - store - price - sale - tyrrells | 6 | 52_supermarket_store_price_sale
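In this table, every Label is the topic ID followed by the first four keywords, joined with underscores (for example, 0_party_eu_labour_vote). A small helper that rebuilds a label from the ID and the keyword string can be used to sanity-check rows. The function name `topic_label` is hypothetical; it reproduces the pattern visible in the table rather than calling BERTopic.

```python
def topic_label(topic_id: int, keywords: str, n_words: int = 4) -> str:
    """Build a label such as '0_party_eu_labour_vote' from a topic ID and
    a keyword string in the table's 'word - word - ...' format."""
    words = [word.strip() for word in keywords.split(" - ")]
    return f"{topic_id}_" + "_".join(words[:n_words])


# Rows copied from the table: (topic ID, keywords, label).
rows = [
    (-1, "said - people - would - mr - year", "-1_said_people_would_mr"),
    (0, "party - eu - labour - vote - brexit", "0_party_eu_labour_vote"),
    (45, "yn - ar - mae - bod - ei", "45_yn_ar_mae_bod"),
]
for topic_id, keywords, label in rows:
    assert topic_label(topic_id, keywords) == label
```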

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
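The hyperparameters above share their names with arguments of the BERTopic constructor, so a fresh, unfitted model with the same settings can be sketched as follows. This does not reproduce the published model exactly: the embedding, UMAP, and HDBSCAN components and any random seeds are not listed in this card, so BERTopic's defaults are assumed. `build_model` is a hypothetical helper name.

```python
# Hyperparameters listed in this model card. The keys are BERTopic
# constructor arguments; components not listed (embedding model, UMAP,
# HDBSCAN) are left at BERTopic's defaults, which is an assumption.
HYPERPARAMS = {
    "calculate_probabilities": True,  # full document-topic probability matrix
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,  # minimum number of documents per topic
    "n_gram_range": (1, 1),  # unigrams only: every keyword is a single word
    "nr_topics": None,  # no topic reduction after fitting
    "seed_topic_list": None,  # unguided: no seed words
    "top_n_words": 10,  # words stored per topic
    "verbose": False,
}


def build_model():
    """Return an unfitted BERTopic model configured like this card."""
    # Imported here so HYPERPARAMS can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Fitting would then be `topics, probs = build_model().fit_transform(docs)` for a list of document strings `docs`; the training documents themselves are not included in this card.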

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
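Because the model is downloaded and deserialized into the local environment, installed library versions that differ from those listed above can cause loading or inference problems. A minimal sketch that reports which installed packages differ from the listed versions, using only the standard library and assuming the PyPI distribution names shown (for example `umap-learn` for UMAP); `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card, keyed by
# PyPI distribution name (an assumption for each entry).
LISTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def compare_versions(listed=LISTED):
    """Return {package: (listed, installed)} for every package whose
    installed version differs from the listed one (None = not installed)."""
    mismatches = {}
    for name, expected in listed.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


print(compare_versions())
```

An empty dict means every listed package is installed at exactly the listed version.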