xsum_22457_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

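from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_train")

# Show one row per topic with its size, label, and top keywords
topic_model.get_topic_info()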
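
Beyond inspecting the topics, a loaded model can also assign topics to new documents. The following is a minimal sketch rather than a tested recipe for this particular model: `assign_topics` and `demo` are illustrative helper names, the sample sentences are made up, and it assumes the saved model can find its embedding model at load time (if it cannot, `BERTopic.load` accepts an `embedding_model` argument).

```python
from typing import Any, List, Sequence, Tuple


def assign_topics(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted for it.

    `topic_model` is expected to behave like a fitted BERTopic model,
    whose `transform` returns a (topics, probabilities) tuple.
    """
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def demo() -> None:
    """Load the model from the Hub and label two made-up sentences."""
    # Imported here so that the helper above can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_train")
    docs = [
        "The striker scored twice as United won the cup game.",
        "The hospital trust said NHS cancer patients faced long waits.",
    ]
    for doc, topic in assign_topics(topic_model, docs):
        print(topic, doc)
```

Calling `demo()` downloads the model, so it needs network access. A topic ID of -1 means the document was treated as an outlier.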

Topic overview

  • Number of topics: 45
  • Number of training documents: 3000
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - would - people - also | 6 | -1_said_mr_would_people |
| 0 | win - kick - game - foul - united | 1243 | 0_win_kick_game_foul |
| 1 | health - patient - nhs - hospital - cancer | 453 | 1_health_patient_nhs_hospital |
| 2 | film - actor - music - song - star | 119 | 2_film_actor_music_song |
| 3 | bank - business - share - market - sale | 83 | 3_bank_business_share_market |
| 4 | police - northern - ireland - said - crime | 80 | 4_police_northern_ireland_said |
| 5 | wicket - england - cricket - test - bowler | 67 | 5_wicket_england_cricket_test |
| 6 | president - mr - election - government - farc | 67 | 6_president_mr_election_government |
| 7 | labour - party - mr - election - corbyn | 62 | 7_labour_party_mr_election |
| 8 | bird - specie - animal - zoo - dna | 58 | 8_bird_specie_animal_zoo |
| 9 | school - education - student - teacher - schools | 48 | 9_school_education_student_teacher |
| 10 | murder - court - mr - police - said | 42 | 10_murder_court_mr_police |
| 11 | crash - police - road - died - collision | 42 | 11_crash_police_road_died |
| 12 | rail - transport - said - passenger - train | 38 | 12_rail_transport_said_passenger |
| 13 | facebook - console - broadband - game - company | 38 | 13_facebook_console_broadband_game |
| 14 | lifeboat - rnli - water - sea - hms | 37 | 14_lifeboat_rnli_water_sea |
| 15 | fire - blaze - said - cladding - building | 35 | 15_fire_blaze_said_cladding |
| 16 | russia - syria - russian - syrian - military | 34 | 16_russia_syria_russian_syrian |
| 17 | girl - child - abuse - court - sexual | 32 | 17_girl_child_abuse_court |
| 18 | trump - mr - president - trumps - clinton | 29 | 18_trump_mr_president_trumps |
| 19 | man - police - arrested - suspicion - hospital | 27 | 19_man_police_arrested_suspicion |
| 20 | murray - tennis - djokovic - wimbledon - grand | 26 | 20_murray_tennis_djokovic_wimbledon |
| 21 | medal - gold - olympic - games - world | 25 | 21_medal_gold_olympic_games |
| 22 | india - indian - crop - modi - hindu | 24 | 22_india_indian_crop_modi |
| 23 | birdie - open - round - golf - mcilroy | 23 | 23_birdie_open_round_golf |
| 24 | earth - particle - space - moon - dark | 20 | 24_earth_particle_space_moon |
| 25 | madrid - barcelona - foul - assisted - corner | 20 | 25_madrid_barcelona_foul_assisted |
| 26 | eu - uk - brexit - european - would | 20 | 26_eu_uk_brexit_european |
| 27 | athlete - doping - ioc - olympic - medal | 19 | 27_athlete_doping_ioc_olympic |
| 28 | wales - welsh - government - waste - money | 18 | 28_wales_welsh_government_waste |
| 29 | race - rosberg - hamilton - mercedes - engine | 16 | 29_race_rosberg_hamilton_mercedes |
| 30 | plane - flight - mh370 - aircraft - airlines | 16 | 30_plane_flight_mh370_aircraft |
| 31 | fight - pacquiao - mayweather - champion - whyte | 14 | 31_fight_pacquiao_mayweather_champion |
| 32 | attack - us - security - bin - killed | 14 | 32_attack_us_security_bin |
| 33 | virus - ebola - outbreak - disease - infected | 12 | 33_virus_ebola_outbreak_disease |
| 34 | greece - migrant - eu - greek - crisis | 12 | 34_greece_migrant_eu_greek |
| 35 | hie - farm - enterprise - energy - funicular | 12 | 35_hie_farm_enterprise_energy |
| 36 | inflation - growth - rate - economist - manufacturing | 11 | 36_inflation_growth_rate_economist |
| 37 | yn - ar - bod - ei - wedi | 11 | 37_yn_ar_bod_ei |
| 38 | cup - group - sredojevic - al - mazembe | 11 | 38_cup_group_sredojevic_al |
| 39 | picasso - picture - image - collection - cameron | 9 | 39_picasso_picture_image_collection |
| 40 | froome - sky - tour - wiggins - team | 8 | 40_froome_sky_tour_wiggins |
| 41 | carnival - event - pride - lgbt - notting | 7 | 41_carnival_event_pride_lgbt |
| 42 | cocaine - corkindale - supply - connelly - drug | 6 | 42_cocaine_corkindale_supply_connelly |
| 43 | meal - child - school - family - scheme | 6 | 43_meal_child_school_family |
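
The labels in the overview appear to follow a simple pattern: the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that pattern for illustration only. `make_label` is a hypothetical helper, not part of the BERTopic API, and the pattern is inferred from the rows listed here.

```python
from typing import Sequence


def make_label(topic_id: int, keywords: Sequence[str], n_words: int = 4) -> str:
    """Join the topic ID and the first `n_words` keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Keywords copied from two rows of the overview.
print(make_label(0, ["win", "kick", "game", "foul", "united"]))
print(make_label(-1, ["said", "mr", "would", "people", "also"]))
```

Topic -1 is the ID BERTopic reserves for outlier documents that were not assigned to any cluster, which is why it is listed first.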

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
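
The hyperparameters listed above are all keyword arguments of the `BERTopic` constructor. As a rough sketch of how a comparable model might be configured, assuming the library defaults for everything the card does not specify (embedding model, UMAP, HDBSCAN, vectorizer), they can be collected in a dictionary and passed through. `HYPERPARAMS` and `build_model` are illustrative names, not part of the original training script.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list.
HYPERPARAMS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model() -> Any:
    """Create an unfitted BERTopic model with the settings above."""
    # Imported here so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of documents would then train a new model. The result is unlikely to match this card exactly, since the training data and random state are not given here.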

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
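
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the versions listed above. The sketch below uses only the standard library. The distribution names (for example `umap-learn` and `scikit-learn`) are assumed to be the usual PyPI names, and `find_mismatches` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the "Framework versions" list (Python itself is omitted).
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(CARD_VERSIONS).items():
    print(f"{name}: card lists {wanted}, installed {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the check only makes differences visible.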