🚧"raw" pretrained smol_llama checkpoints - WIP 🚧
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🐝📊💁
Collections (6)
smol_llama 220M fine-tunes we did
BEE-spoke-data/smol_llama-220M-openhermes
Text Generation • Updated • 4.32k • 2
BEE-spoke-data/smol_llama-220M-open_instruct
Text Generation • Updated • 2.19k • 1
BEE-spoke-data/beecoder-220M-python
Text Generation • Updated • 27 • 2
BEE-spoke-data/zephyr-220m-sft-full
Text Generation • Updated • 3.49k • 1
Spaces (1)
Models (39)
BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw
Text Generation • Updated • 59 • 1
BEE-spoke-data/Meta-Llama-3-8Bee
Text Generation • Updated • 1.26k
BEE-spoke-data/claude-tokenizer
Updated
BEE-spoke-data/TinyLlama-3T-1.1bee
Text Generation • Updated • 2.12k • 2
BEE-spoke-data/bert-plus-L8-v1.0-allNLI_matryoshka
Sentence Similarity • Updated • 1
BEE-spoke-data/bert-plus-L8-v1.0-synthSTSv3-4k
Sentence Similarity • Updated • 5
BEE-spoke-data/mega-encoder-small-16k-v1
Fill-Mask • Updated • 12 • 4
BEE-spoke-data/mega-small-embed-synthSTS-16384-v1
Sentence Similarity • Updated • 7 • 4
BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k
Sentence Similarity • Updated • 4 • 3
BEE-spoke-data/smol_llama-220M-openhermes
Text Generation • Updated • 4.32k • 2
Datasets (50)
BEE-spoke-data/beeweb-5k
Viewer • Updated
BEE-spoke-data/fineweb-synergy-20k
Updated
BEE-spoke-data/FineMeme-100k
Viewer • Updated
BEE-spoke-data/SaunaWeb-50k
Viewer • Updated
BEE-spoke-data/UltraTextbooks-2.1-fw_mix
Viewer • Updated • 36 • 2
BEE-spoke-data/rp_books-en
Viewer • Updated • 6 • 1
BEE-spoke-data/gutenberg-en-v1-clean
Viewer • Updated • 36 • 2
BEE-spoke-data/napierone-epub-raw
Viewer • Updated • 21
BEE-spoke-data/napierone-pdf-raw
Viewer • Updated • 4
BEE-spoke-data/fineweb-1000_64k
Viewer • Updated