Arcee's MergeKit: A Toolkit for Merging Large Language Models Paper • 2403.13257 • Published Mar 20 • 17
Pretrained Text-Generation Models Below 250M Parameters Collection Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters. • 7 items • Updated 21 days ago • 6
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 27
LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of the models with the best evaluations on the LLM leaderboard. • 76 items • Updated 3 days ago • 314
Trained Models 🏋️ Collection They may be small, but they're training like giants! • 8 items • Updated 21 days ago • 14
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation Paper • 2310.08185 • Published Oct 12, 2023 • 6
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023 • 34
ChatGPT-Mini Collection A collection of fine-tuned GPT-2 models, each designed for deploying a ChatGPT-like model at home. These models can also be deployed on an old computer. • 8 items • Updated Nov 16, 2023 • 3
smol llama Collection 🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧 • 4 items • Updated Apr 29 • 5
Indic language fine-tunes Collection Halted State: Attempting to create acceptable quality fine-tunes of different models • 1 item • Updated Nov 23, 2023 • 1
PIC (Partner-in-Crime) project Collection Empathetic, small, really useful personalised models. • 3 items • Updated Dec 10, 2023 • 2
Cramp(ed) Models Collection Smaller models trained locally on my 2xA6000 Lambda Vector • 3 items • Updated Oct 10, 2023 • 1
Shrink Llama - V1 Collection Parts of Meta's LlamaV2 models, chopped up and trained. CoreX means the first X layers were kept. • 2 items • Updated Sep 12, 2023 • 2
GPT2-Linear Collection GPT2 Models using Linear layers instead of Conv layers for convenience. • 6 items • Updated Sep 9, 2023 • 1
read papers Collection This is a collection of some papers I've read in the past few months • 10 items • Updated Nov 21, 2023 • 45
Instruction-Following Evaluation for Large Language Models Paper • 2311.07911 • Published Nov 14, 2023 • 17
KAI Large Language Models Collection All of the KAI LLMs in one collection. The KAI models are a series of lightweight LLMs ranging from 1 billion to 7 billion parameters. • 5 items • Updated Nov 14, 2023 • 2
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models Paper • 2309.14717 • Published Sep 26, 2023 • 43
Recent models: last 100 repos, sorted by creation date Collection The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 450
TinyKAI Large Language Models Collection All of the TinyKAI LLMs in one collection. The TinyKAI models are a series of extremely lightweight LLMs under 5 billion parameters. • 3 items • Updated Nov 14, 2023 • 2
Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf Paper • 2309.04658 • Published Sep 9, 2023 • 2
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models Paper • 2307.09793 • Published Jul 19, 2023 • 45
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models Paper • 2310.20499 • Published Oct 31, 2023 • 7
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning Paper • 2310.20587 • Published Oct 31, 2023 • 15
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 19
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation Paper • 2311.00272 • Published Nov 1, 2023 • 8
Controllable Music Production with Diffusion Models and Guidance Gradients Paper • 2311.00613 • Published Nov 1, 2023 • 23
De-Diffusion Makes Text a Strong Cross-Modal Interface Paper • 2311.00618 • Published Nov 1, 2023 • 21
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 39
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 53
E3 TTS: Easy End-to-End Diffusion-based Text to Speech Paper • 2311.00945 • Published Nov 2, 2023 • 11