afrideva (afrideva)

upvoted a paper 15 days ago

Arcee's MergeKit: A Toolkit for Merging Large Language Models

Paper • 2403.13257 • Published Mar 20 • 17

upvoted 2 collections 4 months ago

Medical Evaluation Datasets

Collection

39 items • Updated Apr 24 • 4

Pretrained Text-Generation Models Below 250M Parameters

Collection

Great candidates for fine-tuning targeting Transformers.js, ordered by number of parameters. • 7 items • Updated 21 days ago • 6

upvoted 2 papers 4 months ago

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Paper • 2401.08417 • Published Jan 16 • 27

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18 • 135

upvoted 2 collections 5 months ago

LLM Leaderboard best models ❤️‍🔥

Collection

A daily uploaded list of the models with the best evaluations on the LLM leaderboard • 76 items • Updated 3 days ago • 314

Trained Models 🏋️

Collection

They may be small, but they're training like giants! • 8 items • Updated 21 days ago • 14

upvoted 2 papers 6 months ago

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

Paper • 2310.08185 • Published Oct 12, 2023 • 6

TinyGSM: achieving >80% on GSM8k with small language models

Paper • 2312.09241 • Published Dec 14, 2023 • 34

upvoted 6 collections 6 months ago

upvoted 4 collections 7 months ago

Cramp(ed) Models

Collection

Smaller models trained locally on my 2xA6000 Lambda Vector • 3 items • Updated Oct 10, 2023 • 1

Shrink Llama - V1

Collection

Parts of Meta's LlamaV2 models, chopped up and trained. CoreX means the first X layers were kept. • 2 items • Updated Sep 12, 2023 • 2

GPT2-Linear

Collection

GPT2 Models using Linear layers instead of Conv layers for convenience. • 6 items • Updated Sep 9, 2023 • 1

read papers

Collection

This is a collection of some papers I've read in the past few months • 10 items • Updated Nov 21, 2023 • 45

upvoted a paper 7 months ago

Instruction-Following Evaluation for Large Language Models

Paper • 2311.07911 • Published Nov 14, 2023 • 17

upvoted a collection 7 months ago

KAI Large Language Models

Collection

All of the KAI LLMs in one collection. The KAI models are a series of lightweight LLMs ranging from 1 Billion parameters to 7 Billion parameters • 5 items • Updated Nov 14, 2023 • 2

upvoted a paper 7 months ago

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

Paper • 2309.14717 • Published Sep 26, 2023 • 43

upvoted 2 collections 7 months ago

Recent models: last 100 repos, sorted by creation date

Collection

The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 450

TinyKAI Large Language Models

Collection

All of the TinyKAI LLMs in one collection. The TinyKAI models are a series of extremely lightweight LLMs under 5 Billion parameters. • 3 items • Updated Nov 14, 2023 • 2

upvoted 17 papers 7 months ago

Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf

Paper • 2309.04658 • Published Sep 9, 2023 • 2

On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models

Paper • 2307.09793 • Published Jul 19, 2023 • 45

Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models

Paper • 2310.20499 • Published Oct 31, 2023 • 7

Beyond U: Making Diffusion Models Faster & Lighter

Paper • 2310.20092 • Published Oct 31, 2023 • 11

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

Paper • 2310.20587 • Published Oct 31, 2023 • 15

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Paper • 2310.19909 • Published Oct 30, 2023 • 19

CapsFusion: Rethinking Image-Text Data at Scale

Paper • 2310.20550 • Published Oct 31, 2023 • 25

ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation

Paper • 2311.00272 • Published Nov 1, 2023 • 8

Text Rendering Strategies for Pixel Language Models

Paper • 2311.00522 • Published Nov 1, 2023 • 10

Controllable Music Production with Diffusion Models and Guidance Gradients

Paper • 2311.00613 • Published Nov 1, 2023 • 23

De-Diffusion Makes Text a Strong Cross-Modal Interface

Paper • 2311.00618 • Published Nov 1, 2023 • 21

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 39

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 53

Idempotent Generative Network

Paper • 2311.01462 • Published Nov 2, 2023 • 22

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Paper • 2311.00945 • Published Nov 2, 2023 • 11

FLAP: Fast Language-Audio Pre-training

Paper • 2311.01615 • Published Nov 2, 2023 • 16

Teaching Arithmetic to Small Transformers

Paper • 2307.03381 • Published Jul 7, 2023 • 15
