Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints
Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker — Apr 8, 2021
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models — Paper 2404.02258, published Apr 2, 2024
Improved Baselines with Visual Instruction Tuning — Paper 2310.03744, published Oct 5, 2023
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models — Paper 2402.13064, published Feb 20, 2024