Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory • Paper • 2405.08707 • Published 19 days ago • 25
Phi-3 • Collection • Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 22 items • Updated 3 days ago • 302
Llamafied Models • Collection • This is a collection of llamafied models, such as Qwen. • 5 items • Updated Apr 19 • 1