Pedro Cuenca's picture

Pedro Cuenca

pcuenq

·

pcuenq

pcuenca

AI & ML interests

None yet

Articles

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

Welcome Gemma - Google's new open LLM

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

SDXL in 4 steps with Latent Consistency LoRAs

Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e

Inference for PROs

Introducing Würstchen: Fast Diffusion for Image Generation

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Releasing Swift Transformers: Run On-Device LLMs in Apple Devices

Stable Diffusion XL on Mac with Advanced Core ML Quantization

Happy 1st anniversary 🤗 Diffusers!

Llama 2 is here - get it on Hugging Face

Faster Stable Diffusion with Core ML on iPhone, iPad, and Mac

The Falcon has landed in the Hugging Face ecosystem

Train your ControlNet with diffusers

Swift Diffusers: Fast Stable Diffusion for Mac

Using LoRA for Efficient Stable Diffusion Fine-Tuning

Using Stable Diffusion with Core ML on Apple Silicon

Hugging Face Machine Learning Demos on arXiv

Training Stable Diffusion with Dreambooth using 🧨 Diffusers

Stable Diffusion in JAX/Flax 🚀

Stable Diffusion with 🧨 Diffusers

Organizations

Posts 1

Post

542

OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community in the Hub for anyone that wants to integrate inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and 6.5 with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental kv cache in swift-transformers. It's a bit tricky because the layers have varying number of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but the Llama 2 chat template, or the default Alignment Handbook one that was used to train, are not recognized. Any ideas on this welcome!

Collections 4

spaces 8

Gguf It

Quarto Template

ControlNet Uncanny Faces

Persistent Chatroom

Paella

Lora Pokemon

models 71

pcuenq/tiny-gemma-test5

Feature Extraction • Updated 27 days ago • 4

pcuenq/tiny-gemma-test4

Feature Extraction • Updated 27 days ago • 4

pcuenq/tiny-gemma-test3

Feature Extraction • Updated 27 days ago • 4

pcuenq/tiny-gemma-tes2t

Updated 27 days ago

pcuenq/tiny-gemma-test2

Feature Extraction • Updated 27 days ago • 4

pcuenq/tiny-gemma-test

Feature Extraction • Updated 27 days ago • 4

pcuenq/tiny-gemma

Feature Extraction • Updated 27 days ago • 4

pcuenq/TinyLlama-1.1B-Chat-v1.0-Q4_K_M-GGUF

Updated Mar 29 • 11

pcuenq/Mistral-7B-v0.1-gguf

pcuenq/tiny-llama-chat-mlx

Text Generation • Updated Mar 8 • 9

datasets 15

pcuenq/misc-assets

Viewer • Updated about 11 hours ago

pcuenq/media

Updated about 12 hours ago

pcuenq/tests

Viewer • Updated Jan 6

pcuenq/amused_mps

pcuenq/gists

Updated Nov 13, 2023 • 5

pcuenq/recipe-images

Viewer • Updated Jul 27, 2023

pcuenq/face_synthetics_spiga

Viewer • Updated Mar 20, 2023 • 3 • 10

pcuenq/face_synthetics_spiga_smol

Viewer • Updated Mar 19, 2023

pcuenq/face_synthetics

Viewer • Updated Mar 13, 2023 • 10 • 3

pcuenq/face_synthetics_smol

Viewer • Updated Mar 12, 2023