Sayak Paul

sayakpaul

AI & ML interests

Diffusion models, representation learning

sayakpaul's activity

posted an update about 1 month ago
Worked on a short blog post discussing how we semi-automated the release process of the diffusers library. The post delves into the workflows responsible for:

* Publishing the package to the Test PyPI and main PyPI servers.
* Notifying an internal Slack channel after a release is published on the repository.

Check it out here πŸ‘‰
https://sayak.dev/posts/streamlined-releases.html
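
For a flavor of the Slack-notification step, here's a minimal sketch in Python (the webhook URL and release tag are placeholders; the real logic lives in the repository's CI workflows):

```python
# Minimal sketch: notify a Slack channel that a release went out.
# SLACK_WEBHOOK_URL is a placeholder; in CI it would come from a secret.
import os

import requests

webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical secret
release_tag = os.environ.get("RELEASE_TAG", "v0.0.0")

payload = {"text": f"🧨 diffusers {release_tag} is out on PyPI!"}
response = requests.post(webhook_url, json=payload, timeout=10)
response.raise_for_status()
```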
posted an update about 1 month ago
How about engaging in a creative chat with your favorite video character? πŸ’¬

@chansung and I worked on a weekend project combining the benefits of Gemini 1.0 and powerful chat models like Zephyr to demo this.

We use Gemini 1.0 to produce the personality traits of any character found in an input video. We then prepare a system prompt with the discovered traits to start chatting with an LLM (Zephyr in this case).

Managing a video captioning model is a little out of our expertise, hence Gemini FTW here πŸ˜Άβ€πŸŒ«οΈ

πŸ‘¨β€πŸ’» Code: https://github.com/deep-diver/Vid2Persona
πŸ€— Demo: chansung/vid2persona
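
A rough sketch of the chatting half, assuming Gemini has already returned the traits (the trait values below are made up; the Zephyr checkpoint is the public one):

```python
# Sketch: build a system prompt from extracted traits and chat with Zephyr.
from transformers import pipeline

# Hypothetical traits, as Gemini might return them for a character.
traits = {"name": "Ellie", "tone": "witty", "quirks": "loves bad puns"}

system_prompt = (
    f"You are {traits['name']}. Your tone is {traits['tone']} and you have "
    f"these quirks: {traits['quirks']}. Always stay in character."
)

chat = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Hey! What did you make of that chase scene?"},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```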
posted an update about 2 months ago
We released 🧨 Diffusers 0.27.0, and it's a versatile release πŸ’«

Among other things, we shipped:

* Stable Cascade
* Playground v2.5 and EDM-style training
* EDM-formulated schedulers
* Trajectory Consistency Distillation for accelerated sampling
* A new guide on merging LoRAs
* A new image editing pipeline -- LEDITS++

Check out the release notes to catch everything that went into the release
https://github.com/huggingface/diffusers/releases/tag/v0.27.0
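
For a quick taste of the LoRA-merging flow, a hedged sketch (the adapters and weights below are illustrative, in the spirit of the new guide):

```python
# Sketch: load two LoRAs under named adapters and blend them.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("nerijs/pixel-art-xl", adapter_name="pixel")
pipe.load_lora_weights(
    "CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy"
)

# Combine the two adapters with per-adapter weights.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.7, 0.5])
image = pipe("a toy robot, pixel art style", num_inference_steps=30).images[0]
```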

Thanks to everyone who contributed to the release πŸ€—
replied to chansung's post 4 months ago

I mean we should be able to make the most of the GPU by reducing idle time as much as possible while also ensuring the throughput is really the highest we can get out of the card.

For example, if we are getting 60 QPS, is that really the maximum the card can deliver?
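
To make that concrete, here's the kind of back-of-the-envelope probe I have in mind (the endpoint and payload are hypothetical; a proper load-testing tool would do this better):

```python
# Sketch: rough QPS measurement against an HTTP inference endpoint.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8080/generate"  # hypothetical endpoint
N_REQUESTS, CONCURRENCY = 600, 32

def hit(_):
    requests.post(ENDPOINT, json={"prompt": "a cat"}, timeout=60)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(hit, range(N_REQUESTS)))
elapsed = time.perf_counter() - start
print(f"~{N_REQUESTS / elapsed:.1f} QPS at concurrency {CONCURRENCY}")
```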

replied to chansung's post 4 months ago

I think we can consider using the cheapest yet reasonable alternative. It's probably okay not to exhaustively consider all the specs. For example, it won't make much sense to attempt an SDXL deployment on a 4GB card. So, something in the range of 16-24GB should suffice.
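
For reference, a minimal sketch of what an SDXL setup on a mid-range card could look like with diffusers (fp16 weights plus CPU offload, trading some speed for VRAM):

```python
# Sketch: SDXL in fp16 with model CPU offload for modest-VRAM cards.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps only the active component on the GPU

image = pipe("an astronaut riding a horse", num_inference_steps=30).images[0]
image.save("sample.png")
```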

replied to victor's post 4 months ago

How would you aim for the lowest latency using existing tooling?

replied to chansung's post 4 months ago

Slick! Let's do a project on diffusion models using the cheapest option possible. But we can also show whether it can provide the highest efficiency. What say?

replied to osanseviero's post 4 months ago

So, we replace the FFN layer with FFN layers from different models (which hence requires the models to be of the same size).

Crazy that this works!

Haven't gone through the details, but a follow-up question:

If the models need to be of the same size, how do we select the FFN layers from another model to replace a single FFN layer in the first? And if a Transformer block contains a single FFN block (a composition of dense layers), how do we accumulate multiple FFN layers?
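
Thinking out loud with a toy sketch (my own illustration, not the paper's method) of what swapping a single block's FFN between two same-architecture checkpoints would look like:

```python
# Toy sketch: graft the FFN (mlp) of block 5 from model `b` into model `a`.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("gpt2")
b = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a same-sized finetune

with torch.no_grad():
    a.transformer.h[5].mlp.load_state_dict(b.transformer.h[5].mlp.state_dict())
```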

replied to osanseviero's post 4 months ago

How are the params of the MoE layers populated, though? Doesn't that impact the performance? What's the intuition? 😟
