Sayak Paul

sayakpaul

AI & ML interests

Diffusion models, representation learning

sayakpaul's activity

posted an update about 1 month ago
Worked on a short blog post discussing how we semi-automated the release process of the diffusers library. The post delves into the workflows responsible for:

* Publishing the package to the Test PyPI and main PyPI servers.
* Notifying an internal Slack channel after a release is published on the repository.

Check it out here πŸ‘‰
https://sayak.dev/posts/streamlined-releases.html
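
For a flavor of the Slack-notification step, here's a minimal sketch in Python (the webhook URL and release tag are placeholders; the real logic lives in the repository's CI workflows):

```python
# Minimal sketch: notify a Slack channel that a release went out.
# SLACK_WEBHOOK_URL is a placeholder; in CI it would come from a secret.
import os

import requests

webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical secret
release_tag = os.environ.get("RELEASE_TAG", "v0.0.0")

payload = {"text": f"🧨 diffusers {release_tag} is out on PyPI!"}
response = requests.post(webhook_url, json=payload, timeout=10)
response.raise_for_status()
```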
posted an update about 1 month ago
How about engaging in a creative chat with your favorite video character? πŸ’¬

@chansung and I worked on a weekend project combining the benefits of Gemini 1.0 and powerful chat models like Zephyr to demo this.

We use Gemini 1.0 to produce the personality traits of any character found in an input video. We then prepare a system prompt with the discovered traits to start chatting with an LLM (Zephyr in this case).

Managing a video captioning model is a little out of our expertise, hence Gemini FTW here πŸ˜Άβ€πŸŒ«οΈ

πŸ‘¨β€πŸ’» Code: https://github.com/deep-diver/Vid2Persona
πŸ€— Demo: chansung/vid2persona
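
A rough sketch of the chatting half, assuming Gemini has already returned the traits (the trait values below are made up; the Zephyr checkpoint is the public one):

```python
# Sketch: build a system prompt from extracted traits and chat with Zephyr.
from transformers import pipeline

# Hypothetical traits, as Gemini might return them for a character.
traits = {"name": "Ellie", "tone": "witty", "quirks": "loves bad puns"}

system_prompt = (
    f"You are {traits['name']}. Your tone is {traits['tone']} and you have "
    f"these quirks: {traits['quirks']}. Always stay in character."
)

chat = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Hey! What did you make of that chase scene?"},
]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```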
posted an update about 2 months ago
We released 🧨 Diffusers 0.27.0, and it's a versatile release πŸ’«

Among other things, we shipped:

* Stable Cascade
* Playground v2.5 and EDM-style training
* EDM-formulated schedulers
* Trajectory Consistency Distillation for accelerated sampling
* A new guide on merging LoRAs
* A new image editing pipeline -- LEDITS++

Check out the release notes to catch everything that went into the release
https://github.com/huggingface/diffusers/releases/tag/v0.27.0
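
For a quick taste of the LoRA-merging flow, a hedged sketch (the adapters and weights below are illustrative, in the spirit of the new guide):

```python
# Sketch: load two LoRAs under named adapters and blend them.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("nerijs/pixel-art-xl", adapter_name="pixel")
pipe.load_lora_weights(
    "CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy"
)

# Combine the two adapters with per-adapter weights.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.7, 0.5])
image = pipe("a toy robot, pixel art style", num_inference_steps=30).images[0]
```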

Thanks to everyone who contributed to the release πŸ€—
replied to chansung's post 4 months ago

I mean we should be able to make the most of the GPU by reducing idle time as much as possible while also ensuring the throughput is really the highest we can get out of the card.

For example, if we are getting 60 QPS, is that really the maximum the card can deliver?
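
To make that concrete, here's the kind of back-of-the-envelope probe I have in mind (the endpoint and payload are hypothetical; a proper load-testing tool would do this better):

```python
# Sketch: rough QPS measurement against an HTTP inference endpoint.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8080/generate"  # hypothetical endpoint
N_REQUESTS, CONCURRENCY = 600, 32

def hit(_):
    requests.post(ENDPOINT, json={"prompt": "a cat"}, timeout=60)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(hit, range(N_REQUESTS)))
elapsed = time.perf_counter() - start
print(f"~{N_REQUESTS / elapsed:.1f} QPS at concurrency {CONCURRENCY}")
```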

replied to chansung's post 4 months ago

I think we can consider using the cheapest yet reasonable alternative. It's probably okay not to exhaustively consider all the specs. For example, it won't make much sense to attempt an SDXL deployment on a 4GB card. So, something in the range of 16-24GB should suffice.
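
For reference, a minimal sketch of what an SDXL setup on a mid-range card could look like with diffusers (fp16 weights plus CPU offload, trading some speed for VRAM):

```python
# Sketch: SDXL in fp16 with model CPU offload for modest-VRAM cards.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # keeps only the active component on the GPU

image = pipe("an astronaut riding a horse", num_inference_steps=30).images[0]
image.save("sample.png")
```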

replied to victor's post 4 months ago

How would you aim for the lowest latency using existing tooling?

replied to chansung's post 4 months ago

Slick! Let's do a project on diffusion models using the cheapest option possible. But we can also show whether it can provide the highest efficiency. What say?

replied to osanseviero's post 4 months ago

So, we replace the FFN layer with FFN layers from different models (which hence requires the models to be of the same size).

Crazy that this works!

Haven't gone through the details, but a follow-up question:

If the models need to be of the same size, how do we select the FFN layers from another model to replace a single FFN layer in the first? And if a Transformer block contains a single FFN block (a composition of dense layers), how do we accumulate multiple FFN layers?
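
Thinking out loud with a toy sketch (my own illustration, not the paper's method) of what swapping a single block's FFN between two same-architecture checkpoints would look like:

```python
# Toy sketch: graft the FFN (mlp) of block 5 from model `b` into model `a`.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("gpt2")
b = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for a same-sized finetune

with torch.no_grad():
    a.transformer.h[5].mlp.load_state_dict(b.transformer.h[5].mlp.state_dict())
```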

replied to osanseviero's post 4 months ago

How are the params of the MoE layers populated, though? Doesn't that impact the performance? What's the intuition? 😟
