juand4bot

juandavidgf

AI & ML interests

Stable Diffusion, transformers, NLP, business.

juandavidgf's activity

upvoted 3 papers 20 days ago

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published 24 days ago • 16

Interactive3D: Create What You Want by Interactive 3D Generation

Paper • 2404.16510 • Published 24 days ago • 17

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published 24 days ago • 51

upvoted 4 papers 25 days ago

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published 27 days ago • 21

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published 27 days ago • 23

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published 26 days ago • 53

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published 27 days ago • 120

upvoted 3 papers 26 days ago

FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17 • 32

How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

Paper • 2404.14047 • Published 27 days ago • 37

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published 27 days ago • 230

upvoted a collection about 2 months ago

HF-curated models available on Workers AI

A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 48

upvoted 8 papers about 2 months ago

Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1 • 10

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15

Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1 • 13

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Paper • 2403.14468 • Published Mar 21 • 18

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

Paper • 2403.13535 • Published Mar 20 • 20

When Do We Not Need Larger Vision Models?

Paper • 2403.13043 • Published Mar 19 • 24

TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation

Paper • 2403.12906 • Published Mar 19 • 4

LightIt: Illumination Modeling and Control for Diffusion Models

Paper • 2403.10615 • Published Mar 15 • 14

upvoted 14 papers 2 months ago

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8 • 28

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15 • 61

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13 • 34

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 43

DragAnything: Motion Control for Anything using Entity Representation

Paper • 2403.07420 • Published Mar 12 • 11

V3D: Video Diffusion Models are Effective 3D Generators

Paper • 2403.06738 • Published Mar 11 • 27

VideoMamba: State Space Model for Efficient Video Understanding

Paper • 2403.06977 • Published Mar 11 • 22

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 85

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 50

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation

Paper • 2402.17245 • Published Feb 27 • 10

Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 14

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 87

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27 • 182

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

Paper • 2402.16840 • Published Feb 26 • 23

upvoted 27 papers 3 months ago

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

Paper • 2402.15491 • Published Feb 23 • 13

Seamless Human Motion Composition with Blended Positional Encodings

Paper • 2402.15509 • Published Feb 23 • 12

Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23 • 67

Divide-or-Conquer? Which Part Should You Distill Your LLM?

Paper • 2402.15000 • Published Feb 22 • 22

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23 • 18

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 80

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Paper • 2402.14650 • Published Feb 22 • 6

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 16

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21 • 43

Music Style Transfer with Time-Varying Inversion of Diffusion Models

Paper • 2402.13763 • Published Feb 21 • 9

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 104

Video ReCap: Recursive Captioning of Hour-Long Videos

Paper • 2402.13250 • Published Feb 20 • 19

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Paper • 2402.12712 • Published Feb 20 • 13

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15 • 19

GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting

Paper • 2402.10259 • Published Feb 15 • 13

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Paper • 2402.10211 • Published Feb 15 • 8

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 18

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Paper • 2402.10009 • Published Feb 15 • 18

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15 • 35

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 90

Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14 • 12

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Paper • 2402.09052 • Published Feb 14 • 16

Premise Order Matters in Reasoning with Large Language Models

Paper • 2402.08939 • Published Feb 14 • 23

Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14 • 24

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

Paper • 2402.07207 • Published Feb 11 • 7

ChemLLM: A Chemical Large Language Model

Paper • 2402.06852 • Published Feb 10 • 17

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Paper • 2402.07827 • Published Feb 12 • 43