samusenps

AI & ML interests

Foundational Architectures, Multi-Modality, Interpretability, Benchmarking w/ simulations, Robotics, Integration with Non-invasive Open Source stack RISC-V BCI. Extremely high-quality training data. Fully Open Source ML/AI.

samusenps's activity

upvoted 12 papers 4 days ago

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published 4 days ago • 16

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction

Paper • 2405.10315 • Published 4 days ago • 7

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published 4 days ago • 13

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Paper • 2405.09874 • Published 4 days ago • 10

Many-Shot In-Context Learning in Multimodal Foundation Models

Paper • 2405.09798 • Published 4 days ago • 22

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published 4 days ago • 29

LoRA Learns Less and Forgets Less

Paper • 2405.09673 • Published 5 days ago • 54

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published 4 days ago • 67

Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Paper • 2405.09062 • Published 5 days ago • 4

BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

Paper • 2405.09546 • Published 5 days ago • 6

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 5 days ago • 12

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 5 days ago • 20

upvoted 8 papers 5 days ago

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Paper • 2405.08317 • Published 6 days ago • 7

SpeechVerse: A Large-scale Generalizable Audio Language Model

Paper • 2405.08295 • Published 6 days ago • 8

Understanding the performance gap between online and offline alignment algorithms

Paper • 2405.08448 • Published 6 days ago • 10

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published 6 days ago • 9

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published 6 days ago • 21

Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Paper • 2405.08054 • Published 7 days ago • 17

Compositional Text-to-Image Generation with Dense Blob Representations

Paper • 2405.08246 • Published 7 days ago • 11

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published 6 days ago • 15

upvoted 9 papers 6 days ago

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Paper • 2405.07526 • Published 7 days ago • 13

Large Language Models as Planning Domain Generators

Paper • 2405.06650 • Published Apr 2 • 8

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published 9 days ago • 15

LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Paper • 2405.07065 • Published 9 days ago • 14

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 7 days ago • 15

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Paper • 2405.07518 • Published 7 days ago • 19

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published 13 days ago • 33

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published 7 days ago • 52

What matters when building vision-language models?

Paper • 2405.02246 • Published 17 days ago • 77

upvoted 3 papers 7 days ago

WANDR: Intention-guided Human Motion Generation

Paper • 2404.15383 • Published 27 days ago • 1

EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars

Paper • 2404.19110 • Published 21 days ago • 3

3D Gaussian Blendshapes for Head Avatar Animation

Paper • 2404.19398 • Published 20 days ago • 2

upvoted 11 papers 8 days ago

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Paper • 2308.05725 • Published Aug 10, 2023 • 1

On Bringing Robots Home

Paper • 2311.16098 • Published Nov 27, 2023 • 2

GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer

Paper • 2311.08526 • Published Nov 14, 2023 • 7

MOMENT: A Family of Open Time-series Foundation Models

Paper • 2402.03885 • Published Feb 6 • 6

NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data

Paper • 2402.15343 • Published Feb 23 • 8

AST: Audio Spectrogram Transformer

Paper • 2104.01778 • Published Apr 5, 2021 • 2

Salient Object-Aware Background Generation using Text-Guided Diffusion Models

Paper • 2404.10157 • Published Apr 15 • 1

An Optimistic Acceleration of AMSGrad for Nonconvex Optimization

Paper • 1903.01435 • Published Mar 4, 2019 • 1

Improving Generalization Performance by Switching from Adam to SGD

Paper • 1712.07628 • Published Dec 20, 2017 • 1

AlignBench: Benchmarking Chinese Alignment of Large Language Models

Paper • 2311.18743 • Published Nov 30, 2023 • 1

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

Paper • 2405.03121 • Published 15 days ago • 1

upvoted 4 papers 9 days ago

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

Paper • 2405.02793 • Published 16 days ago • 1

You Only Cache Once: Decoder-Decoder Architectures for Language Models

Paper • 2405.05254 • Published 12 days ago • 5

FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space

Paper • 2405.01828 • Published 17 days ago • 1

A decoder-only foundation model for time-series forecasting

Paper • 2310.10688 • Published Oct 14, 2023 • 3

upvoted a paper 14 days ago

Self-healing Nodes with Adaptive Data-Sharding

Paper • 2405.00004 • Published Jan 19 • 4

upvoted 8 papers 17 days ago

LLM-AD: Large Language Model based Audio Description System

Paper • 2405.00983 • Published 18 days ago • 13

FLAME: Factuality-Aware Alignment for Large Language Models

Paper • 2405.01525 • Published 18 days ago • 21

Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published 18 days ago • 17

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Paper • 2405.01481 • Published 18 days ago • 20

WildChat: 1M ChatGPT Interaction Logs in the Wild

Paper • 2405.01470 • Published 18 days ago • 52

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published 18 days ago • 44

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Paper • 2405.00732 • Published 21 days ago • 109

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

Paper • 2405.01535 • Published 18 days ago • 92

upvoted 4 papers 18 days ago

Automatic Creative Selection with Cross-Modal Matching

Paper • 2405.00029 • Published Feb 28 • 7

STT: Stateful Tracking with Transformers for Autonomous Driving

Paper • 2405.00236 • Published 20 days ago • 7

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published 19 days ago • 18

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published 20 days ago • 12