ojasvisingh786 (Ojasvi Singh Yadav)

upvoted 2 papers 1 day ago

ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models

Paper • 2405.09220 • Published 2 days ago • 15

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

Paper • 2405.09215 • Published 2 days ago • 9

upvoted an article 1 day ago

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

3 days ago

• 85

upvoted 2 papers 3 days ago

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

Paper • 2405.07990 • Published 4 days ago • 15

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Paper • 2405.07518 • Published 4 days ago • 19

upvoted a paper 5 days ago

LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20 • 53

upvoted a paper 11 days ago

What matters when building vision-language models?

Paper • 2405.02246 • Published 14 days ago • 71

upvoted 2 papers 14 days ago

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published 16 days ago • 18

Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published 15 days ago • 17

upvoted 2 papers 15 days ago

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

Paper • 2405.00263 • Published 16 days ago • 13

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Paper • 2405.00233 • Published 16 days ago • 12

upvoted 4 papers 16 days ago

upvoted 5 papers 17 days ago

Implicit Style-Content Separation using B-LoRA

Paper • 2403.14572 • Published Mar 21 • 2

DressCode: Autoregressively Sewing and Generating Garments from Text Guidance

Paper • 2401.16465 • Published Jan 29 • 7

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting

Paper • 2404.18911 • Published 18 days ago • 26

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Paper • 2404.17672 • Published 20 days ago • 17

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

Paper • 2404.18796 • Published 18 days ago • 62

upvoted 2 papers 18 days ago

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Paper • 2404.17569 • Published 21 days ago • 10

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published 21 days ago • 30

upvoted a paper 21 days ago

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published 22 days ago • 11

upvoted a paper 23 days ago

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published 24 days ago • 17

upvoted 3 papers 24 days ago

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published 26 days ago • 25

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published 25 days ago • 230

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published 27 days ago • 37

upvoted an article 27 days ago

Article

Welcome Llama 3 - Meta's new open LLM

29 days ago

• 238

upvoted a paper 27 days ago

MeshLRM: Large Reconstruction Model for High-Quality Mesh

Paper • 2404.12385 • Published 29 days ago • 23

upvoted 16 papers about 1 month ago

CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting

Paper • 2404.09458 • Published Apr 15 • 6

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 61

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 27

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 42

WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents

Paper • 2404.05902 • Published Apr 8 • 20

HGRN2: Gated Linear RNNs with State Expansion

Paper • 2404.07904 • Published Apr 11 • 16

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11 • 15

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

Transferable and Principled Efficiency for Open-Vocabulary Segmentation

Paper • 2404.07448 • Published Apr 11 • 10

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11 • 14

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9 • 31

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

Paper • 2404.06780 • Published Apr 10 • 9

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 92

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 22

Revising Densification in Gaussian Splatting

Paper • 2404.06109 • Published Apr 9 • 8

Hash3D: Training-free Acceleration for 3D Generation

Paper • 2404.06091 • Published Apr 9 • 12

upvoted an article about 1 month ago

Article

Hugging Face and AWS partner to make AI more accessible

Feb 21, 2023

• 1

upvoted a paper about 1 month ago

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Paper • 2404.05717 • Published Apr 8 • 23

upvoted an article about 1 month ago

Article

CodeGemma - an official Google release for code LLMs

Apr 9

• 95

upvoted 6 papers about 1 month ago

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

Paper • 2404.04421 • Published Apr 5 • 14

YaART: Yet Another ART Rendering Technology

Paper • 2404.05666 • Published Apr 8 • 14

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

Paper • 2404.05719 • Published Apr 8 • 57

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

Paper • 2404.02514 • Published Apr 3 • 9

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3 • 19

Octopus v2: On-device language model for super agent

Paper • 2404.01744 • Published Apr 2 • 53

upvoted 5 papers about 2 months ago

Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1 • 11

MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text

Paper • 2404.00345 • Published Mar 30 • 16

InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion

Paper • 2403.17422 • Published Mar 26 • 1

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

Paper • 2403.17005 • Published Mar 25 • 13

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Paper • 2403.12943 • Published Mar 19 • 13

upvoted a paper 2 months ago

Multistep Consistency Models

Paper • 2403.06807 • Published Mar 11 • 13
