Episodes

Latest Episode
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Episode 358 · 24:00

🤗 Upvotes: 10 | cs.CV, cs.GR | Authors: Kam Woh Ng, Jing Yang, Jia Wei Sii, Jiankang Deng, Chee Seng Chan, Yi-Zhe Song, Tao Xiang, Xiatian Zhu ...

DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

Episode 357 · 22:36

🤗 Upvotes: 5 | cs.LG, cs.AI, cs.CL, 68T45 | Authors: Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yas...

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Episode 356 · 21:46

🤗 Upvotes: 51 | cs.CL, cs.LG | Authors: Jian Hu | Title: REINFORCE++: A Simple and Efficient Approach for Aligning Large La...

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Episode 355 · 22:34

🤗 Upvotes: 32 | cs.CV | Authors: Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang ...

Cosmos World Foundation Model Platform for Physical AI

Episode 354 · 25:38

🤗 Upvotes: 31 | cs.CV, cs.AI, cs.LG, cs.RO | Authors: NVIDIA, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, ...

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Episode 353 · 21:51

🤗 Upvotes: 22 | cs.CV, cs.AI, cs.CL | Authors: Shaolei Zhang, Qingkai Fang, Zhe Yang, Yang Feng | Title: LLaVA-Mini: Effici...

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Episode 352 · 22:41

🤗 Upvotes: 18 | cs.CV | Authors: Haobo Yuan, Xiangtai Li, Tao Zhang, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, Ming-H...

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

Episode 351 · 23:15

🤗 Upvotes: 13 | cs.CV, cs.AI, cs.GR | Authors: Zekai Gu, Rui Yan, Jiahao Lu, Peng Li, Zhiyang Dou, Chenyang Si, Zhen Dong, Qifeng Liu, Cheng Lin, ...

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Episode 350 · 20:34

🤗 Upvotes: 10 | cs.CL, cs.CV | Authors: Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, ...

PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

Episode 349 · 22:09

🤗 Upvotes: 10 | cs.AI, cs.CL | Authors: Hao Zheng, Xinyan Guan, Hao Kong, Jia Zheng, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, Le Sun ...

Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model

Episode 348 · 22:34

🤗 Upvotes: 6 | cs.CL, cs.AI | Authors: Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, Mingyuan Zhou ...

MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting

Episode 347 · 20:49

🤗 Upvotes: 6 | cs.CV | Authors: Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim | Title: ...

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Episode 346 · 22:18

🤗 Upvotes: 38 | cs.CV | Authors: Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying ...

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Episode 345 · 26:54

🤗 Upvotes: 23 | cs.CV | Authors: Rui Qian, Shuangrui Ding, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang | Tit...

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Episode 344 · 22:26

🤗 Upvotes: 22 | cs.CL, cs.AI, cs.LG | Authors: Beichen Zhang, Yuhong Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Haodong Duan, Yuhang Cao, Dahua Lin...

Personalized Graph-Based Retrieval for Large Language Models

Episode 343 · 21:16

🤗 Upvotes: 19 | cs.CL | Authors: Steven Au, Cameron J. Dimacali, Ojasmitha Pedirappagari, Namyong Park, Franck Dernoncourt, Yu Wang, Nikos Kanakar...

METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring

Episode 342 · 21:38

🤗 Upvotes: 13 | q-bio.GN, cs.AI, cs.CL, cs.LG | Authors: Ollie Liu, Sami Jaghouar, Johannes Hagemann, Shangshang Wang, Jason Wiemels, Jeff Kaufman...

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

Episode 341 · 22:25

🤗 Upvotes: 12 | cs.CV | Authors: Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng Li | Title: GS-...

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Episode 340 · 22:15

🤗 Upvotes: 12 | cs.CV, cs.AI, cs.LG | Authors: Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam...

TransPixar: Advancing Text-to-Video Generation with Transparency

Episode 339 · 22:45

🤗 Upvotes: 9 | cs.CV | Authors: Luozhou Wang, Yijun Li, Zhifei Chen, Jui-Hsien Wang, Zhifei Zhang, He Zhang, Zhe Lin, Yingcong Chen | T...

AutoPresent: Designing Structured Visuals from Scratch

Episode 338 · 19:20

🤗 Upvotes: 7 | cs.CV, cs.CL | Authors: Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Su...

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Episode 337 · 24:44

🤗 Upvotes: 41 | cs.RO, cs.CV, cs.LG | Authors: Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Peng Gao, Hongshe...

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Episode 336 · 20:37

🤗 Upvotes: 23 | cs.CV, cs.SD, eess.AS | Authors: Chaoyou Fu, Haojia Lin, Xiong Wang, Yi-Fan Zhang, Yunhang Shen, Xiaoyu Liu, Yangze Li, Zuwei Long...

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Episode 335 · 23:02

🤗 Upvotes: 12 | cs.CV | Authors: Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shuru...

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Episode 334 · 22:38

🤗 Upvotes: 12 | cs.CV, cs.AI | Authors: Yifan Du, Zikang Liu, Yifan Li, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, Weipeng Chen, Zheng Liu, Zhongyua...

SDPO: Segment-Level Direct Preference Optimization for Social Agents

Episode 333 · 19:44

🤗 Upvotes: 10 | cs.AI, cs.CL | Authors: Aobo Kong, Wentao Ma, Shiwan Zhao, Yongbin Li, Yuchuan Wu, Ke Wang, Xiaoqian Liu, Qicheng Li, Yong Qin, Fe...

Graph Generative Pre-trained Transformer

Episode 332 · 20:24

🤗 Upvotes: 9 | cs.LG, cs.AI | Authors: Xiaohui Chen, Yinkai Wang, Jiaxing He, Yuanqi Du, Soha Hassoun, Xiaolin Xu, Li-Ping Liu | Title:...

LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models

Episode 331 · 23:14

🤗 Upvotes: 7 | cs.CL, cs.IR | Authors: Hieu Man, Nghia Trung Ngo, Viet Dac Lai, Ryan A. Rossi, Franck Dernoncourt, Thien Huu Nguyen | T...

BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

Episode 330 · 25:56

🤗 Upvotes: 5 | cs.LG, cs.AI | Authors: Kanishk Gandhi, Michael Y. Li, Lyle Goodyear, Louise Li, Aditi Bhaskar, Mohammed Zaman, Noah D. Goodman ...

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Episode 329 · 23:53

🤗 Upvotes: 45 | cs.CV, cs.CL, cs.LG | Authors: Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang...