Episodes

Latest Episode
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

Episode 171 · 21:05

🤗 Upvotes: 12 | cs.CV Authors: Jixuan Fan, Wanhua Li, Yifei Han, Yansong Tang Title: Momentum-GS: Momentum Gaussian Sel...

CompCap: Improving Multimodal Large Language Models with Composite Captions

Episode 170 · 21:55

🤗 Upvotes: 11 | cs.CV, cs.AI, cs.LG Authors: Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab, Aashu Singh, Qifan Wang, David Yang, ShengYun Pen...

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Episode 169 · 21:48

🤗 Upvotes: 83 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia ...

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Episode 168 · 19:23

🤗 Upvotes: 46 | cs.CV, cs.AI Authors: Jiuhai Chen, Jianwei Yang, Haiping Wu, Dianqi Li, Jianfeng Gao, Tianyi Zhou, Bin Xiao Title: ...

NVILA: Efficient Frontier Visual Language Models

Episode 167 · 19:31

🤗 Upvotes: 36 | cs.CV Authors: Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, ...

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Episode 166 · 20:40

🤗 Upvotes: 32 | cs.CL Authors: Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong ...

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Episode 165 · 22:57

🤗 Upvotes: 32 | cs.RO, cs.AI, cs.CV, cs.LG Authors: Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wa...

Evaluating Language Models as Synthetic Data Generators

Episode 164 · 21:02

🤗 Upvotes: 30 | cs.CL Authors: Seungone Kim, Juyoung Suk, Xiang Yue, Vijay Viswanathan, Seongyun Lee, Yizhong Wang, Kiril Gashteovski, Carolin L...

A Noise is Worth Diffusion Guidance

Episode 163 · 21:20

🤗 Upvotes: 25 | cs.CV, cs.AI, cs.LG Authors: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak ...

Structured 3D Latents for Scalable and Versatile 3D Generation

Episode 162 · 23:37

🤗 Upvotes: 22 | cs.CV Authors: Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang ...

Negative Token Merging: Image-based Adversarial Feature Guidance

Episode 161 · 19:33

🤗 Upvotes: 21 | cs.CV, cs.AI, cs.GR, cs.LG, stat.ML Authors: Jaskirat Singh, Lindsey Li, Weijia Shi, Ranjay Krishna, Yejin Choi, Pang Wei Koh, M...

MV-Adapter: Multi-view Consistent Image Generation Made Easy

Episode 160 · 21:23

🤗 Upvotes: 17 | cs.CV Authors: Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, Lu Sheng Title: ...

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Episode 159 · 24:33

🤗 Paper Upvotes: 48 | cs.CV, cs.AI, cs.CL, cs.HC Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixi...

Star Attention: Efficient LLM Inference over Long Sequences

Episode 158 · 20:34

🤗 Paper Upvotes: 32 | cs.CL, cs.AI, cs.LG Authors: Shantanu Acharya, Fei Jia, Boris Ginsburg Title: Star Attention: Eff...

Pathways on the Image Manifold: Image Editing via Video Generation

Episode 157 · 25:04

🤗 Paper Upvotes: 23 | cs.CV, cs.AI, cs.LG Authors: Noam Rotstein, Gal Yona, Daniel Silver, Roy Velich, David Bensaïd, Ron Kimmel Ti...

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Episode 156 · 26:29

🤗 Paper Upvotes: 15 | cs.CV, cs.AI, cs.CL Authors: Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun,...

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Episode 155 · 22:13

🤗 Paper Upvotes: 14 | cs.CV Authors: Yuhang Han, Xuyang Liu, Pengxiang Ding, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang ...

SketchAgent: Language-Driven Sequential Sketch Generation

Episode 154 · 24:41

🤗 Paper Upvotes: 13 | cs.CV Authors: Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba Titl...

TEXGen: a Generative Diffusion Model for Mesh Textures

Episode 153 · 24:25

🤗 Paper Upvotes: 12 | cs.CV, cs.AI, cs.GR Authors: Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, JianHui Liu, Yangguang Li, Yan-Pei Cao, Ding L...

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Episode 152 · 22:01

🤗 Paper Upvotes: 8 | cs.CV, cs.CL Authors: Lei Li, Yuancheng Wei, Zhihui Xie, Xuqing Yang, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujia...

Learning 3D Representations from Procedural 3D Programs

Episode 151 · 24:23

🤗 Paper Upvotes: 8 | cs.CV Authors: Xuweiyi Chen, Zezhou Cheng Title: Learning 3D Representations from Procedural 3D Pr...

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Episode 150 · 26:01

🤗 Paper Upvotes: 7 | cs.CV Authors: Yongwei Chen, Yushi Lan, Shangchen Zhou, Tengfei Wang, Xingang Pan Title: SAR3D: Au...

Material Anything: Generating Materials for Any 3D Object via Diffusion

Episode 149 · 21:52

🤗 Paper Upvotes: 33 | cs.CV, cs.GR Authors: Xin Huang, Tengfei Wang, Ziwei Liu, Qing Wang Title: Material Anything: Gen...

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Episode 148 · 27:22

🤗 Paper Upvotes: 28 | cs.CV Authors: Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon Title: Large-Scale Text-to...

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Episode 147 · 21:57

🤗 Paper Upvotes: 19 | cs.AI, cs.CL Authors: Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattach...

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Episode 146 · 21:01

🤗 Paper Upvotes: 18 | cs.CL, cs.AI Authors: Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, W...

MH-MoE: Multi-Head Mixture-of-Experts

Episode 145 · 21:00

🤗 Paper Upvotes: 17 | cs.CL Authors: Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei Title: MH-MoE: Multi-Head Mixture-of-E...

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Episode 144 · 21:09

🤗 Paper Upvotes: 15 | cs.CV Authors: Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu...

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Episode 143 · 22:42

🤗 Paper Upvotes: 13 | cs.CV, cs.AI, cs.CL Authors: Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal Title: Dream...

Knowledge Transfer Across Modalities with Natural Language Supervision

Episode 142 · 20:38

🤗 Paper Upvotes: 13 | cs.CV, 68T45 (Primary) 68T50 (Secondary), I.2.6 Authors: Carlo Alberto Barbano, Luca Molinaro, Emanuele Aiello, Marco Gran...