Episodes

Latest Episode
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

Episode 178 · 20:24

🤗 Upvotes: 33 | cs.CV Authors: Yibin Wang, Zhiyu Tan, Junyan Wang, Xiaomeng Yang, Cheng Jin, Hao Li Title: LiFT: Levera...

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

Episode 177 · 22:00

🤗 Upvotes: 31 | cs.CL Authors: LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Seokhee Hong, Junwon Hw...

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Episode 176 · 21:58

🤗 Upvotes: 30 | cs.CL, cs.CV Authors: Jarvis Guo, Tuney Zheng, Yuelin Bai, Bo Li, Yubo Wang, King Zhu, Yizhi Li, Graham Neubig, Wenhu Chen, Xian...

APOLLO: SGD-like Memory, AdamW-level Performance

Episode 175 · 19:44

🤗 Upvotes: 27 | cs.LG, cs.AI, cs.PF Authors: Hanqing Zhu, Zhenyu Zhang, Wenyan Cong, Xi Liu, Sem Park, Vikas Chandra, Bo Long, David Z. Pan, Zha...

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Episode 174 · 20:09

🤗 Upvotes: 19 | cs.CV Authors: Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham Title: SwiftEdit: Lig...

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Episode 173 · 20:18

🤗 Upvotes: 18 | cs.RO, cs.AI, cs.CL, cs.CV, cs.LG Authors: Yi Chen, Yuying Ge, Yizhuo Li, Yixiao Ge, Mingyu Ding, Ying Shan, Xihui Liu ...

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Episode 172 · 22:51

🤗 Upvotes: 13 | cs.CV Authors: Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu Title: GenMAC: Comp...

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

Episode 171 · 21:05

🤗 Upvotes: 12 | cs.CV Authors: Jixuan Fan, Wanhua Li, Yifei Han, Yansong Tang Title: Momentum-GS: Momentum Gaussian Sel...

CompCap: Improving Multimodal Large Language Models with Composite Captions

Episode 170 · 21:55

🤗 Upvotes: 11 | cs.CV, cs.AI, cs.LG Authors: Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab, Aashu Singh, Qifan Wang, David Yang, ShengYun Pen...

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Episode 169 · 21:48

🤗 Upvotes: 83 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia ...

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Episode 168 · 19:23

🤗 Upvotes: 46 | cs.CV, cs.AI Authors: Jiuhai Chen, Jianwei Yang, Haiping Wu, Dianqi Li, Jianfeng Gao, Tianyi Zhou, Bin Xiao Title: ...

NVILA: Efficient Frontier Visual Language Models

Episode 167 · 19:31

🤗 Upvotes: 36 | cs.CV Authors: Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, ...

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Episode 166 · 20:40

🤗 Upvotes: 32 | cs.CL Authors: Yiheng Xu, Zekun Wang, Junli Wang, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong ...

Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Episode 165 · 22:57

🤗 Upvotes: 32 | cs.RO, cs.AI, cs.CV, cs.LG Authors: Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wa...

Evaluating Language Models as Synthetic Data Generators

Episode 164 · 21:02

🤗 Upvotes: 30 | cs.CL Authors: Seungone Kim, Juyoung Suk, Xiang Yue, Vijay Viswanathan, Seongyun Lee, Yizhong Wang, Kiril Gashteovski, Carolin L...

A Noise is Worth Diffusion Guidance

Episode 163 · 21:20

🤗 Upvotes: 25 | cs.CV, cs.AI, cs.LG Authors: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak ...

Structured 3D Latents for Scalable and Versatile 3D Generation

Episode 162 · 23:37

🤗 Upvotes: 22 | cs.CV Authors: Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang ...

Negative Token Merging: Image-based Adversarial Feature Guidance

Episode 161 · 19:33

🤗 Upvotes: 21 | cs.CV, cs.AI, cs.GR, cs.LG, stat.ML Authors: Jaskirat Singh, Lindsey Li, Weijia Shi, Ranjay Krishna, Yejin Choi, Pang Wei Koh, M...

MV-Adapter: Multi-view Consistent Image Generation Made Easy

Episode 160 · 21:23

🤗 Upvotes: 17 | cs.CV Authors: Zehuan Huang, Yuan-Chen Guo, Haoran Wang, Ran Yi, Lizhuang Ma, Yan-Pei Cao, Lu Sheng Title: ...

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Episode 159 · 24:33

🤗 Paper Upvotes: 48 | cs.CV, cs.AI, cs.CL, cs.HC Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixi...

Star Attention: Efficient LLM Inference over Long Sequences

Episode 158 · 20:34

🤗 Paper Upvotes: 32 | cs.CL, cs.AI, cs.LG Authors: Shantanu Acharya, Fei Jia, Boris Ginsburg Title: Star Attention: Eff...

Pathways on the Image Manifold: Image Editing via Video Generation

Episode 157 · 25:04

🤗 Paper Upvotes: 23 | cs.CV, cs.AI, cs.LG Authors: Noam Rotstein, Gal Yona, Daniel Silver, Roy Velich, David Bensaïd, Ron Kimmel Ti...

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Episode 156 · 26:29

🤗 Paper Upvotes: 15 | cs.CV, cs.AI, cs.CL Authors: Chaoyou Fu, Yi-Fan Zhang, Shukang Yin, Bo Li, Xinyu Fang, Sirui Zhao, Haodong Duan, Xing Sun,...

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Episode 155 · 22:13

🤗 Paper Upvotes: 14 | cs.CV Authors: Yuhang Han, Xuyang Liu, Pengxiang Ding, Donglin Wang, Honggang Chen, Qingsen Yan, Siteng Huang ...

SketchAgent: Language-Driven Sequential Sketch Generation

Episode 154 · 24:41

🤗 Paper Upvotes: 13 | cs.CV Authors: Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba Titl...

TEXGen: a Generative Diffusion Model for Mesh Textures

Episode 153 · 24:25

🤗 Paper Upvotes: 12 | cs.CV, cs.AI, cs.GR Authors: Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, JianHui Liu, Yangguang Li, Yan-Pei Cao, Ding L...

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Episode 152 · 22:01

🤗 Paper Upvotes: 8 | cs.CV, cs.CL Authors: Lei Li, Yuancheng Wei, Zhihui Xie, Xuqing Yang, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujia...

Learning 3D Representations from Procedural 3D Programs

Episode 151 · 24:23

🤗 Paper Upvotes: 8 | cs.CV Authors: Xuweiyi Chen, Zezhou Cheng Title: Learning 3D Representations from Procedural 3D Pr...

SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

Episode 150 · 26:01

🤗 Paper Upvotes: 7 | cs.CV Authors: Yongwei Chen, Yushi Lan, Shangchen Zhou, Tengfei Wang, Xingang Pan Title: SAR3D: Au...

Material Anything: Generating Materials for Any 3D Object via Diffusion

Episode 149 · 21:52

🤗 Paper Upvotes: 33 | cs.CV, cs.GR Authors: Xin Huang, Tengfei Wang, Ziwei Liu, Qing Wang Title: Material Anything: Gen...