Episodes

Latest Episode
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers

Episode 238 · 22:52

🤗 Upvotes: 9 | cs.CL, cs.AI, cs.LG Authors: Seungwook Han, Jinyeop Song, Jeff Gore, Pulkit Agrawal Title: Emergence of ...

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Episode 237 · 20:44

🤗 Upvotes: 7 | cs.CV Authors: Mark Endo, Xiaohan Wang, Serena Yeung-Levy Title: Feather the Throttle: Revisiting Visual...

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Episode 236 · 23:53

🤗 Upvotes: 5 | cs.LG, cs.AI, cs.CV Authors: Yifei Zhou, Qianlan Yang, Kaixiang Lin, Min Bai, Xiong Zhou, Yu-Xiong Wang, Sergey Levine, Erran Li ...

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Episode 235 · 23:12

🤗 Upvotes: 4 | cs.CL Authors: Manan Suri, Puneet Mathur, Franck Dernoncourt, Kanika Goswami, Ryan A. Rossi, Dinesh Manocha Title: ...

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Episode 234 · 20:27

🤗 Upvotes: 2 | cs.CV Authors: Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun Title: SUGAR: Subj...

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion

Episode 233 · 20:33

🤗 Upvotes: 2 | cs.CV, cs.LG Authors: Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, Anton Obukhov ...

Byte Latent Transformer: Patches Scale Better Than Tokens

Episode 232 · 25:08

🤗 Upvotes: 39 | cs.CL Authors: Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Y...

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

Episode 231 · 21:46

🤗 Upvotes: 25 | cs.CL, cs.AI, cs.IR Authors: Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou Title...

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

Episode 230 · 21:10

🤗 Upvotes: 25 | cs.CV, cs.AI, cs.CL Authors: Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu Title: Evaluation A...

BrushEdit: All-In-One Image Inpainting and Editing

Episode 229 · 27:48

🤗 Upvotes: 24 | cs.CV, cs.AI Authors: Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Ying Shan, Yuexian Zou, Qiang Xu Title: ...

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Episode 228 · 22:32

🤗 Upvotes: 20 | cs.CV Authors: Junhao Zhuang, Xuan Ju, Zhaoyang Zhang, Yong Liu, Shiyi Zhang, Chun Yuan, Ying Shan Title: ...

Smaller Language Models Are Better Instruction Evolvers

Episode 227 · 23:17

🤗 Upvotes: 16 | cs.CL Authors: Tingfeng Hui, Lulu Zhao, Guanting Dong, Yaqi Zhang, Hua Zhou, Sen Su Title: Smaller Lang...

Causal Diffusion Transformers for Generative Modeling

Episode 226 · 23:47

🤗 Upvotes: 16 | cs.CV Authors: Chaorui Deng, Deyao Zhu, Kunchang Li, Shi Guang, Haoqi Fan Title: Causal Diffusion Trans...

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Episode 225 · 23:05

🤗 Upvotes: 11 | cs.CL, cs.AI, cs.LG Authors: Jiale Cheng, Xiao Liu, Cunxiang Wang, Xiaotao Gu, Yida Lu, Dan Zhang, Yuxiao Dong, Jie Tang, Hongni...

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Episode 224 · 20:29

🤗 Upvotes: 11 | cs.CV Authors: Zhibing Li, Tong Wu, Jing Tan, Mengchen Zhang, Jiaqi Wang, Dahua Lin Title: IDArb: Intri...

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

Episode 223 · 21:15

🤗 Upvotes: 10 | cs.RO, cs.AI, cs.CV Authors: Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao, S...

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Episode 222 · 24:59

🤗 Upvotes: 91 | cs.CV, cs.AI Authors: Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofa...

GenEx: Generating an Explorable World

Episode 221 · 21:28

🤗 Upvotes: 65 | cs.CV, cs.RO Authors: Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama ...

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Episode 220 · 25:15

🤗 Upvotes: 29 | cs.CV Authors: Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiaogang Wang, Hongsheng Li, L...

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

Episode 219 · 17:46

🤗 Upvotes: 24 | cs.CV Authors: Sahal Shaji Mullappilly, Mohammed Irfan Kurpath, Sara Pieri, Saeed Yahya Alseiari, Shanavas Cholakkal, Khaled Ald...

Large Action Models: From Inception to Implementation

Episode 218 · 22:15

🤗 Upvotes: 23 | cs.AI Authors: Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qis...

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

Episode 217 · 21:01

🤗 Upvotes: 17 | cs.CV, cs.AI Authors: Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai ...

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Episode 216 · 21:32

🤗 Upvotes: 13 | cs.CV Authors: Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu ...

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

Episode 215 · 21:41

🤗 Upvotes: 10 | cs.CV Authors: Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, Yedid Hoshen Title: ...

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Episode 214 · 21:47

🤗 Upvotes: 8 | cs.CV Authors: Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, Fan Tang Title: FireFlow: Fast In...

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Episode 213 · 18:48

🤗 Upvotes: 7 | cs.CV Authors: Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag Title: FluxSpace: Disentangled Semantic Edi...

Phi-4 Technical Report

Episode 212 · 22:12

🤗 Upvotes: 40 | cs.CL, cs.AI Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison...

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

Episode 211 · 24:28

🤗 Upvotes: 30 | cs.CV, cs.AI, cs.CL Authors: Jiarui Zhang, Ollie Liu, Tianyu Yu, Jinyi Hu, Willie Neiswanger Title: Euc...

Multimodal Latent Language Modeling with Next-Token Diffusion

Episode 210 · 22:35

🤗 Upvotes: 21 | cs.CL, cs.CV, cs.LG Authors: Yutao Sun, Hangbo Bao, Wenhui Wang, Zhiliang Peng, Li Dong, Shaohan Huang, Jianyong Wang, Furu Wei ...

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Episode 209 · 21:54

🤗 Upvotes: 17 | cs.CV Authors: Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li T...