Episodes

Latest Episode
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Episode 313 · 25:04

🤗 Upvotes: 39 | cs.CV Authors: Yang Shen, Xiu-Shen Wei, Yifan Sun, Yuxin Song, Tao Yuan, Jian Jin, Heyang Xu, Yazhou Yao, Errui Ding ...

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Episode 312 · 22:45

🤗 Upvotes: 29 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize C...

Bringing Objects to Life: 4D generation from 3D objects

Episode 311 · 21:48

🤗 Upvotes: 24 | cs.CV Authors: Ohad Rahamim, Ori Malca, Dvir Samuel, Gal Chechik Title: Bringing Objects to Life: 4D ge...

Efficiently Serving LLM Reasoning Programs with Certaindex

Episode 310 · 20:19

🤗 Upvotes: 20 | cs.LG, cs.CL Authors: Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Aurick Qiao, Hao Zhang Title: ...

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Episode 309 · 21:15

🤗 Upvotes: 14 | cs.SD, cs.AI, cs.CL, eess.AS Authors: Chia-Yu Hung, Navonil Majumder, Zhifeng Kong, Ambuj Mehrish, Rafael Valle, Bryan Catanzaro...

Edicho: Consistent Image Editing in the Wild

Episode 308 · 22:47

🤗 Upvotes: 13 | cs.CV Authors: Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen ...

Facilitating large language model Russian adaptation with Learned Embedding Propagation

Episode 307 · 22:12

🤗 Upvotes: 6 | cs.CL, cs.AI Authors: Mikhail Tikhomirov, Daniil Chernyshev Title: Facilitating large language model Rus...

Training Software Engineering Agents and Verifiers with SWE-Gym

Episode 306 · 26:54

🤗 Upvotes: 6 | cs.SE, cs.CL Authors: Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang Title...

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Episode 305 · 20:54

🤗 Upvotes: 5 | cs.SE, cs.CL Authors: Zhaojian Yu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang Title: HumanEval Pro and MBP...

Slow Perception: Let's Perceive Geometric Figures Step-by-step

Episode 304 · 23:19

🤗 Upvotes: 5 | cs.CV Authors: Haoran Wei, Youyang Yin, Yumeng Li, Jia Wang, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang Title...

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Episode 303 · 23:19

🤗 Upvotes: 53 | cs.CL, cs.AI, cs.LG Authors: Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Jianye Hou, Benyou Wan...

1.58-bit FLUX

Episode 302 · 22:59

🤗 Upvotes: 24 | cs.CV, cs.AI, cs.LG Authors: Chenglin Yang, Celong Liu, Xueqing Deng, Dongwon Kim, Xing Mei, Xiaohui Shen, Liang-Chieh Chen ...

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Episode 301 · 17:30

🤗 Upvotes: 17 | cs.CL, cs.AI, cs.CV, cs.LG, cs.MM, eess.AS Authors: Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan ...

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Episode 300 · 23:16

🤗 Upvotes: 11 | cs.CV Authors: Zehan Wang, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao Title: Orient ...

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Episode 299 · 25:03

🤗 Upvotes: 11 | cs.CV Authors: Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qi...

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Episode 298 · 22:38

🤗 Upvotes: 11 | cs.CV Authors: Jiawei Lin, Shizhao Sun, Danqing Huang, Ting Liu, Ji Li, Jiang Bian Title: From Elements...

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

Episode 297 · 23:53

🤗 Upvotes: 8 | cs.CV Authors: Tao Wu, Yong Zhang, Xiaodong Cun, Zhongang Qi, Junfu Pu, Huanzhang Dou, Guangcong Zheng, Ying Shan, Xi Li ...

The Superposition of Diffusion Models Using the Itô Density Estimator

Episode 296 · 23:22

🤗 Upvotes: 8 | cs.LG Authors: Marta Skreta, Lazar Atanackovic, Avishek Joey Bose, Alexander Tong, Kirill Neklyudov Title: ...

Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging

Episode 295 · 19:03

🤗 Upvotes: 6 | cs.CL Authors: Hua Farn, Hsuan Su, Shachi H Kumar, Saurav Sahay, Shang-Tse Chen, Hung-yi Lee Title: Safe...

CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era

Episode 294 · 25:00

🤗 Upvotes: 3 | cs.CL, cs.AI, cs.DB Authors: Yanlin Feng, Simone Papicchio, Sajjadur Rahman Title: CypherBench: Towards ...

YuLan-Mini: An Open Data-efficient Language Model

Episode 293 · 19:39

🤗 Upvotes: 27 | cs.CL Authors: Yiwen Hu, Huatong Song, Jia Deng, Jiapeng Wang, Jie Chen, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Wayne Xi...

A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression

Episode 292 · 21:47

🤗 Upvotes: 17 | cs.CL Authors: Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou Title: ...

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

Episode 291 · 21:10

🤗 Upvotes: 4 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Wan-Cyuan Fan, Tanzila Rahman, Leonid Sigal Title: MMFactory: A Univ...

Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

Episode 290 · 22:18

🤗 Upvotes: 2 | cs.IR, cs.AI Authors: Yucong Luo, Qitao Qin, Hao Zhang, Mingyue Cheng, Ruiran Yan, Kefan Wang, Jie Ouyang Title: ...

DepthLab: From Partial to Complete

Episode 289 · 22:07

🤗 Upvotes: 21 | cs.CV Authors: Zhiheng Liu, Ka Leong Cheng, Qiuyu Wang, Shuzhe Wang, Hao Ouyang, Bin Tan, Kai Zhu, Yujun Shen, Qifeng Chen, Ping...

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Episode 288 · 21:34

🤗 Upvotes: 20 | cs.AI, cs.CL Authors: Ermo Hua, Che Jiang, Xingtai Lv, Kaiyan Zhang, Ning Ding, Youbang Sun, Biqing Qi, Yuchen Fan, Xue Kai Zhu,...

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Episode 287 · 22:13

🤗 Upvotes: 10 | cs.CV, cs.AI, cs.MM Authors: Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yu...

In Case You Missed It: ARC 'Challenge' Is Not That Challenging

Episode 286 · 24:19

🤗 Upvotes: 8 | cs.CL, cs.AI Authors: Łukasz Borchmann Title: In Case You Missed It: ARC 'Challenge' Is Not That Challen...

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Episode 285 · 20:56

🤗 Upvotes: 8 | cs.LG Authors: Ziteng Wang, Jianfei Chen, Jun Zhu Title: ReMoE: Fully Differentiable Mixture-of-Experts ...

SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

Episode 284 · 22:17

🤗 Upvotes: 6 | cs.CL Authors: Aakash Mahalingam, Vinesh Kumar Gande, Aman Chadha, Vinija Jain, Divya Chaudhary Title: S...