Episodes

Latest Episode
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Episode 1400 · 25:10

🤗 Upvotes: 44 | cs.CV, cs.AI Authors: Zehong Ma, Longhui Wei, Shuai Wang, Shiliang Zhang, Qi Tian Title: DeCo: Frequenc...

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Episode 1399 · 20:31

🤗 Upvotes: 42 | cs.CL, cs.AI, cs.LG Authors: Rulin Shao, Akari Asai, Shannon Zejiang Shen, Hamish Ivison, Varsha Kishore, Jingming Zhuo, Xinran ...

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

Episode 1398 · 22:20

🤗 Upvotes: 32 | cs.CV Authors: Tian Ye, Song Fei, Lei Zhu Title: UltraFlux: Data-Model Co-Design for High-quality Nativ...

In-Video Instructions: Visual Signals as Generative Control

Episode 1397 · 22:26

🤗 Upvotes: 26 | cs.CV, cs.AI Authors: Gongfan Fang, Xinyin Ma, Xinchao Wang Title: In-Video Instructions: Visual Signal...

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Episode 1396 · 21:24

🤗 Upvotes: 76 | cs.AI, cs.CL Authors: Kaichen Zhang, Keming Wu, Zuhao Yang, Kairui Hu, Bin Wang, Ziwei Liu, Xingxuan Li, Lidong Bing ...

Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story

Episode 1395 · 22:13

🤗 Upvotes: 72 | cs.CL, cs.AI, cs.LG Authors: Vladislav Pedashenko, Laida Kushnareva, Yana Khassan Nibal, Eduard Tulchinskii, Kristian Kuznetsov,...

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Episode 1394 · 21:40

🤗 Upvotes: 60 | cs.CV Authors: Yikun Wang, Zuyan Liu, Ziyi Wang, Pengfei Liu, Han Hu, Yongming Rao Title: GeoVista: Web...

SAM 3: Segment Anything with Concepts

Episode 1393 · 23:53

🤗 Upvotes: 51 | cs.CV, cs.AI Authors: Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali...

Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Episode 1392 · 25:59

🤗 Upvotes: 66 | cs.CV, cs.AI Authors: Cheng Yang, Haiyuan Wan, Yiran Peng, Xin Cheng, Zhaoyang Yu, Jiayi Zhang, Junchi Yu, Xinlei Yu, Xiawu Zhen...

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Episode 1391 · 24:57

🤗 Upvotes: 128 | cs.CV, cs.AI, cs.LG Authors: Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Viacheslav Vasilev...

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Episode 1390 · 22:40

🤗 Upvotes: 47 | cs.AI Authors: Alexis Audran-Reiss, Jordi Armengol Estapé, Karen Hambardzumyan, Amar Budhiraja, Martin Josifoski, Edan Toledo, R...

VisPlay: Self-Evolving Vision-Language Models from Images

Episode 1389 · 22:28

🤗 Upvotes: 31 | cs.CV, cs.AI, cs.CL, cs.LG Authors: Yicheng He, Chengsong Huang, Zongxia Li, Jiaxin Huang, Yonghui Yang Title: ...

Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

Episode 1388 · 19:22

🤗 Upvotes: 23 | cs.CV Authors: Geon Choi, Hangyul Yoon, Hyunju Shin, Hyunki Park, Sang Hoon Seo, Eunho Yang, Edward Choi Title: ...

VIDEOP2R: Video Understanding from Perception to Reasoning

Episode 1387 · 25:08

🤗 Upvotes: 70 | cs.CV, cs.AI, cs.LG Authors: Yifan Jiang, Yueying Wang, Rui Zhao, Toufiq Parag, Zhimin Chen, Zhenyu Liao, Jayakrishnan Unnikrish...

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Episode 1386 · 24:58

🤗 Upvotes: 66 | cs.CL, cs.AI, cs.LG, cs.PF Authors: Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang Title: ...

AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

Episode 1385 · 23:48

🤗 Upvotes: 58 | cs.CL, cs.AI, cs.LG Authors: Mohammad Zbib, Hasan Abed Al Kader Hammoud, Sina Mukalled, Nadine Rizk, Fatima Karnib, Issam Lakkis...

A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Episode 1384 · 23:48

🤗 Upvotes: 41 | cs.CV, cs.AI Authors: Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang Title: A St...

Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark

Episode 1383 · 22:39

🤗 Upvotes: 32 | cs.CV Authors: Xinxin Liu, Zhaopan Xu, Kai Wang, Yong Jae Lee, Yuzhang Shang Title: Can World Simulator...

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

Episode 1382 · 24:27

🤗 Upvotes: 24 | cs.CV Authors: Huiyi Chen, Jiawei Peng, Dehai Min, Changchang Sun, Kaijie Chen, Yan Yan, Xu Yang, Lu Cheng Title: ...

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Episode 1381 · 26:47

🤗 Upvotes: 22 | cs.CV Authors: Jiaze Li, Hao Yin, Wenhui Tan, Jingyang Chen, Boshen Xu, Yuxun Qu, Yijing Chen, Jianzhong Ju, Zhenbo Luo, Jian Lu...