Episodes

Latest Episode
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Episode 1681 · 24:06

🤗 Upvotes: 112 | cs.CV Authors: Hejun Dong, Junbo Niu, Bin Wang, Weijun Zeng, Wentao Zhang, Conghui He Title: MinerU-Di...

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Episode 1680 · 22:14

🤗 Upvotes: 69 | cs.CV Authors: Zhen Li, Zian Meng, Shuwei Shi, Wenshuo Peng, Yuwei Wu, Bo Zheng, Chuanhao Li, Kaipeng Zhang Title: ...

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Episode 1679 · 27:57

🤗 Upvotes: 43 | cs.AI, cs.CL Authors: Ling Yue, Kushal Raj Bhandari, Ching-Yun Ko, Dhaval Patel, Shuxin Lin, Nianjun Zhou, Jianxi Gao, Pin-Yu Ch...

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Episode 1678 · 21:53

🤗 Upvotes: 43 | cs.CV, cs.CL Authors: Haoyu Huang, Jinfa Huang, Zhongwei Wan, Xiawu Zheng, Rongrong Ji, Jiebo Luo Title: ...

PEARL: Personalized Streaming Video Understanding Model

Episode 1677 · 22:20

🤗 Upvotes: 36 | cs.CV, cs.AI, cs.IR Authors: Yuanhong Zheng, Ruichuan An, Xiaopeng Lin, Yuxing Liu, Sihan Yang, Huanyu Zhang, Haodong Li, Qinton...

DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models

Episode 1676 · 20:42

🤗 Upvotes: 36 | cs.CV Authors: Jaewon Min, Jaeeun Lee, Yeji Choi, Paul Hyunbin Cho, Jin Hyeon Kim, Tae-Young Lee, Jongsik Ahn, Hwayeong Lee, Seo...

SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

Episode 1675 · 23:43

🤗 Upvotes: 33 | cs.CV, cs.GR, cs.RO Authors: Chuanrui Zhang, Minghan Qin, Yuang Wang, Baifeng Xie, Hang Li, Ziwei Wang Title: ...

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Episode 1674 · 20:15

🤗 Upvotes: 30 | cs.CV Authors: Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Hua...

RealMaster: Lifting Rendered Scenes into Photorealistic Video

Episode 1673 · 22:32

🤗 Upvotes: 23 | cs.CV Authors: Dana Cohen-Bar, Ido Sobol, Raphael Bensadoun, Shelly Sheynin, Oran Gafni, Or Patashnik, Daniel Cohen-Or, Amit Zoh...

Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models

Episode 1672 · 23:04

🤗 Upvotes: 110 | cs.CV Authors: Meiqi Wu, Zhixin Cai, Fufangchen Zhao, Xiaokun Feng, Rujing Dang, Bingze Song, Ruitian Tian, Jiashu Zhu, Jiachen...

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Episode 1671 · 22:55

🤗 Upvotes: 90 | cs.CV Authors: SII-GAIR, Sand.ai: Ethan Chern, Hansi Teng, Hanwen Sun, Hao Wang, Hong Pan, Hongyu Jia, Jiadi Su, Jin Li, Jun...

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Episode 1670 · 22:54

🤗 Upvotes: 63 | cs.AI, cs.CL Authors: Jianing Wang, Jianfei Zhang, Qi Guo, Linsen Guo, Rumei Li, Chao Zhang, Chong Peng, Cunguang Wang, Dengchan...

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Episode 1669 · 26:12

🤗 Upvotes: 60 | cs.CV, cs.AI Authors: Nimrod Shabtay, Moshe Kimhi, Artem Spector, Sivan Haray, Ehud Rivlin, Chaim Baskin, Raja Giryes, Eli Schwa...

OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Episode 1668 · 24:15

🤗 Upvotes: 55 | cs.IR, cs.AI, cs.CL Authors: Zhuofeng Li, Dongfu Jiang, Xueguang Ma, Haoxiang Zhang, Ping Nie, Yuyu Zhang, Kai Zou, Jianwen Xie,...

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

Episode 1667 · 22:39

🤗 Upvotes: 45 | cs.CV Authors: Ruoliu Yang, Chu Wu, Caifeng Shan, Ran He, Chaoyou Fu Title: VideoDetective: Clue Huntin...

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

Episode 1666 · 22:51

🤗 Upvotes: 39 | cs.CV Authors: Byungwoo Jeon, Dongyoung Kim, Huiwon Jang, Insoo Kim, Jinwoo Shin Title: SpatialBoost: E...

F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

Episode 1665 · 22:49

🤗 Upvotes: 31 | cs.CV Authors: Injae Kim, Chaehyeon Kim, Minseong Bae, Minseok Joo, Hyunwoo J. Kim Title: F4Splat: Feed...

mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

Episode 1664 · 24:27

🤗 Upvotes: 28 | cs.LG, cs.AI Authors: Woosung Koh, Jeyoung Jeon, Youngjin Song, Yujin Cheon, Soowon Oh, Jaehyeong Choi, Se-Young Yun ...

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Episode 1663 · 24:36

🤗 Upvotes: 96 | cs.CV, cs.AI, cs.CL Authors: Shenzhi Wang, Shixuan Liu, Jing Zhou, Chang Gao, Xiong-Hui Chen, Binghai Wang, An Yang, Shiji Song,...

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Episode 1662 · 24:19

🤗 Upvotes: 87 | cs.CV Authors: Songchun Zhang, Zeyue Xue, Siming Fu, Jie Huang, Xianghao Kong, Y Ma, Haoyang Huang, Nan Duan, Anyi Rao ...