Episodes

Latest Episode
DDT: Decoupled Diffusion Transformer

Episode 665 · · 19:49

🤗 Upvotes: 51 | cs.CV, cs.AI Authors: Shuai Wang, Zhi Tian, Weilin Huang, Limin Wang Title: DDT: Decoupled Diffusion Tr...

OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Episode 664 · · 20:51

🤗 Upvotes: 43 | cs.CL Authors: Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bisch...

A Unified Agentic Framework for Evaluating Conditional Image Generation

Episode 663 · · 21:06

🤗 Upvotes: 25 | cs.CV, cs.CL Authors: Jifang Wang, Xue Yang, Longyue Wang, Zhenran Xu, Yiyu Wang, Yaowei Wang, Weihua Luo, Kaifu Zhang, Baotian ...

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Episode 662 · · 23:13

🤗 Upvotes: 24 | cs.AI, cs.CL, cs.LG Authors: Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou Title: Missing Premise exace...

OmniSVG: A Unified Scalable Vector Graphics Generation Model

Episode 661 · · 21:36

🤗 Upvotes: 91 | cs.CV Authors: Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, Yu-Gang Jiang ...

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Episode 660 · · 24:23

🤗 Upvotes: 73 | cs.LG, cs.CL Authors: Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Vage Egiazarian, Anton Sinitsin, Denis Kuzne...

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

Episode 659 · · 23:07

🤗 Upvotes: 62 | cs.CV, cs.CL Authors: Yi Peng, Chris, Xiaokun Wang, Yichen Wei, Jiangbo Pei, Weijie Qiu, Ai Jian, Yunzhuo Hao, Jiachun Pan, Tian...

An Empirical Study of GPT-4o Image Generation Capabilities

Episode 658 · · 22:27

🤗 Upvotes: 50 | cs.CV Authors: Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Cha...

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

Episode 657 · · 21:42

🤗 Upvotes: 36 | cs.CL Authors: M-A-P Team, Siwei Wu, Jincheng Ren, Xinrun Du, Shuyue Guo, Xingwei Qu, Yiming Liang, Jie Liu, Yunwen Li, Tianyu Z...

Less-to-More Generalization: Unlocking More Controllability by In-Context Generation

Episode 656 · · 21:12

🤗 Upvotes: 27 | cs.CV, cs.LG Authors: Shaojin Wu, Mengqi Huang, Wenxu Wu, Yufeng Cheng, Fei Ding, Qian He Title: Less-t...

SmolVLM: Redefining small and efficient multimodal models

Episode 655 · · 25:38

🤗 Upvotes: 96 | cs.AI, cs.CV Authors: Andrés Marafioti, Orr Zohar, Miquel Farré, Merve Noyan, Elie Bakouch, Pedro Cuenca, Cyril Zakka, Loubna Be...

One-Minute Video Generation with Test-Time Training

Episode 654 · · 18:49

🤗 Upvotes: 61 | cs.CV Authors: Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, Youjin Song, Shihao Han, Ka Chun Cheung, Jan Kau...

Rethinking Reflection in Pre-Training

Episode 653 · · 21:39

🤗 Upvotes: 52 | cs.CL, cs.AI Authors: Essential AI, Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ash...

URECA: Unique Region Caption Anything

Episode 652 · · 21:41

🤗 Upvotes: 31 | cs.CV, cs.AI Authors: Sangbeom Lim, Junwan Kim, Heeji Yoon, Jaewoo Jung, Seungryong Kim Title: URECA: U...

T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Episode 651 · · 20:45

🤗 Upvotes: 29 | cs.CL, cs.AI Authors: Minki Kang, Jongwon Jeong, Jaewoong Cho Title: T1: Tool-integrated Self-verificat...

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Episode 650 · · 25:44

🤗 Upvotes: 30 | cs.SE, cs.AI, cs.CL Authors: Daoguang Zan, Zhirong Huang, Wei Liu, Hanwu Chen, Linhao Zhang, Shulin Xin, Lu Chen, Qi Liu, Xiaoji...

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Episode 649 · · 20:47

🤗 Upvotes: 98 | cs.AI Authors: Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song,...

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Episode 648 · · 23:04

🤗 Upvotes: 55 | cs.CV Authors: Xiangyu Zhao, Peiyuan Zhang, Kexian Tang, Hao Li, Zicheng Zhang, Guangtao Zhai, Junchi Yan, Hua Yang, Xue Yang, H...

ZClip: Adaptive Spike Mitigation for LLM Pre-Training

Episode 647 · · 20:18

🤗 Upvotes: 47 | cs.LG, cs.CL Authors: Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra Title: ZClip: Adapti...

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Episode 646 · · 21:35

🤗 Upvotes: 34 | cs.CV Authors: Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li...

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Episode 645 · · 22:34

🤗 Upvotes: 24 | cs.LG, cs.CL, cs.CV Authors: Yan Ma, Steffi Chern, Xuyang Shen, Yiran Zhong, Pengfei Liu Title: Rethink...

WikiVideo: Article Generation from Multiple Videos

Episode 644 · · 21:32

🤗 Upvotes: 24 | cs.CV, cs.CL Authors: Alexander Martin, Reno Kriz, William Gantt Walden, Kate Sanders, Hannah Recknor, Eugene Yang, Francis Ferr...

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

Episode 643 · · 19:52

🤗 Upvotes: 57 | cs.CV, cs.AI Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan ...

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Episode 642 · · 23:23

🤗 Upvotes: 30 | cs.CV Authors: Junhao Cheng, Yuying Ge, Yixiao Ge, Jing Liao, Ying Shan Title: AnimeGamer: Infinite Ani...

Understanding R1-Zero-Like Training: A Critical Perspective

Episode 641 · · 19:55

🤗 Upvotes: 25 | cs.LG, cs.AI, cs.CL Authors: Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin ...

Towards Physically Plausible Video Generation via VLM Planning

Episode 640 · · 22:25

🤗 Upvotes: 25 | cs.CV, cs.AI Authors: Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin ...

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

Episode 639 · · 20:55

🤗 Upvotes: 24 | cs.CV, cs.AI Authors: Yuxuan Luo, Zhengkun Rong, Lizhen Wang, Longhao Zhang, Tianshu Hu, Yongming Zhu Title: ...

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

Episode 638 · · 21:49

🤗 Upvotes: 22 | cs.CV Authors: Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan Title: VideoScene: Distilling Video Dif...

START: Self-taught Reasoner with Tools

Episode 637 · · 24:43

🤗 Upvotes: 49 | cs.CL Authors: Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Xiang Wang, Bowen Yu, Binyuan Hui, Junyang L...

Token-Efficient Long Video Understanding for Multimodal LLMs

Episode 636 · · 21:18

🤗 Upvotes: 41 | cs.CV Authors: Jindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen, Zhiqi Li, De-An Huang, Guilin Liu, Zhiding Yu, Kurt Ke...