Episodes

Latest Episode
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Episode 208 · 18:51

🤗 Upvotes: 16 | cs.CL Authors: Yiheng Xu, Dunjie Lu, Zhennan Shen, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu Title:...

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Episode 207 · 19:08

🤗 Upvotes: 14 | cs.CV Authors: Dongting Hu, Jierun Chen, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri,...

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

Episode 206 · 22:37

🤗 Upvotes: 13 | cs.CV Authors: Zexin He, Tengfei Wang, Xin Huang, Xingang Pan, Ziwei Liu Title: Neural LightRig: Unlock...

JuStRank: Benchmarking LLM Judges for System Ranking

Episode 205 · 21:10

🤗 Upvotes: 9 | cs.CL, cs.AI, cs.LG Authors: Ariel Gera, Odellia Boni, Yotam Perlitz, Roy Bar-Haim, Lilach Eden, Asaf Yehudai Title:...

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Episode 204 · 21:09

🤗 Upvotes: 36 | cs.CV Authors: Jianhong Bai, Menghan Xia, Xintao Wang, Ziyang Yuan, Xiao Fu, Zuozhu Liu, Haoji Hu, Pengfei Wan, Di Zhang ...

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Episode 203 · 21:28

🤗 Upvotes: 28 | cs.CV Authors: Zejian Li, Chenye Meng, Yize Li, Ling Yang, Shengyuan Zhang, Jiarui Ma, Jiayi Li, Guang Yang, Changyuan Yang, Zhi...

POINTS1.5: Building a Vision-Language Model towards Real World Applications

Episode 202 · 24:23

🤗 Upvotes: 25 | cs.CV, cs.MM Authors: Yuan Liu, Le Tian, Xiao Zhou, Xinyu Gao, Kavio Yu, Yang Yu, Jie Zhou Title: POINT...

Learning Flow Fields in Attention for Controllable Person Image Generation

Episode 201 · 21:04

🤗 Upvotes: 16 | cs.CV Authors: Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manue...

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Episode 200 · 23:20

🤗 Upvotes: 14 | cs.CV Authors: Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo Title: StyleMas...

StreamChat: Chatting with Streaming Video

Episode 199 · 19:44

🤗 Upvotes: 12 | cs.CV Authors: Jihao Liu, Zhiding Yu, Shiyi Lan, Shihao Wang, Rongyao Fang, Jan Kautz, Hongsheng Li, Jose M. Alvare ...

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Episode 198 · 25:04

🤗 Upvotes: 11 | cs.CV Authors: Wufei Ma, Haoyu Chen, Guofeng Zhang, Celso M de Melo, Alan Yuille, Jieneng Chen Title: 3...

Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction

Episode 197 · 22:43

🤗 Upvotes: 11 | cs.CV, cs.GR Authors: Seungtae Nam, Xiangyu Sun, Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park Title: ...

The BrowserGym Ecosystem for Web Agent Research

Episode 196 · 25:16

🤗 Upvotes: 11 | cs.LG, cs.AI, cs.SE Authors: Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Meg...

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Episode 195 · 22:09

🤗 Upvotes: 31 | cs.CV Authors: Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong Title: DiffS...

Hidden in the Noise: Two-Stage Robust Watermarking for Images

Episode 194 · 21:29

🤗 Upvotes: 20 | cs.CV, cs.AI, cs.LG Authors: Kasra Arabi, Benjamin Feuer, R. Teal Witter, Chinmay Hegde, Niv Cohen Title: ...

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Episode 193 · 19:48

🤗 Upvotes: 19 | cs.CV Authors: Tong Wu, Yinghao Xu, Ryan Po, Mengchen Zhang, Guandao Yang, Jiaqi Wang, Ziwei Liu, Dahua Lin, Gordon Wetzstein ...

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Episode 192 · 23:56

🤗 Upvotes: 18 | cs.CV Authors: Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin...

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

Episode 191 · 23:46

🤗 Upvotes: 17 | cs.CV Authors: Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin...

Mobile Video Diffusion

Episode 190 · 24:39

🤗 Upvotes: 16 | cs.CV, cs.AI Authors: Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, Amir Ghodrati, Amirhossein Habibian Titl...

Granite Guardian

Episode 189 · 21:00

🤗 Upvotes: 16 | cs.CL Authors: Inkit Padhi, Manish Nagireddy, Giandomenico Cornacchia, Subhajit Chaudhury, Tejaswini Pedapati, Pierre Dognin, Ke...

Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Episode 188 · 18:54

🤗 Upvotes: 54 | cs.LG, cs.AI Authors: Egor Cherepanov, Nikita Kachaev, Artem Zholus, Alexey K. Kovalev, Aleksandr I. Panov Title: ...

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Episode 187 · 21:22

🤗 Upvotes: 38 | cs.AI, cs.CL, cs.LG Authors: Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu, Dayiheng Liu, Jingren Zh...

Training Large Language Models to Reason in a Continuous Latent Space

Episode 186 · 22:02

🤗 Upvotes: 25 | cs.CL Authors: Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian Title: ...

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Episode 185 · 23:51

🤗 Upvotes: 10 | cs.CV Authors: Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan Title: Divot: Diffusion Powers Video Tokenize...

Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Episode 184 · 22:19

🤗 Upvotes: 9 | cs.CV, cs.LG Authors: Nicolas Dufour, David Picard, Vicky Kalogeiton, Loic Landrieu Title: Around the Wo...

Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

Episode 183 · 22:31

🤗 Upvotes: 8 | cs.CV, cs.CL, cs.LG Authors: Xiao Xu, Tianhao Niu, Yuxi Xie, Libo Qin, Wanxiang Che, Min-Yen Kan Title: ...

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Episode 182 · 19:54

🤗 Upvotes: 7 | cs.CV Authors: Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang Title: ...

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Episode 181 · 20:24

🤗 Upvotes: 7 | cs.CV, cs.AI, cs.IR Authors: Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man ...

Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Episode 180 · 17:59

🤗 Upvotes: 5 | cs.AI Authors: Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li Title: Robust Multi-bit Text Wat...

MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views

Episode 179 · 22:08

🤗 Upvotes: 4 | cs.CV, cs.GR Authors: Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita, Ko Nishino Title: MAtCha Gaussia...