Episodes

Latest Episode
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Episode 1293 · 22:32

🤗 Upvotes: 133 | cs.RO Authors: Fuhao Li, Wenxuan Song, Han Zhao, Jingbo Wang, Pengxiang Ding, Donglin Wang, Long Zeng, Haoang Li T...

Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

Episode 1292 · 23:30

🤗 Upvotes: 91 | cs.CV Authors: Jiachen Lei, Keli Liu, Julius Berner, Haiming Yu, Hongkai Zheng, Jiahong Wu, Xiangxiang Chu Title: ...

DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

Episode 1291 · 21:58

🤗 Upvotes: 90 | cs.CL Authors: Enze Zhang, Jiaying Wang, Mengxi Xiao, Jifei Liu, Ziyan Kuang, Rui Dong, Eric Dong, Sophia Ananiadou, Min Peng, Q...

Scaling Language-Centric Omnimodal Representation Learning

Episode 1290 · 28:03

🤗 Upvotes: 82 | cs.CL, cs.AI, cs.CV Authors: Chenghao Xiao, Hou Pong Chan, Hao Zhang, Weiwen Xu, Mahani Aljunied, Yu Rong Title: ...

Robot Learning: A Tutorial

Episode 1289 · 23:41

🤗 Upvotes: 44 | cs.RO, cs.LG Authors: Francesco Capuano, Caroline Pascal, Adil Zouitine, Thomas Wolf, Michel Aractingi Title: ...

Detect Anything via Next Point Prediction

Episode 1288 · 23:00

🤗 Upvotes: 34 | cs.CV Authors: Qing Jiang, Junan Huo, Xingyu Chen, Yuda Xiong, Zhaoyang Zeng, Yihao Chen, Tianhe Ren, Junzhi Yu, Lei Zhang ...

A Survey of Vibe Coding with Large Language Models

Episode 1287 · 22:36

🤗 Upvotes: 31 | cs.AI Authors: Yuyao Ge, Lingrui Mei, Zenghao Duan, Tianhao Li, Yujia Zheng, Yiwei Wang, Lexin Wang, Jiayu Yao, Tianyu Liu, Yuju...

FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

Episode 1286 · 23:41

🤗 Upvotes: 30 | cs.CV Authors: Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue Title: Fl...

Dr.LLM: Dynamic Layer Routing in LLMs

Episode 1285 · 23:50

🤗 Upvotes: 27 | cs.CL, cs.AI, cs.LG Authors: Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh Title: ...

Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models

Episode 1284 · 20:48

🤗 Upvotes: 26 | cs.LG, cs.AI Authors: Youngrok Park, Hojung Jung, Sangmin Bae, Se-Young Yun Title: Temporal Alignment G...

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Episode 1283 · 24:17

🤗 Upvotes: 106 | cs.LG, cs.CL, cs.CV Authors: Wei Huang, Yi Ge, Shuai Yang, Yicheng Xiao, Huizi Mao, Yujun Lin, Hanrong Ye, Sifei Liu, Ka Chun C...

Diffusion Transformers with Representation Autoencoders

Episode 1282 · 24:28

🤗 Upvotes: 93 | cs.CV, cs.LG Authors: Boyang Zheng, Nanye Ma, Shengbang Tong, Saining Xie Title: Diffusion Transformers...

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Episode 1281 · 26:46

🤗 Upvotes: 39 | cs.AI Authors: Caorui Li, Yu Chen, Yiyan Ji, Jin Xu, Zhenyu Cui, Shihao Li, Yuanxing Zhang, Jiafu Tang, Zhenghao Song, Dingling ...

Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States

Episode 1280 · 25:11

🤗 Upvotes: 37 | cs.CL Authors: Qinglin Zhu, Yizhen Yao, Runcong Zhao, Yanzheng Xiang, Amrutha Saseendran, Chen Jin, Philip Alexander Teare, Bin ...

Spotlight on Token Perception for Multimodal Reinforcement Learning

Episode 1279 · 23:52

🤗 Upvotes: 31 | cs.CV Authors: Siyuan Huang, Xiaoye Qu, Yafu Li, Yun Luo, Zefeng He, Daizong Liu, Yu Cheng Title: Spotl...

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Episode 1278 · 24:01

🤗 Upvotes: 31 | cs.LG, cs.AI, cs.CL Authors: Jinghao Zhang, Naishan Zheng, Ruilin Li, Dongzhou Cheng, Zheming Liang, Feng Zhao, Jiaqi Wang ...

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Episode 1277 · 22:13

🤗 Upvotes: 26 | cs.CV Authors: Haoran Feng, Dizhe Zhang, Xiangtai Li, Bo Du, Lu Qi Title: DiT360: High-Fidelity Panoram...

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Episode 1276 · 24:24

🤗 Upvotes: 26 | cs.CV Authors: Xinlong Chen, Yue Ding, Weihong Lin, Jingyun Hua, Linli Yao, Yang Shi, Bozhou Li, Yuanxing Zhang, Qiang Liu, Peng...

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

Episode 1275 · 25:21

🤗 Upvotes: 25 | cs.CV Authors: Haomin Wang, Jinhui Yin, Qi Wei, Wenguang Zeng, Lixin Gu, Shenglong Ye, Zhangwei Gao, Yaohui Wang, Yanting Zhang,...

BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Episode 1274 · 21:38

🤗 Upvotes: 25 | cs.CL, cs.AI Authors: Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping...