Episodes

Latest Episode
One Diffusion to Generate Them All

Episode 141 · 23:25

🤗 Paper Upvotes: 13 | cs.CV, cs.AI Authors: Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Kri...

VisualLens: Personalization through Visual History

Episode 140 · 24:45

🤗 Paper Upvotes: 13 | cs.CV Authors: Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue ...

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Episode 139 · 26:23

🤗 Paper Upvotes: 38 | cs.CL Authors: Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester Jame...

Style-Friendly SNR Sampler for Style-Driven Generation

Episode 138 · 20:05

🤗 Paper Upvotes: 28 | cs.CV Authors: Jooyoung Choi, Chaehun Shin, Yeongtak Oh, Heeseung Kim, Sungroh Yoon Title: Style-...

OminiControl: Minimal and Universal Control for Diffusion Transformer

Episode 137 · 26:03

🤗 Paper Upvotes: 22 | cs.CV, cs.AI, cs.LG Authors: Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang Title: ...

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

Episode 136 · 23:35

🤗 Paper Upvotes: 15 | cs.CL, cs.LG, 68T50, I.2.7 Authors: Gabriel Chua, Shing Yee Chan, Shaun Khoo Title: A Flexible La...

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Episode 135 · 27:13

🤗 Paper Upvotes: 14 | cs.AI Authors: Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pi...

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Episode 134 · 22:56

🤗 Paper Upvotes: 12 | cs.CV, cs.CL Authors: Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu Title: Large Multi-modal Models...

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Episode 133 · 21:27

🤗 Paper Upvotes: 9 | cs.CV, cs.AI, cs.CL Authors: Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue...

Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction

Episode 132 · 26:01

🤗 Paper Upvotes: 9 | cs.CV, cs.AI, cs.LG Authors: Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo Title: ...

MyTimeMachine: Personalized Facial Age Transformation

Episode 131 · 21:58

🤗 Paper Upvotes: 8 | cs.CV Authors: Luchao Qi, Jiaye Wu, Bang Gong, Annie N. Wang, David W. Jacobs, Roni Sengupta Title: ...

Novel View Extrapolation with Video Diffusion Priors

Episode 130 · 21:23

🤗 Paper Upvotes: 7 | cs.CV Authors: Kunhao Liu, Ling Shao, Shijian Lu Title: Novel View Extrapolation with Video Diffus...

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Episode 129 · 19:22

🤗 Paper Upvotes: 42 | cs.CL, cs.CV Authors: Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lew...

Multimodal Autoregressive Pre-training of Large Vision Encoders

Episode 128 · 24:16

🤗 Paper Upvotes: 23 | cs.CV, cs.LG Authors: Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju...

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Episode 127 · 19:03

🤗 Paper Upvotes: 23 | cs.CL Authors: Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang ...

Hymba: A Hybrid-head Architecture for Small Language Models

Episode 126 · 24:24

🤗 Paper Upvotes: 20 | cs.CL, cs.AI, cs.LG Authors: Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Sh...

Natural Language Reinforcement Learning

Episode 125 · 23:57

🤗 Paper Upvotes: 15 | cs.LG, cs.AI, cs.CL Authors: Xidong Feng, Ziyu Wan, Haotian Fu, Bo Liu, Mengyue Yang, Girish A. Koushik, Zhiyuan Hu, Ying ...

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Episode 124 · 23:23

🤗 Paper Upvotes: 15 | cs.CL, cs.AI, cs.DL, cs.IR, cs.LG Authors: Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee...

Ultra-Sparse Memory Network

Episode 123 · 20:22

🤗 Paper Upvotes: 14 | cs.LG Authors: Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo, Xun Zhou Title: ...

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Episode 122 · 24:23

🤗 Paper Upvotes: 10 | cs.CV Authors: Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu Title:...

Stable Flow: Vital Layers for Training-Free Image Editing

Episode 121 · 22:50

🤗 Paper Upvotes: 7 | cs.CV, cs.GR, cs.LG Authors: Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel...

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Episode 120 · 21:34

🤗 Paper Upvotes: 6 | cs.CL, cs.AI, cs.LG Authors: Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda Title: ...

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Episode 119 · 23:15

🤗 Paper Upvotes: 35 | cs.LG, cs.AI, cs.CV, cs.NE, cs.PF Authors: Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen ...

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Episode 118 · 24:56

🤗 Paper Upvotes: 23 | cs.CV Authors: Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenya...

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Episode 117 · 24:11

🤗 Paper Upvotes: 14 | cs.CV, cs.AI, cs.CL, cs.MM Authors: Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li ...

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Episode 116 · 22:01

🤗 Paper Upvotes: 12 | cs.CV Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang Title: ...

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Episode 115 · 21:30

🤗 Paper Upvotes: 9 | cs.AI Authors: Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu ...

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Episode 114 · 24:42

🤗 Paper Upvotes: 7 | cs.CL Authors: Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang Title: ...

Stylecodes: Encoding Stylistic Information For Image Generation

Episode 113 · 21:26

🤗 Paper Upvotes: 6 | cs.CV Authors: Ciara Rowles Title: Stylecodes: Encoding Stylistic Information For Image Generation...

ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

Episode 112 · 23:35

🤗 Paper Upvotes: 3 | cs.CV, cs.AI Authors: Vipula Rawte, Sarthak Jain, Aarush Sinha, Garv Kaushik, Aman Bansal, Prathiksha Rumale Vishwanath, Sa...