Episodes

Latest Episode
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Episode 148 · · 27:22

🤗 Paper Upvotes: 28 | cs.CV Authors: Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon Title: Large-Scale Text-to...

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Episode 147 · · 21:57

🤗 Paper Upvotes: 19 | cs.AI, cs.CL Authors: Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattach...

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Episode 146 · · 21:01

🤗 Paper Upvotes: 18 | cs.CL, cs.AI Authors: Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, W...

MH-MoE: Multi-Head Mixture-of-Experts

Episode 145 · · 21:00

🤗 Paper Upvotes: 17 | cs.CL Authors: Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei Title: MH-MoE: Multi-Head Mixture-of-E...

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Episode 144 · · 21:09

🤗 Paper Upvotes: 15 | cs.CV Authors: Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu...

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Episode 143 · · 22:42

🤗 Paper Upvotes: 13 | cs.CV, cs.AI, cs.CL Authors: Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal Title: Dream...

Knowledge Transfer Across Modalities with Natural Language Supervision

Episode 142 · · 20:38

🤗 Paper Upvotes: 13 | cs.CV, 68T45 (Primary) 68T50 (Secondary), I.2.6 Authors: Carlo Alberto Barbano, Luca Molinaro, Emanuele Aiello, Marco Gran...

One Diffusion to Generate Them All

Episode 141 · · 23:25

🤗 Paper Upvotes: 13 | cs.CV, cs.AI Authors: Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Kri...

VisualLens: Personalization through Visual History

Episode 140 · · 24:45

🤗 Paper Upvotes: 13 | cs.CV Authors: Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue ...

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Episode 139 · · 26:23

🤗 Paper Upvotes: 38 | cs.CL Authors: Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester Jame...

Style-Friendly SNR Sampler for Style-Driven Generation

Episode 138 · · 20:05

🤗 Paper Upvotes: 28 | cs.CV Authors: Jooyoung Choi, Chaehun Shin, Yeongtak Oh, Heeseung Kim, Sungroh Yoon Title: Style-...

OminiControl: Minimal and Universal Control for Diffusion Transformer

Episode 137 · · 26:03

🤗 Paper Upvotes: 22 | cs.CV, cs.AI, cs.LG Authors: Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang Title: ...

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

Episode 136 · · 23:35

🤗 Paper Upvotes: 15 | cs.CL, cs.LG, 68T50, I.2.7 Authors: Gabriel Chua, Shing Yee Chan, Shaun Khoo Title: A Flexible La...

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Episode 135 · · 27:13

🤗 Paper Upvotes: 14 | cs.AI Authors: Davide Paglieri, Bartłomiej Cupiał, Samuel Coward, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pi...

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Episode 134 · · 22:56

🤗 Paper Upvotes: 12 | cs.CV, cs.CL Authors: Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu Title: Large Multi-modal Models...

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Episode 133 · · 21:27

🤗 Paper Upvotes: 9 | cs.CV, cs.AI, cs.CL Authors: Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue...

Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction

Episode 132 · · 26:01

🤗 Paper Upvotes: 9 | cs.CV, cs.AI, cs.LG Authors: Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo Title: ...

MyTimeMachine: Personalized Facial Age Transformation

Episode 131 · · 21:58

🤗 Paper Upvotes: 8 | cs.CV Authors: Luchao Qi, Jiaye Wu, Bang Gong, Annie N. Wang, David W. Jacobs, Roni Sengupta Title: ...

Novel View Extrapolation with Video Diffusion Priors

Episode 130 · · 21:23

🤗 Paper Upvotes: 7 | cs.CV Authors: Kunhao Liu, Ling Shao, Shijian Lu Title: Novel View Extrapolation with Video Diffus...

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Episode 129 · · 19:22

🤗 Paper Upvotes: 42 | cs.CL, cs.CV Authors: Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lew...

Multimodal Autoregressive Pre-training of Large Vision Encoders

Episode 128 · · 24:16

🤗 Paper Upvotes: 23 | cs.CV, cs.LG Authors: Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju...

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

Episode 127 · · 19:03

🤗 Paper Upvotes: 23 | cs.CL Authors: Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang ...

Hymba: A Hybrid-head Architecture for Small Language Models

Episode 126 · · 24:24

🤗 Paper Upvotes: 20 | cs.CL, cs.AI, cs.LG Authors: Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Sh...

Natural Language Reinforcement Learning

Episode 125 · · 23:57

🤗 Paper Upvotes: 15 | cs.LG, cs.AI, cs.CL Authors: Xidong Feng, Ziyu Wan, Haotian Fu, Bo Liu, Mengyue Yang, Girish A. Koushik, Zhiyuan Hu, Ying ...

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Episode 124 · · 23:23

🤗 Paper Upvotes: 15 | cs.CL, cs.AI, cs.DL, cs.IR, cs.LG Authors: Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee...

Ultra-Sparse Memory Network

Episode 123 · · 20:22

🤗 Paper Upvotes: 14 | cs.LG Authors: Zihao Huang, Qiyang Min, Hongzhi Huang, Defa Zhu, Yutao Zeng, Ran Guo, Xun Zhou Title: ...

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Episode 122 · · 24:23

🤗 Paper Upvotes: 10 | cs.CV Authors: Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu Title:...

Stable Flow: Vital Layers for Training-Free Image Editing

Episode 121 · · 22:50

🤗 Paper Upvotes: 7 | cs.CV, cs.GR, cs.LG Authors: Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel...

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Episode 120 · · 21:34

🤗 Paper Upvotes: 6 | cs.CL, cs.AI, cs.LG Authors: Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan, Neel Nanda Title: ...

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Episode 119 · · 23:15

🤗 Paper Upvotes: 35 | cs.LG, cs.AI, cs.CV, cs.NE, cs.PF Authors: Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen ...