Episodes

Latest Episode
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Episode 460 · 20:15

🤗 Upvotes: 10 | cs.LG, cs.CL | Authors: Benjamin Feuer, Chinmay Hegde | Title: WILDCHAT-50M: A Deep Dive Into the Role of S...

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

Episode 459 · 24:32

🤗 Upvotes: 10 | cs.CV, cs.AI, cs.CL, cs.LG, cs.RO | Authors: Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang ...

o3-mini vs DeepSeek-R1: Which One is Safer?

Episode 458 · 20:01

🤗 Upvotes: 6 | cs.SE, cs.AI | Authors: Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura | Title: ...

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Episode 457 · 21:14

🤗 Upvotes: 1 | cs.AI, cs.CL, cs.HC | Authors: Faria Huq, Zora Zhiruo Wang, Frank F. Xu, Tianyue Ou, Shuyan Zhou, Jeffrey P. Bigham, Graham Neubig ...

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Episode 456 · 22:30

🤗 Upvotes: 28 | cs.CL | Authors: Yubo Wang, Xiang Yue, Wenhu Chen | Title: Critique Fine-Tuning: Learning to Critique is Mo...

Atla Selene Mini: A General Purpose Evaluation Model

Episode 455 · 25:28

🤗 Upvotes: 24 | cs.CL, cs.AI | Authors: Andrei Alexandru, Antonia Calvi, Henry Broomfield, Jackson Golden, Kyle Dai, Mathias Leys, Maurice Burger,...

Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts

Episode 454 · 28:39

🤗 Upvotes: 14 | cs.AI, cs.CY, cs.LG | Authors: Clément Desroches, Martin Chauvin, Louis Ladan, Caroline Vateau, Simon Gosset, Philippe Cordier ...

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation

Episode 453 · 22:03

🤗 Upvotes: 8 | cs.SE, cs.AI | Authors: Aitor Arrieta, Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura | Title: ...

Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks

Episode 452 · 22:14

🤗 Upvotes: 8 | cs.CV | Authors: Hailong Guo, Bohan Zeng, Yiren Song, Wentao Zhang, Chuang Zhang, Jiaming Liu | Title: Any2A...

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

Episode 451 · 21:44

🤗 Upvotes: 6 | cs.CR, cs.AI, cs.CL, cs.LG | Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | Title: ...

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text

Episode 450 · 19:42

🤗 Upvotes: 6 | cs.CL, cs.AI | Authors: Jenna Russell, Marzena Karpinska, Mohit Iyyer | Title: People who frequently use Cha...

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Episode 449 · 23:17

🤗 Upvotes: 29 | cs.AI, cs.CV, cs.LG | Authors: Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, S...

Optimizing Large Language Model Training Using FP4 Quantization

Episode 448 · 22:09

🤗 Upvotes: 15 | cs.LG, cs.CL | Authors: Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng ...

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Episode 447 · 23:00

🤗 Upvotes: 11 | cs.CV | Authors: Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu | Title: DiffSplat: Repurpos...

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Episode 446 · 23:23

🤗 Upvotes: 10 | cs.CL, cs.LG | Authors: Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou | Title: ...

Open Problems in Mechanistic Interpretability

Episode 445 · 25:48

🤗 Upvotes: 10 | cs.LG | Authors: Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefa...

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Episode 444 · 22:26

🤗 Upvotes: 5 | cs.LG, cs.AI, cs.CL | Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain | Title: Low-Rank Adapters Meet Neur...

IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

Episode 443 · 20:02

🤗 Upvotes: 4 | cs.CL, cs.AI | Authors: Sankalp KJ, Ashutosh Kumar, Laxmaan Balaji, Nikunj Kotecha, Vinija Jain, Aman Chadha, Sreyoshi Bhaduri ...

Histoires Morales: A French Dataset for Assessing Moral Alignment

Episode 442 · 20:44

🤗 Upvotes: 3 | cs.CL, cs.AI | Authors: Thibaud Leteno, Irina Proskurina, Antoine Gourru, Julien Velcin, Charlotte Laclau, Guillaume Metzler, Chris...

Qwen2.5-1M Technical Report

Episode 441 · 24:17

🤗 Upvotes: 26 | cs.CL | Authors: An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhan...

ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer

Episode 440 · 20:45

🤗 Upvotes: 13 | cs.CL | Authors: Lin Yueyu, Li Zhiyuan, Peter Yue, Liu Xiao | Title: ARWKV: Pretrain is not what we need, a...

Towards General-Purpose Model-Free Reinforcement Learning

Episode 439 · 20:53

🤗 Upvotes: 13 | cs.LG, cs.AI | Authors: Scott Fujimoto, Pierluca D'Oro, Amy Zhang, Yuandong Tian, Michael Rabbat | Title: T...

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Episode 438 · 22:02

🤗 Upvotes: 11 | cs.SD, cs.CL, eess.AS | Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, J...

iFormer: Integrating ConvNet and Transformer for Mobile Application

Episode 437 · 24:01

🤗 Upvotes: 9 | cs.CV, cs.AI | Authors: Chuanyang Zheng | Title: iFormer: Integrating ConvNet and Transformer for Mobile App...

Are Vision Language Models Texture or Shape Biased and Can We Steer Them?

Episode 436 · 24:59

🤗 Upvotes: 7 | cs.CV, cs.AI, cs.LG, q-bio.NC | Authors: Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanze...

CodeMonkeys: Scaling Test-Time Compute for Software Engineering

Episode 435 · 23:04

🤗 Upvotes: 5 | cs.LG | Authors: Ryan Ehrlich, Bradley Brown, Jordan Juravsky, Ronald Clark, Christopher Ré, Azalia Mirhoseini | Title: ...

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Episode 434 · 21:21

🤗 Upvotes: 4 | cs.LG, cs.AI | Authors: Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, Josh Susskind, Vimal Thilak ...

Humanity's Last Exam

Episode 433 · 22:51

🤗 Upvotes: 33 | cs.LG, cs.AI, cs.CL | Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Sean Shi, Michael Choi, ...

Chain-of-Retrieval Augmented Generation

Episode 432 · 23:23

🤗 Upvotes: 26 | cs.IR, cs.CL | Authors: Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei | Title: ...

Redundancy Principles for MLLMs Benchmarks

Episode 431 · 22:20

🤗 Upvotes: 22 | cs.CL, cs.AI | Authors: Zicheng Zhang, Xiangyu Zhao, Xinyu Fang, Chunyi Li, Xiaohong Liu, Xiongkuo Min, Haodong Duan, Kai Chen, Gu...