Episodes

Latest Episode
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Episode 612 · 22:22

🤗 Upvotes: 32 | cs.AI, cs.CL, cs.CV, cs.MM Authors: Max Ku, Thomas Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen Title: ...

Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance

Episode 611 · 24:57

🤗 Upvotes: 27 | cs.CL Authors: Xueqing Peng, Triantafillos Papadopoulos, Efstathia Soufleri, Polydoros Giannouris, Ruoyu Xiang, Yan Wang, Lingfe...

Language Models' Factuality Depends on the Language of Inquiry

Episode 610 · 22:25

🤗 Upvotes: 19 | cs.CL, cs.AI Authors: Tushar Aggarwal, Kumar Tanmay, Ayush Agrawal, Kumar Ayush, Hamid Palangi, Paul Pu Liang Title...

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Episode 609 · 24:04

🤗 Upvotes: 16 | cs.CL Authors: Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhiche...

Towards an AI co-scientist

Episode 608 · 25:03

🤗 Upvotes: 15 | cs.AI, cs.CL, cs.HC, cs.LG, physics.soc-ph, q-bio.OT Authors: Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Pale...

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

Episode 607 · 18:21

🤗 Upvotes: 15 | cs.CL, cs.AI Authors: Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li Title: A...

Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation

Episode 606 · 22:49

🤗 Upvotes: 13 | cs.LG, cs.SE Authors: Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu ...

Rank1: Test-Time Compute for Reranking in Information Retrieval

Episode 605 · 19:51

🤗 Upvotes: 11 | cs.IR, cs.CL, cs.LG Authors: Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme ...

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Episode 604 · 26:26

🤗 Upvotes: 122 | cs.CL, cs.AI, cs.LG Authors: Deepak Nathani, Lovish Madaan, Nicholas Roberts, Nikolay Bashlykov, Ajay Menon, Vincent Moens, Ama...

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Episode 603 · 25:23

🤗 Upvotes: 82 | cs.CV, cs.AI Authors: Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parth...

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Episode 602 · 24:06

🤗 Upvotes: 81 | cs.CL Authors: M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xia...

How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Episode 601 · 22:43

🤗 Upvotes: 51 | cs.CL Authors: Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, Mikhai...

S*: Test Time Scaling for Code Generation

Episode 600 · 24:06

🤗 Upvotes: 39 | cs.LG, cs.AI Authors: Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzale...

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Episode 599 · 24:51

🤗 Upvotes: 24 | cs.CL, cs.AI Authors: Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Ch...

Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning

Episode 598 · 21:29

🤗 Upvotes: 22 | quant-ph, cs.AI, cs.IT, cs.LG, math.IT Authors: Austin Yubo He, Zi-Wen Liu Title: Discovering highly ef...

LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

Episode 597 · 19:43

🤗 Upvotes: 20 | cs.CV, cs.AI, cs.CL Authors: Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zh...

Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Episode 596 · 23:15

🤗 Upvotes: 18 | cs.CL, cs.AI Authors: Yein Park, Chanwoong Yoon, Jungwoo Park, Minbyul Jeong, Jaewoo Kang Title: Does T...

S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Episode 595 · 23:24

🤗 Upvotes: 15 | cs.CL, cs.LG Authors: Ruotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li ...

Qwen2.5-VL Technical Report

Episode 594 · 21:11

🤗 Upvotes: 97 | cs.CV, cs.CL Authors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, J...

RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

Episode 593 · 20:45

🤗 Upvotes: 31 | cs.CV, cs.RO Authors: Hao Gao, Shaoyu Chen, Bo Jiang, Bencheng Liao, Yiang Shi, Xiaoyang Guo, Yuechuan Pu, Haoran Yin, Xiangyu L...

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Episode 592 · 24:25

🤗 Upvotes: 28 | cs.SD, cs.AI Authors: Zihan Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jia...

MoM: Linear Sequence Modeling with Mixture-of-Memories

Episode 591 · 20:12

🤗 Upvotes: 22 | cs.CL, cs.AI, cs.LG Authors: Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng Title: MoM: Linear Seq...

Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

Episode 590 · 21:25

🤗 Upvotes: 22 | cs.CL Authors: William Jurayj, Jeffrey Cheng, Benjamin Van Durme Title: Is That Your Final Answer? Test...

Craw4LLM: Efficient Web Crawling for LLM Pretraining

Episode 589 · 22:44

🤗 Upvotes: 21 | cs.CL Authors: Shi Yu, Zhiyuan Liu, Chenyan Xiong Title: Craw4LLM: Efficient Web Crawling for LLM Pretr...

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Episode 588 · 25:16

🤗 Upvotes: 19 | cs.CL, cs.LG Authors: Guanzheng Chen, Xin Li, Michael Qizhe Shieh, Lidong Bing Title: LongPO: Long Cont...

Small Models Struggle to Learn from Strong Reasoners

Episode 587 · 20:02

🤗 Upvotes: 17 | cs.AI Authors: Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Po...

Autellix: An Efficient Serving Engine for LLM Agents as General Programs

Episode 586 · 22:30

🤗 Upvotes: 15 | cs.LG, cs.AI, cs.DC Authors: Michael Luo, Xiaoxiang Shi, Colin Cai, Tianjun Zhang, Justin Wong, Yichuan Wang, Chi Wang, Yanping ...

SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Episode 585 · 22:04

🤗 Upvotes: 10 | cs.CL, cs.AI, cs.IR, cs.IT, math.IT Authors: Yucheng Shi, Tianze Yang, Canyu Chen, Quanzheng Li, Tianming Liu, Xiang Li, Ninghao...

Soundwave: Less is More for Speech-Text Alignment in LLMs

Episode 584 · 21:52

🤗 Upvotes: 65 | cs.CL, cs.AI, cs.SD Authors: Yuhao Zhang, Zhiheng Liu, Fan Bu, Ruiyu Zhang, Benyou Wang, Haizhou Li Title: ...

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Episode 583 · 23:08

🤗 Upvotes: 51 | cs.CL, cs.LG Authors: Yuri Kuratov, Mikhail Arkhipov, Aydar Bulatov, Mikhail Burtsev Title: Cramming 15...