Episodes

Latest Episode
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Episode 118 · 24:56

🤗 Paper Upvotes: 23 | cs.CV Authors: Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenya...

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Episode 117 · 24:11

🤗 Paper Upvotes: 14 | cs.CV, cs.AI, cs.CL, cs.MM Authors: Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li ...

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Episode 116 · 22:01

🤗 Paper Upvotes: 12 | cs.CV Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang Title: ...

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Episode 115 · 21:30

🤗 Paper Upvotes: 9 | cs.AI Authors: Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu ...

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Episode 114 · 24:42

🤗 Paper Upvotes: 7 | cs.CL Authors: Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang Title: ...

Stylecodes: Encoding Stylistic Information For Image Generation

Episode 113 · 21:26

🤗 Paper Upvotes: 6 | cs.CV Authors: Ciara Rowles Title: Stylecodes: Encoding Stylistic Information For Image Generation...

ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

Episode 112 · 23:35

🤗 Paper Upvotes: 3 | cs.CV, cs.AI Authors: Vipula Rawte, Sarthak Jain, Aarush Sinha, Garv Kaushik, Aman Bansal, Prathiksha Rumale Vishwanath, Sa...

Loss-to-Loss Prediction: Scaling Laws for All Datasets

Episode 111 · 21:35

🤗 Paper Upvotes: 2 | cs.LG, cs.AI, cs.CL, stat.ML Authors: David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, Sham Kakade ...

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

Episode 110 · 20:19

🤗 Paper Upvotes: 2 | cs.CV Authors: Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai Title: ...

SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Episode 109 · 25:22

🤗 Paper Upvotes: 13 | cs.CV Authors: Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang ...

Continuous Speculative Decoding for Autoregressive Image Generation

Episode 108 · 22:37

🤗 Paper Upvotes: 13 | cs.CV Authors: Zili Wang, Robert Zhang, Kun Ding, Qi Yang, Fei Li, Shiming Xiang Title: Continuou...

ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements

Episode 107 · 19:16

🤗 Paper Upvotes: 11 | cs.CV Authors: M. Arda Aydın, Efe Mert Çırpar, Elvin Abdinli, Gozde Unal, Yusuf H. Sahin Title: I...

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations

Episode 106 · 25:42

🤗 Paper Upvotes: 10 | cs.GR, cs.CV Authors: Hmrishav Bandyopadhyay, Yi-Zhe Song Title: FlipSketch: Flipping Static Draw...

Soft Robotic Dynamic In-Hand Pen Spinning

Episode 105 · 22:33

🤗 Paper Upvotes: 8 | cs.RO Authors: Yunchao Yao, Uksang Yoo, Jean Oh, Christopher G. Atkeson, Jeffrey Ichnowski Title: ...

Building Trust: Foundations of Security, Safety and Transparency in AI

Episode 104 · 22:10

🤗 Paper Upvotes: 8 | cs.CY, cs.AI, cs.CL Authors: Huzaifa Sidhpurwala, Garth Mollett, Emily Fox, Mark Bestavros, Huamin Chen Title:...

SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Episode 103 · 20:36

🤗 Paper Upvotes: 5 | cs.CV Authors: Zewen Chen, Juan Wang, Wen Wang, Sunhan Xu, Hang Xiong, Yun Zeng, Jian Guo, Shuxun Wang, Chunfeng Yuan, Bing...

Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages

Episode 102 · 24:12

🤗 Paper Upvotes: 3 | cs.CL, cs.AI Authors: S. Tamang, D. J. Bora Title: Evaluating Tokenizer Performance of Large Langu...

Generative World Explorer

Episode 101 · 21:35

🤗 Paper Upvotes: 38 | cs.CV Authors: Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen Title: Generat...

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Episode 100 · 19:51

🤗 Paper Upvotes: 31 | cs.CV, cs.CL Authors: Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu...

Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering

Episode 99 · 21:08

🤗 Paper Upvotes: 13 | cs.AI, cs.CL, stat.ML Authors: Xinyan Guan, Yanjiang Liu, Xinyu Lu, Boxi Cao, Ben He, Xianpei Han, Le Sun, Jie Lou, Bowen ...

AnimateAnything: Consistent and Controllable Animation for Video Generation

Episode 98 · 22:15

🤗 Paper Upvotes: 12 | cs.CV Authors: Guojun Lei, Chi Wang, Hong Li, Rong Zhang, Yikai Wang, Weiwei Xu Title: AnimateAny...

Top-$nσ$: Not All Logits Are You Need

Episode 97 · 21:18

🤗 Paper Upvotes: 12 | cs.LG Authors: Chenxia Tang, Jianchun Liu, Hongli Xu, Liusheng Huang Title: Top-$nσ$: Not All Log...

Drowning in Documents: Consequences of Scaling Reranker Inference

Episode 96 · 21:41

🤗 Paper Upvotes: 10 | cs.IR, cs.CL, cs.LG Authors: Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov ...

SlimLM: An Efficient Small Language Model for On-Device Document Assistance

Episode 95 · 25:53

🤗 Paper Upvotes: 10 | cs.CL Authors: Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui Tit...

Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts

Episode 94 · 19:48

🤗 Paper Upvotes: 8 | cs.CV Authors: Jinqiang Long, Yanqi Dai, Guoxing Yang, Hongpeng Lin, Nanyi Fei, Yizhao Gao, Zhiwu Lu Title: ...

SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers

Episode 93 · 27:31

🤗 Paper Upvotes: 8 | cs.LG Authors: Joseph Liu, Joshua Geddes, Ziyu Guo, Haomiao Jiang, Mahesh Kumar Nandwana Title: Sm...

LLäMmlein: Compact and Competitive German-Only Language Models from Scratch

Episode 92 · 22:13

🤗 Paper Upvotes: 7 | cs.CL, cs.AI, cs.LG Authors: Jan Pfister, Julia Wunderle, Andreas Hotho Title: LLäMmlein: Compact ...

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Episode 91 · 25:35

🤗 Paper Upvotes: 64 | cs.CV Authors: Guowei Xu, Peng Jin, Li Hao, Yibing Song, Lichao Sun, Li Yuan Title: LLaVA-o1: Let...

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Episode 90 · 24:29

🤗 Paper Upvotes: 19 | cs.CV, cs.AI, cs.GR Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen...

Xmodel-1.5: An 1B-scale Multilingual LLM

Episode 89 · 20:52

🤗 Paper Upvotes: 7 | cs.CL Authors: Wang Qun, Liu Yang, Lin Qingquan, Jiang Ling Title: Xmodel-1.5: An 1B-scale Multili...