← Previous · All Episodes · Next →
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Episode 1042

Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

· 20:52

|

🤗 Upvotes: 36 | cs.GR, cs.AI, cs.CV, cs.LG

Authors:
Seungyong Lee, Jeong-gi Kwak

Title:
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Arxiv:
http://arxiv.org/abs/2508.04825v1

Abstract:
Virtual try-on aims to synthesize a realistic image of a person wearing a target garment, but accurately modeling garment-body correspondence remains a persistent challenge, especially under pose and appearance variation. In this paper, we propose Voost - a unified and scalable framework that jointly learns virtual try-on and try-off with a single diffusion transformer. By modeling both tasks jointly, Voost enables each garment-person pair to supervise both directions and supports flexible conditioning over generation direction and garment category, enhancing garment-body relational reasoning without task-specific networks, auxiliary losses, or additional labels. In addition, we introduce two inference-time techniques: attention temperature scaling for robustness to resolution or mask variation, and self-corrective sampling that leverages bidirectional consistency between tasks. Extensive experiments demonstrate that Voost achieves state-of-the-art results on both try-on and try-off benchmarks, consistently outperforming strong baselines in alignment accuracy, visual fidelity, and generalization.


Subscribe

Listen to Daily Paper Cast using one of many popular podcasting apps or directories.

Apple Podcasts Spotify Overcast Pocket Casts YouTube
← Previous · All Episodes · Next →