WholeBodyVLA: Towards Unified Latent VLA for Whole

Published: 2026-02-05 06:03

Authors: Haoran Jiang, Jin Chen, Qingwen Bu, Li Chen, Modi Shi, Yanjie Zhang, Delong Li, Chuanzhe Suo, Chuang Wang, Zhihui Peng, Hongyang Li

Abstract: Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, whether modular or end-to-end, are deficient in manipulation-aware locomotion. This confines the robot to a limited workspace, preventing it from performing large-space loco-manipulation. We attribute this to: (1) the challenge of acquiring loco-manipulation knowledge due to the scarcity of humanoid teleoperation data, and (2) the difficulty of faithfully and reliably executing locomotion commands, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables a Vision-Language-Action (VLA) system to learn from low-cost, action-free egocentric videos. Moreover, an efficient human data collection pipeline is devised to augment the dataset and scale the benefits. To execute the desired locomotion commands more precisely, we present a loco-manipulation-oriented (LMO) RL policy specifically tailored for accurate and stable core loco-manipulation movements, such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is one of the first frameworks of its kind to enable large-space humanoid loco-manipulation. It is verified via comprehensive experiments on the AgiBot X2 humanoid, outperforming the prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.11047 [cs.RO] (or arXiv:2512.11047v2 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2512.11047

arXiv-issued DOI via DataCite

Submission history

From: Jin Chen
[v1] Thu, 11 Dec 2025 19:07:31 UTC (9,638 KB)
[v2] Mon, 15 Dec 2025 07:46:35 UTC (9,638 KB)

URL: WholeBodyVLA: Towards Unified Latent VLA for Whole https://c.klqsh.com/news/view/333329
