Learning robot control with reinforcement learning (RL) is often hindered by the sparse reward problem, where the agent receives feedback only upon completing the task. This leads to slow convergence or even failure to learn. We propose Goal Achievement Guided Exploration (GAGE), a method that leverages the goal achievement signal to guide exploration, mitigating premature convergence and improving sample efficiency in learning robot control tasks.
Publication
In Transactions on Machine Learning Research and CoRL 2024 Workshop on Whole-body Control and Bimanual Manipulation
Go2 Beam Climbing
Ant on Ball
Humanoid Ball Dribbling
Humanoid Cartwheel
Humanoid Rope Walking
Highlights
Problem. Robot-control RL often gets stuck in the sparse-reward setting: the agent sees a learning signal only after the full task is completed, leading to slow convergence or outright failure.
Key observation. Even when extrinsic reward is sparse, intermediate goal-achievement signals (sub-goals being reached, contacts being made, balance being held) carry useful information that standard exploration ignores.
Method: GAGE. Goal Achievement Guided Exploration turns these goal-achievement signals into an exploration bonus that steers the policy away from premature convergence onto trivial behaviors, without changing the underlying RL algorithm.
Result. Across challenging whole-body control tasks — Go2 beam climbing, Ant balancing on a ball, humanoid dribbling, cartwheels, and rope walking (videos above) — GAGE learns where standard sparse-reward RL stalls, and improves sample efficiency on the tasks where baselines do learn.
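Illustrative sketch. To make the idea in the Method highlight concrete, the snippet below shows one simple way a goal-achievement signal could be folded into the reward as an exploration bonus. It is not the formulation used by GAGE, which this page does not spell out. The class name `GoalAchievementBonus`, the `scale` weight, and the sub-goal names are all hypothetical, and the rule of paying a bonus the first time each sub-goal is reached in an episode is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class GoalAchievementBonus:
    """Hypothetical reward wrapper; NOT the authors' GAGE formulation."""

    scale: float = 0.1  # assumed bonus weight per newly achieved sub-goal
    achieved: set = field(default_factory=set)  # sub-goals reached this episode

    def reset(self) -> None:
        """Forget achieved sub-goals at the start of a new episode."""
        self.achieved.clear()

    def __call__(self, extrinsic_reward: float, signals: dict) -> float:
        """Return the extrinsic reward plus a bonus for newly reached sub-goals.

        `signals` maps a sub-goal name (e.g. "contact") to whether it
        currently holds; these names are placeholders.
        """
        newly = {name for name, ok in signals.items() if ok} - self.achieved
        self.achieved |= newly
        return extrinsic_reward + self.scale * len(newly)


if __name__ == "__main__":
    bonus = GoalAchievementBonus(scale=0.5)
    # The sparse task reward is still zero, but a sub-goal was just reached.
    print(bonus(0.0, {"contact": True, "balance": False}))
    # The same sub-goal again earns no further bonus.
    print(bonus(0.0, {"contact": True, "balance": False}))
```

Because the bonus only modifies the scalar reward, the shaped value can be passed to any RL algorithm unchanged, which mirrors the claim above that the underlying algorithm is left as it is.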