Constrained Reinforcement Learning with Smoothed Log Barrier Function

Abstract

In this paper, we propose a Smoothed Log Barrier Function (CSAC-LB) for constrained reinforcement learning. We show that CSAC-LB is a smooth approximation of the indicator function of the feasible set, and we prove that the optimal policy of the CSAC-LB-constrained problem converges to the optimal policy of the original constrained problem. We demonstrate the effectiveness of our method on several safety-critical control tasks.

Publication
In Transactions on Machine Learning Research
CSAC-LB: Smoothed Log Barrier for Constrained RL

Highlights

  • Problem. Standard Lagrangian-based constrained RL is hyperparameter-sensitive and often oscillates between greedy and overly conservative behavior; pure barrier methods are numerically unstable near the constraint boundary.
  • Idea. We introduce a Smoothed Log Barrier (CSAC-LB) that approximates the indicator of the feasible set while remaining differentiable across the boundary, bridging soft penalty and hard barrier formulations while keeping the optimization numerically stable.
  • Theory. We show CSAC-LB is a smooth approximation of the feasible-set indicator and prove that its optimal policy converges to the optimal policy of the original constrained problem as the smoothing parameter tightens.
  • Practice. On safety-critical continuous control benchmarks, CSAC-LB matches or exceeds prior constrained-RL baselines (Lagrangian-SAC, CPO-style methods) in return while staying within the safety budget, and does so without per-task tuning of the barrier hyperparameter.

The two example videos above showcase the algorithm on real robots. We define the reward as the negative energy consumption and use different velocities as constraints. The robot dog learns different gaits under the different velocity constraints. I have also extended the algorithm to a heating system. In the end, we did not include the robot experiments in the paper because of hardware faults.
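
To make the Idea highlight concrete, here is a minimal sketch of one commonly used smoothed log barrier: a log barrier that switches to a linear extension near the boundary. This page does not state the exact formula, so the form below is an assumption and not necessarily the definition used in the paper. The names `smoothed_log_barrier`, `z` (constraint value, feasible when `z <= 0`, e.g. expected cost minus budget), and `t` (smoothing parameter) are hypothetical.

```python
import math


def smoothed_log_barrier(z: float, t: float) -> float:
    """Assumed smoothed log barrier for the constraint z <= 0.

    Deep inside the feasible set it is the ordinary log barrier
    -log(-z) / t. Past the switch point z = -1 / t**2 it continues
    as a straight line with the same value and slope, so it stays
    finite and differentiable for infeasible z > 0.
    """
    switch = -1.0 / t**2
    if z <= switch:
        return -math.log(-z) / t
    # Linear extension: slope t, matched to the log branch at `switch`.
    return t * z - math.log(1.0 / t**2) / t + 1.0 / t


if __name__ == "__main__":
    # Larger t pushes the penalty toward the feasible-set indicator:
    # close to zero for feasible z, steeply increasing for infeasible z.
    for t in (2.0, 10.0, 50.0):
        values = [smoothed_log_barrier(z, t) for z in (-0.5, -0.01, 0.0, 0.5)]
        print(t, [round(v, 4) for v in values])
```

In this sketch a plain log barrier would return infinity (or fail) for `z >= 0`, whereas the linear extension keeps both the value and the gradient well defined there, which is the property the Idea highlight relies on.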