DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

1 Autonomous Learning Robots (ALR), Karlsruhe Institute of Technology (KIT) · 2 Interactive Robot Perception & Learning (PEARL), TU Darmstadt · 3 Intelligent Autonomous Systems Group (IAS), TU Darmstadt · 4 Hessian.AI · 5 German Research Center for AI (DFKI) · 6 Centre for Cognitive Science, TU Darmstadt
This paper was published at ICML 2025.

Abstract

Maximum entropy reinforcement learning (MaxEnt-RL) has become a standard approach in RL due to its beneficial exploration properties. Traditionally, policies are parameterized with Gaussian distributions, which significantly limits their representational capacity. Diffusion-based policies offer a more expressive alternative, yet integrating them into MaxEnt-RL poses challenges, primarily because computing their marginal entropy is intractable. To overcome this, we propose Diffusion-Based Maximum Entropy RL (DIME). DIME leverages recent advances in approximate inference with diffusion models to derive a lower bound on the maximum entropy objective. Additionally, we propose a policy iteration scheme that provably converges to the optimal diffusion policy. Our method enables the use of expressive diffusion-based policies while retaining the principled exploration benefits of MaxEnt-RL, significantly outperforming other diffusion-based methods on challenging high-dimensional control benchmarks. It is also competitive with state-of-the-art non-diffusion-based RL methods while requiring fewer algorithmic design choices and smaller update-to-data ratios, which reduces computational cost.
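For reference, a minimal sketch of the generic maximum entropy RL objective the abstract refers to (this is the standard formulation, not DIME's specific lower bound or notation):

J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t} \gamma^{t} \Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right]

Here \alpha is the entropy temperature and \mathcal{H} denotes the policy's entropy. With a diffusion policy, the action is produced by a learned denoising chain, so the marginal density \pi(a \mid s), and hence the entropy term \mathcal{H}(\pi(\cdot \mid s)), has no closed form; DIME instead optimizes a lower bound on this objective derived from approximate inference with diffusion models.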

Content will be updated soon. Stay tuned!

BibTeX

@article{celik2025dime,
  title={DIME: Diffusion-Based Maximum Entropy Reinforcement Learning},
  author={Celik, Onur and Li, Zechu and Blessing, Denis and Li, Ge and Palenicek, Daniel and Peters, Jan and Chalvatzaki, Georgia and Neumann, Gerhard},
  journal={arXiv preprint arXiv:2502.02316},
  year={2025}
}