What is Episodic RL?

Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators like Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).

Once generated, these trajectories are converted into step-by-step actions using a trajectory tracking controller. The specific controller chosen depends on the environment’s requirements. Currently, we support position, velocity, and PD-Controllers tailored for position, velocity, and torque control. Additionally, we have a specialized controller designed for the MetaWorld control suite.

While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives to craft the right trajectory. Our framework further enhances this by accommodating a contextual setting. At the episode’s onset, we present the context space—a subset of the observation space. This demands the prediction of a new action or MP parametrization for every unique context.