Figure 2: (a) High-probability regions of the individual per-expert context distributions, where each color represents an expert $o$. (b) Number of active experts per context region.
The maximum entropy-based objective allows learning diverse skills for the same or similar tasks defined by the contexts. For this, the per-expert context distributions need to specialize in a subregion of the context space (a), but at the same time overlapping regions are necessary to learn diverse skills for similar tasks (b). These properties are ensured by the decomposed objective (please see the paper).
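To make the interplay between panels (a) and (b) concrete, the sketch below models each per-expert context distribution as a 2-D Gaussian and obtains the gating $p(o|c) \propto p(o)\,p(c|o)$ via Bayes' rule; a context covered by several high-likelihood experts then has several "active" experts. This is only an illustrative approximation, not the authors' implementation: the Gaussian parameters, the uniform prior, and the activity threshold are made-up values.

```python
# Illustrative sketch only (not the paper's code): per-expert context
# distributions p(c|o) are modeled as 2-D Gaussians, and the gating
# p(o|c) is obtained via Bayes' rule. All parameters below are made up.
import numpy as np
from scipy.stats import multivariate_normal

n_experts = 4
rng = np.random.default_rng(0)
means = rng.uniform(-1.0, 1.0, size=(n_experts, 2))   # hypothetical means
covs = [0.2 * np.eye(2) for _ in range(n_experts)]    # hypothetical covariances
prior = np.full(n_experts, 1.0 / n_experts)           # uniform prior p(o)

def gating(context):
    """Responsibilities p(o|c) for a single 2-D context."""
    likelihoods = np.array([
        multivariate_normal.pdf(context, means[o], covs[o])
        for o in range(n_experts)
    ])
    joint = prior * likelihoods
    return joint / joint.sum()

def n_active_experts(context, threshold=0.1):
    """Number of experts with non-negligible responsibility at this context
    (cf. panel (b): overlapping regions -> several active experts)."""
    return int((gating(context) > threshold).sum())

c = np.array([0.3, -0.2])
print(gating(c), n_active_experts(c))
```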
The 5-Link reacher task is an extension of the classical 2-Link reacher task from OpenAI Gym. The reacher has to reach a goal position with its tip, with goals located in all quadrants of the context space. A significant challenge in this task is the time-sparse reward, which provides a reward signal only at the end of the episode.
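As a rough illustration of what "time-sparse" means here, the sketch below wraps a Gymnasium environment so that the per-step reward is accumulated and emitted only on the final step. The wrapper and the way the return is aggregated are assumptions for illustration, not the task's actual reward definition.

```python
# Illustrative sketch: turn a dense per-step reward into a time-sparse one
# by accumulating it and returning the total only at the end of the episode.
# The choice of accumulation (a plain sum) is an assumption.
import gymnasium as gym

class TimeSparseReward(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        self._accumulated = 0.0

    def reset(self, **kwargs):
        self._accumulated = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._accumulated += reward
        done = terminated or truncated
        # Reward signal only at the end of the episode, zero otherwise.
        sparse_reward = self._accumulated if done else 0.0
        return obs, sparse_reward, terminated, truncated, info

# Example usage on the classical 2-link reacher:
# env = TimeSparseReward(gym.make("Reacher-v4"))
```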
The following video shows diverse reaching skills learned by Di-SkilL. The skills were sampled from the gating distribution at inference time.
In the Box Pushing with Obstacle task, a 7-DoF robot is tasked to push a box to a target position and orientation while avoiding an obstacle. The 5-dimensional context consists of the box's target position and orientation and the obstacle's position. The task is additionally challenging due to its time-sparse reward structure.
The following video shows diverse pushing skills learned by Di-SkilL. The skills were sampled from the gating distribution at inference time.
The Hopper from OpenAI Gym is tasked to jump as high as possible while landing at a goal position, as marked by the green and red dots. This task has a non-Markovian reward structure, which makes learning skills with step-based approaches infeasible.
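To illustrate why a non-Markovian reward is problematic for step-based methods, the sketch below computes a purely episodic reward from the whole hopper trajectory (maximum jump height and final landing error), so it cannot be decomposed into per-step Markovian rewards. The weights and function signature are hypothetical and only meant to convey the structure described above.

```python
# Hypothetical sketch of a non-Markovian, episode-level hopper reward:
# it depends on the whole trajectory (maximum height) and the final landing
# position. Weights and argument names are illustrative assumptions.
import numpy as np

def episodic_hopper_reward(heights, final_x, goal_x,
                           w_height=1.0, w_landing=1.0):
    """heights: per-step torso heights; final_x: landing x-position."""
    max_height = np.max(heights)            # jump as high as possible
    landing_error = abs(final_x - goal_x)   # land at the goal position
    return w_height * max_height - w_landing * landing_error
```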
The following videos show the behaviors of Di-SkilL's individual experts. We sampled contexts from each per-expert context distribution and executed the corresponding expert. The goal of the videos is to show that each expert learns different skills. The first expert (left) builds momentum for the jump by using the first joint and stabilizes by landing on its foot. The second expert (middle) builds momentum for the jump by using the first joint and stabilizes by landing on the hopper's "head"; it is responsible for landing positions that are further away from the initial position. The third expert (right) also builds momentum with the first joint and stabilizes by landing on the hopper's "head"; it is responsible for landing positions next to the initial position.
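A minimal sketch of this evaluation protocol, assuming Gaussian per-expert context distributions: contexts are drawn from a single expert's own context distribution and that expert is then executed. `expert_policies` and `env.rollout` are hypothetical placeholders, not part of the released code.

```python
# Illustrative sketch: visualize one expert by sampling contexts from its
# own context distribution p(c|o) and executing that expert on them.
# `expert_policies` and `env.rollout` are hypothetical placeholders.
import numpy as np

def rollout_expert(o, n_episodes, means, covs, expert_policies, env, rng):
    """Sample contexts c ~ p(c|o) and execute expert o on them."""
    returns = []
    for _ in range(n_episodes):
        context = rng.multivariate_normal(means[o], covs[o])  # c ~ p(c|o)
        returns.append(env.rollout(expert_policies[o], context))
    return returns
```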
In the table tennis task, a 7-degree-of-freedom (DoF) robot has to learn fast and precise motions to smash the ball to a desired position on the opponent's side. The 5-dimensional context consists of the incoming ball's landing position, the desired landing position on the opponent's side, and the ball's initial velocity. The table tennis environment requires good exploratory behavior and has a non-Markovian reward structure, making it infeasible for step-based approaches to learn useful skills.
The videos below show diverse striking skills learned by Di-SkilL. For each video, the ball's desired landing position on the opponent's side is fixed while the incoming ball's landing position and initial velocity are varied. The shown skills correspond to executing the experts sampled from the gating distribution at inference time.
The 7-DoF robot is tasked to hit the ball in an environment with two obstacles, where the blue obstacle is static and the green one is reset in each episode. The ball has to pass the tight goal on the other side of the table for the episode to count as a success. This environment has a non-Markovian reward structure, which makes learning difficult.
The following video shows diverse skills where the goal is fixed and the ball's and the obstacle's initial positions are varied. The experts are sampled from the gating distribution at inference time.
@inproceedings{celik2024acquiring,
  title     = {Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts},
  author    = {Onur Celik and Aleksandar Taranovic and Gerhard Neumann},
  booktitle = {Forty-first International Conference on Machine Learning},
  year      = {2024},
  url       = {https://openreview.net/forum?id=9ZkUFSwlUH}
}