MuJoCo
Step-Based Environments
Box Pushing
The box-pushing task uses a Franka Emika Panda robotic arm with seven degrees of freedom (DoFs). The objective is to push a box to a specified goal position and orientation.
The context space consists of a goal position, bounded along the x and y axes, and a goal orientation covering the full 360-degree range about the z-axis. The robot must bring the box to within 5 centimeters of the goal position and within 0.5 radians of the goal orientation.
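The success criterion above can be illustrated with a small check. This is a minimal sketch and not the environment's own code: the function name `box_pushing_success`, the use of planar yaw angles instead of quaternions, and the inclusive comparison are all assumptions made for illustration.

```python
import math


def box_pushing_success(box_xy, goal_xy, box_yaw, goal_yaw,
                        pos_tol=0.05, rot_tol=0.5):
    """Hypothetical check: True if the box is within both tolerances.

    pos_tol is in meters (5 cm) and rot_tol in radians, as stated in the text.
    Orientation is reduced to a yaw angle about the z-axis (an assumption).
    """
    pos_err = math.dist(box_xy, goal_xy)
    # Wrap the angular difference into [-pi, pi] so that yaw values on
    # either side of the +/-pi boundary are compared correctly.
    diff = box_yaw - goal_yaw
    rot_err = abs(math.atan2(math.sin(diff), math.cos(diff)))
    return pos_err <= pos_tol and rot_err <= rot_tol
```

Wrapping the angle matters because the goal orientation spans the full circle: a box at 3.1 rad and a goal at -3.1 rad are less than 0.1 rad apart, not 6.2 rad.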
The observation space includes the sine and cosine values of the robotic joint angles, their velocities, and quaternion orientations for the end-effector and the box. The action space describes the applied torques for each joint.
A composite reward function serves as the performance metric for the RL system. It accounts for the distance to the goal, the box’s orientation, maintaining a rod within the box, achieving the rod’s desired orientation, and includes penalties for joint position and velocity limit violations, as well as an action cost for energy expenditure.
Variants of this environment differ in their reward structure and in whether the box's initial position is randomized. They are designed to test how well RL algorithms generalize and adapt. Temporally sparse environments provide a reward only at the last timestep. Spatially sparse environments provide a reward only if the goal is almost reached, that is, if the box is close enough to the goal and roughly aligned with the goal orientation.
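The two kinds of sparsity can be sketched as simple gates on a task reward. This is an illustrative sketch only: `sparse_reward` and its arguments are hypothetical names, and the real environments may still apply penalties such as the action cost at every step, which this sketch ignores.

```python
def sparse_reward(task_reward, step, horizon, near_goal,
                  temporal=False, spatial=False):
    """Hypothetical gating of a task reward by temporal and spatial sparsity.

    step is zero-based, so the last timestep is horizon - 1.
    near_goal stands in for "close enough and roughly aligned".
    """
    if temporal and step != horizon - 1:
        return 0.0  # temporally sparse: nothing before the last timestep
    if spatial and not near_goal:
        return 0.0  # spatially sparse: nothing unless the goal is almost reached
    return task_reward
```

With both flags off the function returns the dense reward unchanged, which corresponds to the dense variant.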
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|---|---|---|---|---|
| | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
| | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
| | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
Table Tennis
The table tennis task uses a robotic arm with seven degrees of freedom (DoFs). The task is to respond to incoming balls and return them accurately to a specified goal location on the opponent’s side of the table.
The context space includes the initial ball position, with x-coordinates from -1 to -0.2 meters and y-coordinates from -0.65 to 0.65 meters, and the goal position, with x-coordinates from -1.2 to -0.2 meters and y-coordinates from -0.6 to 0.6 meters. The full observation space comprises the sine and cosine values of the joint angles, the joint velocities, and the ball’s velocity.
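The context ranges above can be written down as a small sampler. This is a sketch under assumptions: the text gives only the ranges, so uniform sampling, the constant names, and the function `sample_context` are illustrative choices, not the environment's API.

```python
import random

# Ranges in meters, taken from the text above.
BALL_X = (-1.0, -0.2)
BALL_Y = (-0.65, 0.65)
GOAL_X = (-1.2, -0.2)
GOAL_Y = (-0.6, 0.6)


def sample_context(rng):
    """Hypothetical sampler: draw a 4D context (ball x, y and goal x, y).

    Uniform sampling is an assumption; the text only states the bounds.
    """
    ball = (rng.uniform(*BALL_X), rng.uniform(*BALL_Y))
    goal = (rng.uniform(*GOAL_X), rng.uniform(*GOAL_Y))
    return ball, goal


rng = random.Random(0)  # fixed seed so the draw is reproducible
ball_xy, goal_xy = sample_context(rng)
```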
A task is considered successfully completed when the returned ball not only lands on the opponent’s side of the table but also within a tight margin of 20 centimeters from the goal location. The reward function is designed to reflect various conditions of play, including whether the ball was hit, if it landed on the table, and the proximity of the ball’s landing position to the goal location.
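The success condition can be expressed as a short predicate. This is a minimal sketch with hypothetical names (`table_tennis_success`, `on_opponent_side`); how the environment detects the landing point and the table side is not covered in the text and is simply passed in here.

```python
import math


def table_tennis_success(landing_xy, goal_xy, on_opponent_side, tol=0.20):
    """Hypothetical check: ball landed on the opponent's side within tol meters.

    tol defaults to the 20 cm margin stated in the text.
    """
    return bool(on_opponent_side) and math.dist(landing_xy, goal_xy) <= tol
```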
Several variants of the table tennis environment are available. They keep the core challenge of precise ball return and add a 4D context, replanning, wind effects, or goal switching, as listed in the table below.
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|---|---|---|---|---|
| | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
| | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
| | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
| | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
| | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
| | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
Beer Pong
The Beer Pong task uses a robot with seven degrees of freedom (DoFs) that must throw a ball into a cup placed on a large table. The context is the cup’s location, with x-coordinates from -1.42 to 1.42 meters and y-coordinates from -4.05 to -1.25 meters.
The observation space includes the cosine and sine of the robot’s joint angles, the angular velocities, and distances of the ball relative to the top and bottom of the cup, along with the cup’s position and the current timestep. The action space for the robot is defined by the torques applied to each joint. For episode-based methods, the parameter space is expanded to 15 dimensions, which includes two weights for the basis functions per joint and the duration of the throw, namely the ball release time.
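The 15-dimensional parameter space follows directly from the counts given above. The short calculation below is only a worked example of that arithmetic; the constant and function names are hypothetical.

```python
NUM_JOINTS = 7            # seven DoFs, as stated in the text
WEIGHTS_PER_JOINT = 2     # two basis-function weights per joint
RELEASE_TIME_PARAMS = 1   # one parameter for the ball release time


def parameter_dimension(num_joints=NUM_JOINTS,
                        weights_per_joint=WEIGHTS_PER_JOINT,
                        extra_params=RELEASE_TIME_PARAMS):
    """Number of parameters for episode-based methods: 7 * 2 + 1."""
    return num_joints * weights_per_joint + extra_params


print(parameter_dimension())  # 15
```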
The action penalty is the sum of squared torques over all joints, which discourages excessive force and encourages efficient motion. At each timestep t before the final timestep T, the reward consists of the negative action penalty alone. At t=T, the reward combines the action penalty with a non-Markovian term based on the ball’s position relative to the cup.
An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward.
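The structure of the episode return can be sketched as follows. This is an illustration under assumptions, not the environment's reward code: `penalty_weight` is a hypothetical scaling factor (the text gives no coefficients), and the task-specific and release-time rewards are passed in as already computed numbers.

```python
def action_penalty(torques):
    """Sum of squared torques over all joints for one timestep."""
    return sum(tau * tau for tau in torques)


def episode_return(torques_per_step, task_reward, release_time_reward,
                   penalty_weight=1.0):
    """Hypothetical episode return.

    Every timestep, including the final one, contributes the negative
    action penalty. The task-specific reward and the release-time reward
    are added once, at the final timestep.
    """
    step_rewards = [-penalty_weight * action_penalty(t)
                    for t in torques_per_step]
    return sum(step_rewards) + task_reward + release_time_reward
```

For example, two steps with torques `[1.0, 2.0]` and `[0.0, 3.0]` give penalties of 5 and 9, so with a task reward of 2.0 and a release-time reward of 0.5 the return is -11.5.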
A throw is successful if the ball is in the cup at the end of the episode.
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|---|---|---|---|---|
| | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| | Step-based rewards for the Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
Variations of existing environments
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|---|---|---|---|---|
| | Modified (5 links) Gymnasium’s MuJoCo | 200 | 5 | 21 |
| | Same as | 200 | 5 | 21 |
| | Modified (7 links) Gymnasium’s MuJoCo | 200 | 7 | 27 |
| | Same as | 200 | 7 | 27 |
| | Reacher task with 5 links, based on Gymnasium’s | 200 | 5 | 20 |
| | Sparse Reacher task with 5 links, based on Gymnasium’s | 200 | 5 | 20 |
| | Reacher task with 7 links, based on Gymnasium’s | 200 | 7 | 22 |
| | Sparse Reacher task with 7 links, based on Gymnasium’s | 200 | 7 | 22 |
| | Hopper Jump task with sparse rewards, based on Gymnasium’s | 250 | 3 | 15 / 16* |
| | Hopper Jump task with continuous rewards, based on Gymnasium’s | 250 | 3 | 15 / 16* |
| | Ant Jump task, based on Gymnasium’s | 200 | 8 | 119 |
| | HalfCheetah Jump task, based on Gymnasium’s | 100 | 6 | 112 |
| | Hopper Jump on Box task, based on Gymnasium’s | 250 | 4 | 16 / 100* |
| | Hopper Throw task, based on Gymnasium’s | 250 | 3 | 18 / 100* |
| | Hopper Throw in Basket task, based on Gymnasium’s | 250 | 3 | 18 / 100* |
| | Walker 2D Jump task, based on Gymnasium’s | 300 | 6 | 18 / 19* |
*Observation dimensions depend on configuration.
MP Environments
Most of these environments also exist as MP variants. Refer to them using fancy_DMP/<name>, fancy_ProMP/<name>, or fancy_ProDMP/<name>.
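The naming scheme can be captured in a small helper that builds the ID string. This is a sketch: `mp_env_id` is a hypothetical helper, and `SomeEnv-v0` is a placeholder rather than a real environment name. The commented lines show how such an ID would typically be passed to Gymnasium, assuming the package registers its environments on import.

```python
MP_TYPES = ("DMP", "ProMP", "ProDMP")


def mp_env_id(name, mp_type):
    """Build an MP-variant ID following the fancy_<MP type>/<name> pattern."""
    if mp_type not in MP_TYPES:
        raise ValueError(f"unknown MP type {mp_type!r}, expected one of {MP_TYPES}")
    return f"fancy_{mp_type}/{name}"


# "SomeEnv-v0" is a placeholder name used only for illustration.
env_id = mp_env_id("SomeEnv-v0", "ProMP")

# Assumed usage (not executed here):
# import gymnasium as gym
# import fancy_gym  # assumed to register the environments on import
# env = gym.make(env_id)
```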