PointPatchRL is a method for Reinforcement Learning on point clouds that harnesses their 3D structure to
extract task-relevant geometric information from the scene and learn complex manipulation tasks purely
from rewards.
While images are a convenient format for perceiving the environment for RL, they often complicate
extracting important geometric details, especially with varying geometries or deformable
objects.
In contrast, point clouds naturally represent this geometry and easily integrate positional and
color data from multiple camera views.
However, while deep learning on point clouds has seen many recent successes, RL on point clouds is
under-researched, with usually only the simplest encoder architecture considered in the literature.
We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common
paradigm
of dividing point clouds into overlapping patches, tokenizing them, and processing the
tokens
with transformers.
PPRL provides significant improvements compared with other point-cloud architectures previously used for
RL.
We then complement PPRL with masked reconstruction for representation learning and show that our
method outperforms strong model-free and model-based baselines on image observations in complex
manipulation
tasks containing deformable objects and variations in target object geometry.