PointPatchRL - Masked Reconstruction Improves Reinforcement Learning on Point Clouds

Balázs Gyenes1,2, Nikolai Franke1, Philipp Becker1,2, Gerhard Neumann1,2
1Autonomous Learning Robots (ALR), Karlsruhe Institute of Technology (KIT) 2HIDSS4Health - Helmholtz Information and Data Science School for Health

Abstract

PointPatchRL is a method for Reinforcement Learning on point clouds that harnesses their 3D structure to extract task-relevant geometric information from the scene and learn complex manipulation tasks purely from rewards.

While images are a convenient format for perceiving the environment for RL, they often complicate extracting important geometric details, especially with varying geometries or deformable objects. In contrast, point clouds naturally represent this geometry and easily integrate positional and color data from multiple camera views. However, while deep learning on point clouds has seen many recent successes, RL on point clouds is under-researched, with typically only the simplest encoder architectures considered in the literature.

We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common paradigm of dividing point clouds into overlapping patches, tokenizing them, and processing the tokens with transformers. PPRL provides significant improvements compared with other point-cloud architectures previously used for RL. We then complement PPRL with masked reconstruction for representation learning and show that our method outperforms strong model-free and model-based baselines on image observations in complex manipulation tasks containing deformable objects and variations in target object geometry.

Why Point Clouds?

[Figure: point clouds of two faucets with very different geometries]

Easier to extract task-relevant geometry

Disentangle occluded objects from their occluders

[Figure: a faucet seen from two different perspectives, and the point cloud formed by combining both views]

Combine multiple camera views
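
Because every camera's depth readings can be expressed in a shared world frame, fusing views reduces to a rigid transform plus concatenation. The sketch below illustrates this with numpy; the function name `merge_views` and the camera-to-world extrinsics format are illustrative assumptions, not part of the paper's codebase.

```python
import numpy as np

def merge_views(clouds, extrinsics):
    """Merge per-camera point clouds into one world-frame cloud.

    clouds:     list of (n_i, 3) arrays in each camera's coordinate frame
    extrinsics: list of (4, 4) homogeneous camera-to-world transforms
    """
    world = []
    for pts, T in zip(clouds, extrinsics):
        # lift to homogeneous coordinates, apply the rigid transform
        homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
        world.append((homo @ T.T)[:, :3])
    # a single cloud containing the points from all views
    return np.concatenate(world, axis=0)
```

With calibrated (or simulated) cameras, the extrinsics are known, so the combined cloud comes essentially for free at observation time.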

Method

PointPatchRL

PointPatchRL is a powerful point cloud encoder in its own right, thanks to the well-established patching-and-tokenizing paradigm. As a simple drop-in replacement for any other point cloud encoder, PointPatchRL increases sample efficiency compared to commonly used architectures such as PointNet. No reconstruction loss needs to be added to the learning pipeline if not desired!

PointPatchRL is a powerful point cloud encoder in its own right
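
The patching step can be made concrete with a minimal numpy sketch: sample patch centers with farthest point sampling, group each center's k nearest neighbors into a patch, and normalize each patch to its center so a tokenizer sees only local geometry. Function names and hyperparameters here are illustrative, not the paper's implementation.

```python
import numpy as np

def farthest_point_sample(points, n_centers, seed=0):
    """Greedily pick n_centers indices that cover the cloud."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    centers = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(n_centers - 1):
        # distance of every point to its nearest chosen center so far
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(np.argmax(dist)))
    return np.array(centers)

def make_patches(points, n_patches=8, patch_size=32):
    """Divide a (n, 3) cloud into overlapping local patches."""
    center_idx = farthest_point_sample(points, n_patches)
    centers = points[center_idx]
    # kNN grouping: each patch is the patch_size nearest points to its center
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    knn_idx = np.argsort(d, axis=1)[:, :patch_size]
    patches = points[knn_idx]                    # (n_patches, patch_size, 3)
    # normalize each patch to its center: the tokenizer sees local shape only
    return patches - centers[:, None, :], centers
```

Each normalized patch would then be embedded into a token (e.g. by a small point network), and the resulting token sequence processed by a transformer.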

PointPatchRL + Aux

We build on the strong baseline provided by PointPatchRL by adding the auto-regressive masked reconstruction loss used in PointGPT. This yields even greater sample efficiency on complex manipulation tasks with multiple (potentially moving) cameras. PointPatchRL + Aux outperforms both point cloud-based and image-based baselines.

PointPatchRL + Aux is superior to point cloud-based and image-based baselines
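
Since points within a patch carry no canonical ordering, reconstruction quality is typically scored with a set-to-set metric such as Chamfer distance rather than a pointwise error. A minimal numpy sketch of that objective is below; the function names are illustrative, and in the actual auto-regressive setup each patch would be predicted from the tokens preceding it in the sequence.

```python
import numpy as np

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between point sets (n, 3) and (m, 3)."""
    d = np.linalg.norm(pred[:, None, :] - target[None, :, :], axis=-1)
    # each predicted point to its nearest target, and vice versa
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def reconstruction_loss(predicted_patches, true_patches):
    """Average Chamfer distance over per-patch reconstructions."""
    return float(np.mean([chamfer_distance(p, t)
                          for p, t in zip(predicted_patches, true_patches)]))
```

The loss is zero exactly when every predicted patch matches its target as a set, which is the property that lets it supervise unordered point predictions.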

Policy Videos

Our method outperforms point cloud-based and image-based baselines on 6 challenging manipulation tasks. The tasks contain either deformable objects or geometric variations across a set of rigid objects.

ThreadInHole
DeflectSpheres
PushChair
OpenCabinetDrawer
OpenCabinetDoor
TurnFaucet

Diverse Policies

Agents trained with PPRL + Aux adapt to varying geometries, including handle size and orientation, and whether the door opens to the left or right. The policy coordinates the movements of the gripper and the base and generalizes over varying object geometry.

BibTeX

@inproceedings{gyenes2024pointpatchrl,
  title={PointPatch{RL} - Masked Reconstruction Improves Reinforcement Learning on Point Clouds},
  author={Bal{\'a}zs Gyenes and Nikolai Franke and Philipp Becker and Gerhard Neumann},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=3jNEz3kUSl}
}