Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots

York University

Abstract

In this research, we introduce a deep reinforcement learning-based control approach to address the intricate challenge of the robotic pre-grasping phase under microgravity conditions. Leveraging reinforcement learning eliminates the need for manual feature design, thereby simplifying the problem and enabling the robot to learn pre-grasping policies through trial and error.

Our methodology incorporates an off-policy reinforcement learning framework, employing the soft actor-critic (SAC) technique to enable the gripper to proficiently approach a free-floating moving object and maximize the likelihood of a successful pre-grasp. For effective learning of the pre-grasping approach task, we developed a reward function that offers the agent clear and informative feedback. Our case study examines a pre-grasping task in which a Robotiq 3F gripper must navigate towards a free-floating moving target, pursue it, and subsequently position itself at the desired pre-grasp location.

We assessed our approach through a series of experiments in both simulated and real-world environments.
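As a concrete starting point, the sketch below shows how an agent of this kind could be trained with the SAC implementation from Stable-Baselines3. The environment ID "PreGraspEnv-v0", its observation/action layout, and the hyperparameters are illustrative assumptions, not the exact configuration used in the paper.

# Minimal SAC training sketch (assumes a Gymnasium environment
# "PreGraspEnv-v0" exposing gripper/target states as observations and
# continuous end-effector motion commands as actions).
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("PreGraspEnv-v0")

model = SAC(
    "MlpPolicy",            # MLP actor-critic over low-dimensional state
    env,
    learning_rate=3e-4,
    buffer_size=1_000_000,  # off-policy replay buffer
    batch_size=256,
    gamma=0.99,
    verbose=1,
)
model.learn(total_timesteps=2_000_000)
model.save("sac_pregrasp")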

Lab Setup


To test and implement our approach, we used a Fanuc LR Mate 20iD/25 robot arm fitted with a Robotiq 3F gripper. The arm was equipped with a camera to detect the target object and a force-torque sensor whose feedback was used to emulate zero-gravity conditions.
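A common way to emulate free-floating (zero-gravity) dynamics on a fixed-base arm is admittance control driven by the force-torque sensor: the measured wrench is integrated through a virtual rigid-body model to produce the velocity a free-floating object would have. The sketch below is a simplified, hypothetical version of that idea; the virtual mass and inertia values and the sensor interface are assumptions, not the paper's implementation.

import numpy as np

class ZeroGEmulator:
    """Admittance-style zero-g emulation sketch: integrate the measured
    wrench through a virtual free-floating rigid body (F = m*a,
    tau = I*alpha) and command the resulting end-effector twist."""

    def __init__(self, mass=2.0, inertia=0.05, dt=0.01):  # values assumed
        self.mass, self.inertia, self.dt = mass, inertia, dt
        self.lin_vel = np.zeros(3)
        self.ang_vel = np.zeros(3)

    def step(self, wrench):
        # wrench: gravity-compensated (fx, fy, fz, tx, ty, tz) from the F/T sensor
        force, torque = np.asarray(wrench[:3]), np.asarray(wrench[3:])
        self.lin_vel += (force / self.mass) * self.dt
        self.ang_vel += (torque / self.inertia) * self.dt
        return np.concatenate([self.lin_vel, self.ang_vel])  # commanded twist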

Real World Implementation

In both scenarios, a Fanuc robot arm with a Robotiq 3F gripper approaches a randomly positioned free-floating target. Using an onboard camera and a pre-trained YOLOX detector, the robot identifies the target's grasping section and moves to the pre-grasp position.
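The geometry of that step can be sketched as follows: given the detected bounding box of the grasping section, the depth at its center, and the camera intrinsics, back-project the box center to a 3D target point and stop a fixed standoff short of it along the approach ray. The standoff distance and the detector wrapper producing the box are hypothetical placeholders, not the paper's pipeline.

import numpy as np

STANDOFF = 0.15  # m, assumed pre-grasp standoff along the approach ray

def pre_grasp_point(box, depth, K):
    """box: (x1, y1, x2, y2) pixel bounding box from the detector (wrapper
    assumed); depth: depth at the box center in metres; K: 3x3 camera
    intrinsic matrix. Returns target and pre-grasp points, camera frame."""
    u = 0.5 * (box[0] + box[2])
    v = 0.5 * (box[1] + box[3])
    # Back-project the pixel to a 3D point in the camera frame.
    xyz = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Approach along the camera ray; stop STANDOFF short of the target.
    approach = xyz / np.linalg.norm(xyz)
    return xyz, xyz - STANDOFF * approach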


All the actions required to place the gripper at the pre-grasp position are predicted by the trained agent, which receives the necessary observations from sensors such as the camera and tactile sensors. For safety reasons, the target is placed at a random stationary position.
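A deployment loop of this kind might look like the sketch below: the trained SAC policy is queried with the latest fused sensor observations and its action is streamed to the arm. The robot, camera, and tactile interfaces, the observation layout, and the control rate are all assumptions for illustration.

import time
import numpy as np
from stable_baselines3 import SAC

model = SAC.load("sac_pregrasp")  # agent trained as in the earlier sketch

def control_loop(robot, camera, tactile, rate_hz=20.0):
    # Query the trained policy with fused sensor observations and stream
    # its actions to the arm; all interfaces below are assumed, not real APIs.
    while not robot.at_pre_grasp():
        obs = np.concatenate([
            camera.target_pose(),   # detected target pose (assumed API)
            tactile.readings(),     # tactile sensor values (assumed API)
            robot.ee_state(),       # end-effector pose and velocity (assumed)
        ])
        action, _ = model.predict(obs, deterministic=True)
        robot.send_velocity(action)  # assumed velocity-command interface
        time.sleep(1.0 / rate_hz)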

Reward Shaping

We designed four novel reward-function components that drive the robot to approach the target, follow it, and reach the pre-grasp position. Each component is normalized between 0 and 1, which helps isolate its impact on the robot's behavior during training.

[Figures: interpolation visualizations for the four reward components (position correction, orientation correction, topology encouragement, contact penalization)]
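A minimal sketch of how four such [0, 1]-normalized components could be combined is shown below; the functional forms, scales, and weights are illustrative assumptions, not the paper's exact reward.

import numpy as np

def reward(pos_err, ang_err, approach_align, contact_force,
           w=(0.4, 0.3, 0.2, 0.1)):
    """Each term is squashed into [0, 1] so its contribution can be
    inspected in isolation during training (forms are assumptions).

    pos_err: distance between gripper and pre-grasp point (m)
    ang_err: orientation error to the grasp frame (rad)
    approach_align: cosine alignment of the approach direction, in [-1, 1]
    contact_force: magnitude of unwanted contact (N)
    """
    r_pos = np.exp(-5.0 * pos_err)            # position correction
    r_ori = np.exp(-2.0 * ang_err)            # orientation correction
    r_topo = 0.5 * (approach_align + 1.0)     # topology encouragement
    r_contact = np.exp(-1.0 * contact_force)  # contact penalization (1 = no contact)
    return np.dot(w, [r_pos, r_ori, r_topo, r_contact])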


BibTeX

@INPROCEEDINGS{10610017,
      author={Beigomi, Bahador and Zhu, Zheng H.},
      booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)}, 
      title={Towards Real-World Efficiency: Domain Randomization in Reinforcement Learning for Pre-Capture of Free-Floating Moving Targets by Autonomous Robots}, 
      year={2024},
      volume={},
      number={},
      pages={11753-11759},
      keywords={Training;Target tracking;Source coding;Reinforcement learning;Robustness;Sensors;Grippers},
      doi={10.1109/ICRA57147.2024.10610017}}