Curriculum-based Reinforcement Learning to Pre-Grasp Floating Target

1 York University, 2 University of Luxembourg

Several tests have been conducted to evaluate the performance of the proposed approach in real-world scenarios. In the following videos, the robot arm approaches a floating target and moves to the pre-grasp position while the target translates and rotates randomly in space. All of the actions required to place the gripper in the pre-grasp position are predicted by the trained agent, which receives the necessary observations from sensors such as the OptiTrack motion capture system.

Abstract

Pre-positioning a robotic gripper relative to a free-floating, moving target in 6-DoF poses significant challenges, especially when both translation and rotation are involved. Precise pre-positioning is critical for successful grasping, as deviations from the optimal pose may result in missed grasp opportunities or collisions.

This work addresses these challenges by leveraging reinforcement learning with curriculum learning to enhance both the efficiency and adaptability of the robotic agent in dynamic environments. Curriculum learning is used to incrementally increase task complexity, enabling the agent to adapt to unforeseen conditions while maintaining precise control. The proposed approach uses a Soft Actor-Critic algorithm with deterministic policy output, first trained in the PyBullet simulation environment with domain randomization to mitigate sim-to-real transfer issues. The trained policy is then transferred to a physical robot to perform dynamic pre-positioning tasks.

Our results demonstrate that this method enables the robotic gripper to accurately track and maintain the correct pose relative to a 6-DoF moving and rotating target, ensuring collision-free pre-positioning, a crucial precursor to grasping. More detailed results and videos are available on GitHub.

Simulation Scenarios

To train our agent in a simulated environment, we used PyBullet, a physics engine that provides a realistic simulation of the robot and the micro-gravity environment.
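A minimal sketch of such a scene is shown below, assuming a generic UR10e URDF and PyBullet's bundled assets; the exact environment code, assets, initial velocities, and time step used in this work may differ.

```python
# Minimal sketch of a PyBullet micro-gravity training scene (assumed asset
# names and parameters; not the project's exact environment code).
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                # headless mode for RL training
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, 0)                              # emulate micro-gravity
p.setTimeStep(1.0 / 240.0)

# Robot arm fixed to the world; the target floats freely.
robot = p.loadURDF("ur10e.urdf", useFixedBase=True)              # assumed asset
target = p.loadURDF("cube_small.urdf", basePosition=[0.8, 0.0, 0.5])

# Give the free-floating target an initial linear and angular velocity.
p.resetBaseVelocity(target,
                    linearVelocity=[0.02, 0.0, 0.01],
                    angularVelocity=[0.0, 0.0, 0.3])

for _ in range(240):
    p.stepSimulation()
```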

Lab Setup


To test and implement our approach, we used the Zero-G Lab facilities in Luxembourg. The lab is equipped with two Universal Robots UR10e robotic arms mounted on Cobotracks rails, which extend the arms' workspace for emulating spacecraft motion. This established and unique hardware-in-the-loop (HIL) setup allows the trained agent to be deployed and tested in real-world conditions. The system includes a Robotiq 3F gripper, OptiTrack cameras, and a well-calibrated motion control stack built entirely on the Robot Operating System (ROS) and Python, and it is capable of emulating microgravity conditions.
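As an illustration of how observations could flow from the motion capture system into the agent on this ROS-based setup, the sketch below subscribes to a pose topic in ROS 1; the node name, topic name, and loop rate are assumptions and not the lab's actual configuration.

```python
# Minimal sketch of reading the target pose from motion capture in ROS 1
# (topic and node names are assumptions, not the lab's actual configuration).
import rospy
from geometry_msgs.msg import PoseStamped

latest_target_pose = None

def target_pose_cb(msg: PoseStamped):
    """Cache the most recent OptiTrack pose of the floating target."""
    global latest_target_pose
    latest_target_pose = msg

def main():
    rospy.init_node("pregrasp_observer")
    # Assumed topic published by an OptiTrack/VRPN bridge node.
    rospy.Subscriber("/vrpn_client_node/target/pose", PoseStamped, target_pose_cb)
    rate = rospy.Rate(30)
    while not rospy.is_shutdown():
        if latest_target_pose is not None:
            # Here the pose would be assembled into the policy observation.
            pass
        rate.sleep()

if __name__ == "__main__":
    main()
```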

Curriculum Complexity Levels

Standard reinforcement learning algorithms often struggle to learn complex tasks directly from high-dimensional observations. We therefore use curriculum learning to incrementally increase the task complexity, helping the agent adapt to unforeseen conditions while maintaining precise control over the gripper's motion. Four hierarchical complexity levels are designed to train the agent in simulation to cope with a tilted, rotating target. The task difficulty at each level is parameterized by the target's orientation \( \psi \) and angular velocity \( \omega \), both of which increase exponentially across the four levels; a sketch of such a schedule follows the level labels below.

Curriculum levels \( L1 \) through \( L4 \) (one video per level).
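The sketch below shows one way such an exponential schedule could be parameterized; the base values for \( \psi \) and \( \omega \) and the growth factor are illustrative assumptions, not the values used in training.

```python
# Minimal sketch of an exponentially increasing curriculum schedule for the
# four levels L1-L4 (base values and growth factor are assumptions).
import numpy as np

NUM_LEVELS = 4
PSI_BASE = 5.0      # deg, target tilt bound at L1 (assumed)
OMEGA_BASE = 0.05   # rad/s, target spin bound at L1 (assumed)
GROWTH = 2.0        # exponential growth factor per level (assumed)

def level_params(level: int):
    """Return (psi_max, omega_max) for curriculum level 1..4."""
    scale = GROWTH ** (level - 1)
    return PSI_BASE * scale, OMEGA_BASE * scale

def sample_target_state(level: int, rng=np.random.default_rng()):
    """Sample a target tilt and angular velocity within the level's bounds."""
    psi_max, omega_max = level_params(level)
    psi = rng.uniform(-psi_max, psi_max)
    omega = rng.uniform(-omega_max, omega_max, size=3)  # about all three axes
    return psi, omega

for lvl in range(1, NUM_LEVELS + 1):
    print(f"L{lvl}: psi_max={level_params(lvl)[0]:.1f} deg, "
          f"omega_max={level_params(lvl)[1]:.2f} rad/s")
```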


Off-Script Scenarios

Because the agent is trained only in simulation, even with domain randomization it is expected to fail in some real-world scenarios. As shown in the following videos, the agent occasionally fails to predict the correct actions needed to place the gripper in the pre-grasp position and instead comes into contact with the target object.


The agent might recover from such a failure and continue the task, as it does in simulation, but for safety reasons the task is stopped as soon as contact occurs. The main cause of failure is that the agent is not trained under exactly the same conditions as the real world.

Reward Shaping

Four novel reward function components are designed to drive the robot toward the target, follow it, and reach the pre-grasp position. Each component is normalized between 0 and 1, which helps isolate its impact on the robot's behavior during training. A minimal sketch of these components follows the list below.

Position correction

Orientation correction

Topology encouragement

Contact penalization
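The sketch below shows one possible normalized form of these four components; the scaling constants, the definition of the topology term, and the component weights are assumptions rather than the paper's exact formulas.

```python
# Minimal sketch of four normalized reward components in [0, 1]
# (scaling constants, topology term, and weights are assumptions).
import numpy as np

def position_reward(d_pos, scale=1.0):
    """1 at zero position error, decaying toward 0 with distance."""
    return float(np.exp(-scale * d_pos))

def orientation_reward(d_ang, scale=1.0):
    """1 when gripper and pre-grasp orientations align, decaying with angle error."""
    return float(np.exp(-scale * d_ang))

def topology_reward(approach_dir, grasp_axis):
    """Encourage approaching the target along its graspable axis."""
    cos_sim = np.dot(approach_dir, grasp_axis) / (
        np.linalg.norm(approach_dir) * np.linalg.norm(grasp_axis) + 1e-8)
    return 0.5 * (cos_sim + 1.0)          # map [-1, 1] -> [0, 1]

def contact_penalty(in_contact: bool):
    """0 if an unwanted contact occurred, 1 otherwise."""
    return 0.0 if in_contact else 1.0

def total_reward(d_pos, d_ang, approach_dir, grasp_axis, in_contact,
                 weights=(1.0, 1.0, 0.5, 1.0)):
    """Weighted sum of the four normalized components."""
    terms = (position_reward(d_pos),
             orientation_reward(d_ang),
             topology_reward(approach_dir, grasp_axis),
             contact_penalty(in_contact))
    return float(np.dot(weights, terms))
```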


BibTeX

@INPROCEEDINGS{10610017,