Curriculum-based Reinforcement Learning to Pre-Grasp Floating Target

1 York University, 2 University of Luxembourg

Several tests have been conducted to evaluate the performance of the proposed approach in real-world scenarios. In the following videos, the robot arm approaches a floating target and moves to the pre-grasp position while the target translates and rotates randomly in space. All of the actions required to place the gripper in the pre-grasp position are predicted by the trained agent, which receives the necessary observations from sensors such as the OptiTrack motion capture system.

Abstract

Pre-positioning a robotic gripper relative to a free-floating, moving target in 6-DoF poses significant challenges, especially when both translation and rotation are involved. Precise pre-positioning is critical for successful grasping, as deviations from the optimal pose may result in missed grasp opportunities or collisions.

This work addresses these challenges by leveraging reinforcement learning with curriculum learning to enhance both the efficiency and adaptability of the robotic agent in dynamic environments. Curriculum learning is used to incrementally increase task complexity, enabling the agent to adapt to unforeseen conditions while maintaining precise control. The proposed approach uses a Soft Actor-Critic algorithm with deterministic policy output, first trained in the PyBullet simulation environment with domain randomization to mitigate sim-to-real transfer issues. The trained policy is then transferred to a physical robot to perform dynamic pre-positioning tasks.

Our results demonstrate that this method enables the robotic gripper to accurately track and maintain the correct pose relative to a 6-DoF moving and rotating target, ensuring collision-free pre-positioning, a crucial precursor to grasping. More detailed results and videos are available on GitHub.

Simulation Scenarios

To train our agent in a simulated environment, we used PyBullet, a physics engine that provides a realistic simulation of the robot and the micro-gravity environment.
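A minimal sketch of such a scene is shown below, assuming a generic UR10e URDF and PyBullet's bundled assets; the exact environment code, assets, initial velocities, and time step used in this work may differ.

```python
# Minimal sketch of a PyBullet micro-gravity training scene (assumed asset
# names and parameters; not the project's exact environment code).
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                # headless mode for RL training
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, 0)                              # emulate micro-gravity
p.setTimeStep(1.0 / 240.0)

# Robot arm fixed to the world; the target floats freely.
robot = p.loadURDF("ur10e.urdf", useFixedBase=True)              # assumed asset
target = p.loadURDF("cube_small.urdf", basePosition=[0.8, 0.0, 0.5])

# Give the free-floating target an initial linear and angular velocity.
p.resetBaseVelocity(target,
                    linearVelocity=[0.02, 0.0, 0.01],
                    angularVelocity=[0.0, 0.0, 0.3])

for _ in range(240):
    p.stepSimulation()
```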

Lab Setup


To test and implement our approach, we used the Zero-G Lab facilities in Luxembourg. The lab is equipped with two Universal Robots UR10e robotic arms mounted on Cobotracks rails, which extend the arms' workspace for emulating spacecraft motion. This established and unique hardware-in-the-loop (HIL) setup allows the trained agent to be deployed and tested in real-world conditions. The system includes a Robotiq 3F gripper, OptiTrack cameras, and a well-calibrated motion control stack built entirely on the Robot Operating System (ROS) and Python, and it is capable of emulating microgravity conditions.
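As an illustration of how observations could flow from the motion capture system into the agent on this ROS-based setup, the sketch below subscribes to a pose topic in ROS 1; the node name, topic name, and loop rate are assumptions and not the lab's actual configuration.

```python
# Minimal sketch of reading the target pose from motion capture in ROS 1
# (topic and node names are assumptions, not the lab's actual configuration).
import rospy
from geometry_msgs.msg import PoseStamped

latest_target_pose = None

def target_pose_cb(msg: PoseStamped):
    """Cache the most recent OptiTrack pose of the floating target."""
    global latest_target_pose
    latest_target_pose = msg

def main():
    rospy.init_node("pregrasp_observer")
    # Assumed topic published by an OptiTrack/VRPN bridge node.
    rospy.Subscriber("/vrpn_client_node/target/pose", PoseStamped, target_pose_cb)
    rate = rospy.Rate(30)
    while not rospy.is_shutdown():
        if latest_target_pose is not None:
            # Here the pose would be assembled into the policy observation.
            pass
        rate.sleep()

if __name__ == "__main__":
    main()
```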

Curriculum Complexity Levels

Standard reinforcement learning algorithms often struggle to learn complex tasks directly from high-dimensional observations. We therefore use curriculum learning to incrementally increase the task complexity, helping the agent adapt to unforeseen conditions while maintaining precise control over the gripper's motion. Four hierarchical complexity levels are designed to train the agent in simulation to cope with a tilted, rotating target. The task difficulty at each level is parameterized by the target's orientation \( \psi \) and angular velocity \( \omega \), both of which increase exponentially across the four levels; a sketch of such a schedule follows the level labels below.

Curriculum levels \( L1 \) through \( L4 \) (one video per level).
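The sketch below shows one way such an exponential schedule could be parameterized; the base values for \( \psi \) and \( \omega \) and the growth factor are illustrative assumptions, not the values used in training.

```python
# Minimal sketch of an exponentially increasing curriculum schedule for the
# four levels L1-L4 (base values and growth factor are assumptions).
import numpy as np

NUM_LEVELS = 4
PSI_BASE = 5.0      # deg, target tilt bound at L1 (assumed)
OMEGA_BASE = 0.05   # rad/s, target spin bound at L1 (assumed)
GROWTH = 2.0        # exponential growth factor per level (assumed)

def level_params(level: int):
    """Return (psi_max, omega_max) for curriculum level 1..4."""
    scale = GROWTH ** (level - 1)
    return PSI_BASE * scale, OMEGA_BASE * scale

def sample_target_state(level: int, rng=np.random.default_rng()):
    """Sample a target tilt and angular velocity within the level's bounds."""
    psi_max, omega_max = level_params(level)
    psi = rng.uniform(-psi_max, psi_max)
    omega = rng.uniform(-omega_max, omega_max, size=3)  # about all three axes
    return psi, omega

for lvl in range(1, NUM_LEVELS + 1):
    print(f"L{lvl}: psi_max={level_params(lvl)[0]:.1f} deg, "
          f"omega_max={level_params(lvl)[1]:.2f} rad/s")
```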


Off-Script Scenarios

Because the agent is trained only in simulation, even with domain randomization it is expected to fail in some real-world scenarios. As shown in the following videos, the agent occasionally fails to predict the correct actions needed to place the gripper in the pre-grasp position and instead comes into contact with the target object.


The agent might recover from such a failure and continue the task, as it does in simulation, but for safety reasons the task is stopped as soon as contact occurs. The main cause of failure is that the agent is not trained under exactly the same conditions as the real world.

Reward Shaping

Four novel reward function components are designed to drive the robot toward the target, follow it, and reach the pre-grasp position. Each component is normalized between 0 and 1, which helps isolate its impact on the robot's behavior during training. A minimal sketch of these components follows the list below.

Position correction

Orientation correction

Topology encouragement

Contact penalization
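The sketch below shows one possible normalized form of these four components; the scaling constants, the definition of the topology term, and the component weights are assumptions rather than the paper's exact formulas.

```python
# Minimal sketch of four normalized reward components in [0, 1]
# (scaling constants, topology term, and weights are assumptions).
import numpy as np

def position_reward(d_pos, scale=1.0):
    """1 at zero position error, decaying toward 0 with distance."""
    return float(np.exp(-scale * d_pos))

def orientation_reward(d_ang, scale=1.0):
    """1 when gripper and pre-grasp orientations align, decaying with angle error."""
    return float(np.exp(-scale * d_ang))

def topology_reward(approach_dir, grasp_axis):
    """Encourage approaching the target along its graspable axis."""
    cos_sim = np.dot(approach_dir, grasp_axis) / (
        np.linalg.norm(approach_dir) * np.linalg.norm(grasp_axis) + 1e-8)
    return 0.5 * (cos_sim + 1.0)          # map [-1, 1] -> [0, 1]

def contact_penalty(in_contact: bool):
    """0 if an unwanted contact occurred, 1 otherwise."""
    return 0.0 if in_contact else 1.0

def total_reward(d_pos, d_ang, approach_dir, grasp_axis, in_contact,
                 weights=(1.0, 1.0, 0.5, 1.0)):
    """Weighted sum of the four normalized components."""
    terms = (position_reward(d_pos),
             orientation_reward(d_ang),
             topology_reward(approach_dir, grasp_axis),
             contact_penalty(in_contact))
    return float(np.dot(weights, terms))
```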


BibTeX

@INPROCEEDINGS{10610017,