Joint MECHATRONICS 2025, ROBOTICS 2025 Paper Abstract

Paper FrAT5.2

Cramer, Emma (RWTH Aachen University), Jäschke, Lukas (Institute for Data Science in Mechanical Engineering, RWTH Aachen), Trimpe, Sebastian (RWTH Aachen University)

CHEQ-Ing the Box: Safe Variable Impedance Learning for Robotic Polishing

Scheduled for presentation during the Regular Session "Human-Robot Interaction" (FrAT5), Friday, July 18, 2025, 10:20−10:40, Room 109

Joint 10th IFAC Symposium on Mechatronic Systems and 14th Symposium on Robotics, July 15-18, 2025, Paris, France

This information is tentative and subject to change. Compiled on August 2, 2025

Keywords: Learning robot control, Force and compliance control, Adaptive robot control

Abstract

Robotic systems are increasingly employed for industrial automation, with contact-rich tasks like polishing requiring dexterity and compliant behaviour. These tasks are difficult to model, making classical control challenging. Deep reinforcement learning (RL) offers a promising solution by enabling the learning of models and control policies directly from data. However, its application to real-world problems is limited by data inefficiency and unsafe exploration. Adaptive hybrid RL methods blend classical control and RL adaptively, combining the strengths of both: structure from control and learning from RL. This has led to improvements in data efficiency and exploration safety. However, their potential for hardware applications remains underexplored, with no evaluations on physical systems to date. Such evaluations are critical to fully assess the practicality and effectiveness of these methods in real-world settings. This work presents an experimental demonstration of the hybrid RL algorithm CHEQ for robotic polishing with variable impedance, a task requiring precise force and velocity tracking. In simulation, we show that variable impedance enhances polishing performance. We compare standalone RL with adaptive hybrid RL, demonstrating that CHEQ achieves effective learning while adhering to safety constraints. On hardware, CHEQ achieves effective polishing behaviour, requiring only eight hours of training and incurring just five failures. These results highlight the potential of adaptive hybrid RL for real-world, contact-rich tasks trained directly on hardware.
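
As an illustration of what "blending classical control and RL adaptively" can look like, the following is a minimal sketch, not the CHEQ implementation. It assumes the applied action is a convex combination of the RL action and the controller action, with the RL weight reduced when some uncertainty estimate is high. The function names, the linear uncertainty-to-weight mapping, and all threshold values (`u_low`, `u_high`, `w_min`, `w_max`) are hypothetical and are not taken from the abstract.

```python
import numpy as np


def mixing_weight(uncertainty, u_low=0.1, u_high=1.0, w_min=0.2, w_max=1.0):
    """Map an uncertainty estimate to an RL weight in [w_min, w_max].

    Hypothetical linear mapping: low uncertainty gives the RL agent more
    authority, high uncertainty shifts authority to the classical controller.
    """
    frac = np.clip((uncertainty - u_low) / (u_high - u_low), 0.0, 1.0)
    return w_max - frac * (w_max - w_min)


def blend_actions(a_rl, a_ctrl, weight):
    """Convex combination of the RL action and the classical control action."""
    a_rl = np.asarray(a_rl, dtype=float)
    a_ctrl = np.asarray(a_ctrl, dtype=float)
    return weight * a_rl + (1.0 - weight) * a_ctrl


if __name__ == "__main__":
    # Illustrative values only: e.g. stiffness-like commands from each source.
    a_rl = [400.0, 0.8]
    a_ctrl = [300.0, 0.5]
    for u in (0.05, 0.5, 2.0):
        w = mixing_weight(u)
        print(f"uncertainty={u:.2f} weight={w:.2f} action={blend_actions(a_rl, a_ctrl, w)}")
```

Under these assumptions, the controller dominates whenever the agent is uncertain, which is one plausible route to the safer exploration the abstract describes. How CHEQ actually estimates uncertainty and sets the weight is not stated here.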