Joint MECHATRONICS 2025, ROBOTICS 2025 Paper Abstract

Paper FrAT5.1

Jalayer, Reza (Politecnico di Milano University), Chen, Yuxin (University of California at Berkeley), Jalayer, Masoud (University of Turku), Orsenigo, Carlotta (Politecnico di Milano University), Tomizuka, Masayoshi (Univ of California, Berkeley)

Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model

Scheduled for presentation during the Regular Session "Human-Robot Interaction" (FrAT5), Friday, July 18, 2025, 10:00−10:20, Room 109

Joint 10th IFAC Symposium on Mechatronic Systems and 14th Symposium on Robotics, July 15-18, 2025, Paris, France

This information is tentative and subject to change. Compiled on August 2, 2025

Keywords Modeling and identification, Sensory based robot control, Robust robot control

Abstract

Reliable detection and segmentation of human hands are critical for enhancing safety and facilitating advanced interactions in human-robot collaboration. Current research predominantly evaluates hand segmentation on in-distribution (ID) data, which reflects the training data of deep learning (DL) models. However, this approach fails to address the out-of-distribution (OOD) scenarios that often arise in real-world human-robot interactions. In this work, we make three key contributions. First, we assess the generalization of DL models for hand segmentation under both ID and OOD scenarios, using a newly collected industrial dataset that captures a wide range of real-world conditions, including simple and cluttered backgrounds with industrial tools, varying numbers of hands (0 to 4), gloves, rare gestures, and motion blur. Second, we consider both egocentric and static viewpoints: we evaluate models trained on four datasets, namely EgoHands and Ego2Hands (egocentric, mobile camera) and HADR and HAGS (static, fixed viewpoint), by testing them with both egocentric (head-mounted) and static cameras, which enables robustness evaluation from multiple points of view. Third, we introduce an uncertainty analysis pipeline based on the predictive entropy of predicted hand pixels. By applying thresholds established in the validation phase, the pipeline automatically identifies and filters untrustworthy segmentation outputs, significantly improving segmentation reliability in OOD scenarios. For segmentation, we use a deep ensemble model composed of UNet and RefineNet as base learners. Our experiments demonstrate that models trained on industrial datasets (HADR, HAGS) outperform those trained on non-industrial datasets, both in segmentation accuracy and in their ability to flag unreliable outputs via uncertainty estimation. These findings underscore the necessity of domain-specific training data and show that our uncertainty analysis pipeline can provide a practical safety layer for real-world deployment.
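
The entropy-based flagging described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the ensemble outputs are combined, how entropy is aggregated over pixels, or what threshold values are used. The choices below are therefore assumptions made only for illustration: averaging the base learners' per-pixel hand probabilities, taking the mean binary entropy over pixels predicted as hand, using a 0.5 hand cutoff, and returning a score of 0.0 when no hand is predicted. All function and parameter names are hypothetical.

```python
import numpy as np


def ensemble_mean(prob_maps):
    """Average per-pixel hand probabilities from the base learners.

    prob_maps: list of 2-D arrays with values in [0, 1], one per base learner
    (e.g. one from UNet and one from RefineNet). Simple averaging is an assumption.
    """
    return np.mean(np.stack(prob_maps, axis=0), axis=0)


def binary_entropy(p, eps=1e-12):
    """Per-pixel predictive entropy (in nats) of a two-class prediction."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def hand_pixel_uncertainty(prob_maps, hand_threshold=0.5):
    """Mean predictive entropy over pixels predicted as hand.

    Returns (score, hand_mask). If no pixel is predicted as hand, the score
    is 0.0 (an assumed convention, not taken from the paper).
    """
    p = ensemble_mean(prob_maps)
    hand_mask = p >= hand_threshold
    if not hand_mask.any():
        return 0.0, hand_mask
    return float(binary_entropy(p)[hand_mask].mean()), hand_mask


def is_unreliable(prob_maps, entropy_threshold, hand_threshold=0.5):
    """Flag a segmentation whose hand-pixel entropy exceeds a threshold.

    entropy_threshold stands in for a value chosen on validation data.
    """
    score, _ = hand_pixel_uncertainty(prob_maps, hand_threshold)
    return score > entropy_threshold
```

In this sketch, two base learners that agree confidently on a hand region yield a low score and pass the check, whereas learners that disagree pull the averaged probability toward 0.5, which raises the entropy and triggers the flag.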