IROS 2023: Machine Learning Enables 3D with Only One Camera

Simon, you came to TUM two years ago and are now researching in the field of "Machine Learning for Robots." What specific area are you working on?
My focus is on accurately understanding what is happening in the three-dimensional world with just a single camera. Today, for example, many cameras are placed around a soccer field to confirm or overturn a linesman's decision. I'm trying to achieve the same with only one camera using machine learning. Our approach, Globally Consistent Probabilistic Human Motion Estimation (abbreviated as GloPro), works as follows: What humans learn from experience, such as the typical size of objects like a laptop, a chair, or a towel, is taught to the system through machine learning on a very large number of images. The system gradually learns the "scale," that is, the size and dimensions of the objects in the image, and calculates a three-dimensional representation of the environment from these learned magnitudes. At the same time, we estimate how accurate our results are.
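The role of a known object size in recovering scale can be illustrated with the standard pinhole camera relation, in which distance equals focal length times real height divided by height in the image. The following is a minimal sketch of that geometric intuition only; it is not the GloPro method, which learns these relationships with a neural network. The function names, the focal length, and the object sizes are hypothetical values chosen for illustration.

```python
def depth_from_known_height(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Pinhole model: distance Z = f * H / h.

    focal_px      -- focal length in pixels (assumed known from calibration)
    real_height_m -- assumed real-world height of the object in meters
    pixel_height  -- measured height of the object in the image in pixels
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * real_height_m / pixel_height


def depth_std_from_height_std(focal_px: float, height_std_m: float, pixel_height: float) -> float:
    """First-order uncertainty propagation.

    Z is linear in H, so an uncertainty sigma_H in the assumed object
    height maps to sigma_Z = f * sigma_H / h in the distance estimate.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * height_std_m / pixel_height


# Hypothetical example: a chair assumed to be 0.9 m tall (give or take 0.05 m)
# appears 300 pixels tall in a camera with a 1000-pixel focal length.
distance_m = depth_from_known_height(1000.0, 0.9, 300.0)
distance_std_m = depth_std_from_height_std(1000.0, 0.05, 300.0)
print(f"estimated distance: {distance_m:.2f} m, std: {distance_std_m:.2f} m")
```

The second function shows why an accuracy estimate is useful alongside the distance itself: the less certain the assumed size, the less certain the recovered distance.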
What is the benefit of this research for robotics?
Our goal is to understand how humans and robots can work together effectively and safely in settings like a factory. For this, it's important that a robot can understand the movements of a human in the simplest way possible and ideally even predict them. As a first step, we aim to estimate distances and movements accurately. In our research, we focus on a person's pose and posture. When using a single camera, we can draw on a large number of datasets to help us calculate the three-dimensional shape of a person quite reliably. What's new is that this calculation also works reliably when both the camera and the person are moving simultaneously. In GloPro, we also estimate the uncertainty inherent in the calculation of the three-dimensional body mesh. The darker the points on the observed body, the more certain the calculation of the respective point (as seen in the video below). The less a body is occluded, the better the estimates match reality. Depending on the degree of occlusion, the error ranges from one centimeter to as much as 50 centimeters.
What is the specific focus of your doctoral thesis?
First, I want to understand people using a camera and estimate the 3D pose of the human as accurately as possible, as described in the paper. In the next step, I take the context into account and extract more than just the human, as shown in the video. This is a prerequisite for the most crucial step: understanding the movements of the person in context. I aim to deploy the resulting model on a drone and allow it to interact with humans.
Papers presented by Prof. Stefan Leutenegger's team at IROS 2023 in Detroit:
1. GloPro: Globally-Consistent Uncertainty-Aware 3D Human Pose Estimation & Tracking in the Wild
2. BodySLAM++: Fast and Tightly-Coupled Visual-Inertial Camera and Human Motion Tracking
